Sean-Blank / AMRcoref

Unclear data format #1

Open IreneZihuiLi opened 3 years ago

IreneZihuiLi commented 3 years ago

Running python train.py fails with: FileNotFoundError: [Errno 2] No such file or directory: './data/corpora_base/evl'

The dataloader needs data in JSON format, but it is not clear how to generate or preprocess that data from the raw LDC2020T02 release. What should the JSON data look like? And what needs to be done to get the code running, given the LDC2020T02 data in txt format?

Sean-Blank commented 3 years ago

We are working on the preprocessing code. The evl data has been uploaded.

IreneZihuiLi commented 3 years ago

Thanks for the update! Here are some bugs and errors I got:

I found that penman==0.6.2 works with the code; higher versions cause problems.
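
A minimal sketch of an early check for the version constraint reported above, assuming a script should stop before any parsing when a different penman is installed. The names REQUIRED_PENMAN, is_required_version and check_penman_version are hypothetical and are not part of AMRcoref; only importlib.metadata is a standard-library API.

```python
from importlib import metadata

# Version reported to work in this thread (not an official requirement).
REQUIRED_PENMAN = "0.6.2"


def is_required_version(installed: str, required: str = REQUIRED_PENMAN) -> bool:
    """Return True only when the installed version string matches exactly."""
    return installed.strip() == required


def check_penman_version() -> None:
    """Raise early if penman is missing or is not the reported working version."""
    try:
        installed = metadata.version("penman")
    except metadata.PackageNotFoundError:
        raise RuntimeError(
            f"penman is not installed; try: pip install penman=={REQUIRED_PENMAN}"
        ) from None
    if not is_required_version(installed):
        raise RuntimeError(
            f"found penman {installed}, but {REQUIRED_PENMAN} is the version "
            "reported to work"
        )
```

Calling check_penman_version() at the top of a script would surface a version mismatch as a clear message instead of a later parsing error.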

Also, when running prepare_data.py, I got the following error:

9795
9796
9797
9798
Traceback (most recent call last):
  File "prepare_data.py", line 40, in <module>
    for i, amr in enumerate(AMRIO.read(raw_path)):
  File "<MY_PATH>AMR/AMRcoref/amr_parsing/io.py", line 20, in read
    amr.graph = AMRGraph.decode(' '.join(graph_lines))
  File "<MY_PATH>AMR/AMRcoref/amr_parsing/amr.py", line 637, in decode
    return cls(_graph)
  File "<MY_PATH>AMR/AMRcoref/amr_parsing/amr.py", line 233, in __init__
    self._build_extras()
  File "<MY_PATH>AMR/AMRcoref/amr_parsing/amr.py", line 257, in _build_extras
    target = self.variable_to_node[edge.target]
KeyError: 31
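
The last frame shows that the lookup self.variable_to_node[edge.target] was made with the key 31 and that no such key exists in the mapping. Why the key is missing is not established in this thread. The sketch below uses toy data (not the AMRcoref classes; every name in it is hypothetical) to show how such a lookup fails and how the offending edges could be listed before indexing.

```python
# Toy stand-ins for illustration only; these are NOT the AMRcoref data structures.
variable_to_node = {"b": "believe-01", "g": "girl"}  # variable -> node label
edges = [("b", ":ARG0", "g"), ("b", ":quant", 31)]   # (source, role, target)


def find_dangling_targets(variable_to_node, edges):
    """Return the edges whose target is not a key of variable_to_node."""
    return [edge for edge in edges if edge[2] not in variable_to_node]


# Indexing variable_to_node[31] directly would raise KeyError: 31,
# so collect the problematic edges first and inspect them.
dangling = find_dangling_targets(variable_to_node, edges)
print(dangling)
```

Printing the dangling edges for the graph that fails (the one read after index 9798 in the log above) would show which target is not registered as a variable.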

TrinaDutta95 commented 2 years ago

I ran into this issue after running prepare_data.py (using penman==0.6.2):

True
Traceback (most recent call last):
  File "prepare_data.py", line 34, in <module>
    ms_amr_id_info, ms_amr_ids = get_all_multi_ids(xml_path)
  File "prepare_data.py", line 13, in get_all_multi_ids
    files = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '../data/xml-unsplit/'

I have the folder data/xml-unsplit in my main directory.
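
os.listdir resolves a relative path such as '../data/xml-unsplit/' against the current working directory of the process, not against the location of the script. The leading '../' therefore points one level above wherever the script is launched. The sketch below illustrates this with a hypothetical layout; the repository path and the subdirectory name are assumptions, and the thread does not state which directory prepare_data.py is meant to be run from.

```python
import os


def resolve_relative(path: str, base_dir: str) -> str:
    """Resolve a relative path against base_dir and normalize the result."""
    return os.path.normpath(os.path.join(base_dir, path))


# Hypothetical checkout location, used only for illustration.
repo = "/home/user/AMRcoref"

# Launched from the repository root: '../' climbs out of the repository,
# so repo/data/xml-unsplit is not the directory being listed.
from_root = resolve_relative("../data/xml-unsplit/", repo)

# Launched from a (hypothetical) subdirectory of the repository:
# the same string now resolves to repo/data/xml-unsplit.
from_subdir = resolve_relative("../data/xml-unsplit/", os.path.join(repo, "some_subdir"))

print(from_root)
print(from_subdir)
```

Printing os.path.abspath('../data/xml-unsplit/') from the directory where the command is run shows which folder the script actually looks for.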