dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

Program bug #15

Closed LMY-nlp0701 closed 5 years ago

LMY-nlp0701 commented 5 years ago

Hello! Sorry to disturb you. I encountered a program bug and I hope you can help me solve this problem.

Program bug

I think I have already done the preliminary work and reached step 10, "Generate all entity disambiguation datasets in a CSV format needed in our training stage", as shown in the picture: (screenshot: step_10)

I ran the code and got the following results, but I hit a bug, as shown in the pictures:

(screenshots 1, 2, 3)

Thank you!

octavian-ganea commented 5 years ago

This likely means that the file opened on line 182 does not exist. Can you verify that the file at the path mentioned on line 182 really exists?
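If it helps, a minimal way to pinpoint which path is failing is to guard the open call. This is only a sketch, not the repo's exact code: `root_data_dir` mirrors the `-root_data_dir` flag, and the file name is a hypothetical stand-in for whatever path line 182 actually builds.

```lua
-- Hedged sketch: guard io.open so the failing path is reported explicitly.
-- root_data_dir mirrors the -root_data_dir flag; the file name below is only
-- an example stand-in for the path that line 182 actually builds.
local root_data_dir = 'data_path/'
local file_path = root_data_dir ..
    'basic_data/test_datasets/wned-datasets/wikipedia/wikipedia.xml'

local f = io.open(file_path, 'r')
if f == nil then
  error('Cannot open file (does it exist?): ' .. file_path)
end
f:close()
print('OK: ' .. file_path)
```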

LMY-nlp0701 commented 5 years ago

Hi, I downloaded, unpacked, and uploaded basic_data again. After investigating step by step, I found that the problem is the call gen_test_ace('wikipedia') (data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua, line 202).
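Concretely, the change that gets step 10 past this point is just commenting out that one call (shown as a sketch; the line number is as reported above):

```lua
-- Temporarily disable only the Wikipedia test set to isolate the failure:
-- gen_test_ace('wikipedia')   -- line 202
```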

When I comment out this line (line 202) and run step 10, the operation succeeds, as follows:

(screenshots 1, 2, 3, 4)

I think the program ran successfully, because the generated files and their sizes are consistent with those described later in the instructions. (screenshot 5)

So the question arises: I haven't modified any code or files, so why do the other four corpora run successfully while Wikipedia doesn't?

Thank you for your answer.

LMY-nlp0701 commented 5 years ago

Hi! I tried to explore why Wikipedia fails to run.

I made the following modifications to the code (data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua, beginning at line 182): (screenshot)

Accordingly, I commented out the other four corpora. (screenshot)

I then ran only gen_ace_msnbc_aquaint_csv.lua.

Command: th data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua -root_data_dir data_path/ The experimental results are shown below. (screenshot)

The results show that ZielonaGóra(parliamentary_constituency) does not exist, but I am sure my corpus contains this document. (screenshot)
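A quick way to check what is actually on disk is to list the folder and compare it with the name the script expects. This is only a sketch: the directory is the one from this thread, and io.popen with ls assumes the Linux server.

```lua
-- Sketch: compare the filename the script expects with what is actually on
-- disk in the RawText folder (Linux server assumed for io.popen + ls).
local dir = 'basic_data/test_datasets/wned-datasets/wikipedia/RawText/'
local wanted = 'ZielonaGóra(parliamentary_constituency)'

local p = io.popen('ls -1 "' .. dir .. '"')
local found = false
for name in p:lines() do
  if name == wanted then
    found = true
  elseif name:find('Zielona', 1, true) then
    -- Near miss: the document is there, but its name was mangled somewhere.
    print('near match on disk: ' .. name)
  end
end
p:close()
print(found and 'exact filename exists' or 'exact filename NOT found')
```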

Thank you for your answer!

LMY-nlp0701 commented 5 years ago

Hi! I think I have solved this problem. After step-by-step debugging of my own, I found that there was no mistake in the program itself.

The problem was the way the basic_data.zip file was decompressed.

My previous procedure:

  1. Decompress in a Windows environment (locally).
  2. Upload to the server (Linux environment).

The problem this caused: filenames containing special characters (such as the accented 'ó' in ZielonaGóra) were corrupted. The affected folder is basic_data/test_datasets/wned-datasets/wikipedia/RawText

So if you decompress directly on the server (Linux environment), you will not encounter this bug. I hope others will take this as a warning.
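As a quick sanity check after re-extracting (a sketch, assuming the ZielonaGóra document mentioned above is present in your copy), you can try to open one accented filename directly:

```lua
-- Post-extraction sanity check: try to open a filename containing a
-- non-ASCII character ('ó'); this fails if the unzip mangled the encoding.
local path = 'basic_data/test_datasets/wned-datasets/wikipedia/RawText/'
          .. 'ZielonaGóra(parliamentary_constituency)'
local f = io.open(path, 'r')
if f then
  f:close()
  print('OK: accented filename survived extraction')
else
  print('FAIL: ' .. path .. ' not found; re-extract basic_data.zip on Linux')
end
```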

Finally, I would like to thank the author again for sharing this code and for answering my questions. Thank you so much!

octavian-ganea commented 5 years ago

Ah, nice catch! Thanks!