dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

Program bug #15

Closed LMY-nlp0701 closed 5 years ago

LMY-nlp0701 commented 5 years ago

Hello! Sorry to disturb you. I encountered a program bug and I hope you can help me solve this problem.

Program bug

I think I have already done the preliminary work and reached step 10, "Generate all entity disambiguation datasets in a CSV format needed in our training stage", as shown in the picture: (screenshot: step_10)

I ran the code and got the following results, but I hit a bug, as shown in the pictures:

(screenshots 1, 2, 3)

Thank you!

octavian-ganea commented 5 years ago

This likely means that the file opened on line 182 does not exist. Can you verify that the file at the path mentioned on line 182 really exists?
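If it helps, a minimal way to pinpoint which path is failing is to guard the open call. This is only a sketch, not the repo's exact code: `root_data_dir` mirrors the `-root_data_dir` flag, and the file name is a hypothetical stand-in for whatever path line 182 actually builds.

```lua
-- Hedged sketch: guard io.open so the failing path is reported explicitly.
-- root_data_dir mirrors the -root_data_dir flag; the file name below is only
-- an example stand-in for the path that line 182 actually builds.
local root_data_dir = 'data_path/'
local file_path = root_data_dir ..
    'basic_data/test_datasets/wned-datasets/wikipedia/wikipedia.xml'

local f = io.open(file_path, 'r')
if f == nil then
  error('Cannot open file (does it exist?): ' .. file_path)
end
f:close()
print('OK: ' .. file_path)
```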

LMY-nlp0701 commented 5 years ago

Hi, I downloaded, unpacked, and uploaded basic_data again. After investigating step by step, I found that the problem is the call gen_test_ace('wikipedia') (data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua, line 202).
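Concretely, the change that gets step 10 past this point is just commenting out that one call (shown as a sketch; the line number is as reported above):

```lua
-- Temporarily disable only the Wikipedia test set to isolate the failure:
-- gen_test_ace('wikipedia')   -- line 202
```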

When I comment out this line (line 202) and run step 10, the operation succeeds, as follows:

(screenshots 1, 2, 3, 4)

I think the program ran successfully, because the generated files and their sizes are consistent with those described later in the instructions. (screenshot 5)

So the question arises: I haven't modified any code or files, so why do the other four corpora run successfully while Wikipedia doesn't?

Thank you for your answer.

LMY-nlp0701 commented 5 years ago

Hi! I tried to explore why Wikipedia fails to run.

I made the following modifications to the code (data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua, beginning at line 182): (screenshot)

Accordingly, I commented out the other four corpora. (screenshot)

I then ran only gen_ace_msnbc_aquaint_csv.lua.

Command: th data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua -root_data_dir data_path/ The experimental results are shown below. (screenshot)

The results show that ZielonaGóra(parliamentary_constituency) does not exist, but I am sure my corpus contains this document. (screenshot)
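A quick way to check what is actually on disk is to list the folder and compare it with the name the script expects. This is only a sketch: the directory is the one from this thread, and io.popen with ls assumes the Linux server.

```lua
-- Sketch: compare the filename the script expects with what is actually on
-- disk in the RawText folder (Linux server assumed for io.popen + ls).
local dir = 'basic_data/test_datasets/wned-datasets/wikipedia/RawText/'
local wanted = 'ZielonaGóra(parliamentary_constituency)'

local p = io.popen('ls -1 "' .. dir .. '"')
local found = false
for name in p:lines() do
  if name == wanted then
    found = true
  elseif name:find('Zielona', 1, true) then
    -- Near miss: the document is there, but its name was mangled somewhere.
    print('near match on disk: ' .. name)
  end
end
p:close()
print(found and 'exact filename exists' or 'exact filename NOT found')
```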

Thank you for your answer!

LMY-nlp0701 commented 5 years ago

Hi! I think I have solved this problem. After step-by-step debugging of my own, I found that there was no mistake in the program itself.

The problem was the way the basic_data.zip file was decompressed.

My previous procedure:

  1. Decompress in a Windows environment (locally).
  2. Upload to the server (Linux environment).

The problem this caused: filenames containing special characters (such as the accented 'ó' in ZielonaGóra) were corrupted. The affected folder is basic_data/test_datasets/wned-datasets/wikipedia/RawText

So if you decompress directly on the server (Linux environment), you will not encounter this bug. I hope others will take this as a warning.
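As a quick sanity check after re-extracting (a sketch, assuming the ZielonaGóra document mentioned above is present in your copy), you can try to open one accented filename directly:

```lua
-- Post-extraction sanity check: try to open a filename containing a
-- non-ASCII character ('ó'); this fails if the unzip mangled the encoding.
local path = 'basic_data/test_datasets/wned-datasets/wikipedia/RawText/'
          .. 'ZielonaGóra(parliamentary_constituency)'
local f = io.open(path, 'r')
if f then
  f:close()
  print('OK: accented filename survived extraction')
else
  print('FAIL: ' .. path .. ' not found; re-extract basic_data.zip on Linux')
end
```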

Finally, I would like to thank the author again for sharing this code and for answering my questions. Thank you so much!

octavian-ganea commented 5 years ago

Ah, nice catch! Thanks!