dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

Missing file in Wikipedia raw text #26

mickvanhulst closed this issue 5 years ago

mickvanhulst commented 5 years ago

Dear author,

While generating all datasets, I get an error when running gen_test_train/gen_all.lua. The Wikipedia part fails because a file is missing. The missing file (which is not in the provided dataset) is: ./data/basic_data/test_datasets/wned-datasets/wikipedia/RawText/Chippenham_United_F.C.

I tried the approach taken in a related issue, but in that case a different file was missing: ./data/basic_data/test_datasets/wned-datasets/wikipedia/RawText/Electoral_division_of_Apsley

Did you skip documents that did not exist, or could you tell me how you handled this?

Thanks in advance!

octavian-ganea commented 5 years ago

Hmm, I am not sure why this happens. As far as I remember (I might be wrong though), I only skipped docs that have no annotations.

mickvanhulst commented 5 years ago

I found a similar issue. I had to do two things: 1) unzip the archive on my Ubuntu instance (I use Windows by default); 2) remove the trailing dot from the affected filenames, because Windows does not handle filenames that end in a dot (do not forget to update the XML file as well). I think this is purely a Windows problem. Thanks for your quick response!
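For anyone hitting the same problem, the two renaming steps above could be scripted roughly as follows. This is only a sketch, not something from the repository: the function name `strip_trailing_dots` is made up, and it assumes the XML file is UTF-8 and refers to each document by its exact file name as a double-quoted attribute value. Please check both against your copy of the dataset, and run it on Linux (or WSL), where files ending in a dot can actually exist.

```python
import os


def strip_trailing_dots(raw_text_dir, xml_path):
    """Rename files ending in '.' and patch the same names in the XML file.

    Hypothetical helper; returns a dict mapping old file names to new ones.
    """
    renamed = {}
    for name in sorted(os.listdir(raw_text_dir)):
        new_name = name.rstrip(".")
        if new_name == name or not new_name:
            continue  # no trailing dot, or the name consists only of dots
        src = os.path.join(raw_text_dir, name)
        dst = os.path.join(raw_text_dir, new_name)
        if os.path.exists(dst):
            # Refuse to overwrite an existing document with the same name.
            raise FileExistsError(dst)
        os.rename(src, dst)
        renamed[name] = new_name

    with open(xml_path, encoding="utf-8") as f:
        xml = f.read()
    for old, new in renamed.items():
        # Assumption: names appear in the XML as double-quoted attribute values.
        xml = xml.replace('"%s"' % old, '"%s"' % new)
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(xml)
    return renamed
```

Calling it with the RawText directory and the matching XML file would rename, for example, `Chippenham_United_F.C.` to `Chippenham_United_F.C`. Keeping a backup of both the directory and the XML file before running it is probably wise, since the rename is done in place.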