CogComp / zoe

Zero-Shot Open Entity Typing as Type-Compatible Grounding, EMNLP'18.

Could you please provide full test dataset used in the paper #27

Closed hitercs closed 5 years ago

hitercs commented 5 years ago

Hi,

Thanks for your work. I see you have only released the test datasets for fine-grained entity typing (i.e. FIGER, BBN, OntoNotes_fine). Are these the full datasets or just samples of them?

Also, could you please provide the test datasets used for coarse entity typing (i.e. Table 3 in the paper) and Biology Entity Typing (Table 5 in the paper)? Thanks a lot.

Slash0BZ commented 5 years ago

Hi, the FIGER and BBN experiments were provided with full data. As for Ontonotes_fine, I didn't process all the data into Python formats: the original experiments were done in Java, and this is a rewritten Python package for demo purposes. I could add them as a future enhancement this summer.

What I can do now is to provide the full Ontonotes_fine and biology test sets, as well as the mappings for CoNLL/Ontonotes/MUC experiments. You can acquire these NER datasets online (as I am not sure about the licenses). For Biology mapping, just check whether the FreeBase type contains "microorganism".
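A minimal sketch of the biology check described above, assuming each entity comes with a list of Freebase type strings. The function name and the example type strings are hypothetical and are not taken from the package or the released files:

```python
from typing import Iterable


def has_microorganism_type(freebase_types: Iterable[str]) -> bool:
    """Return True if any Freebase type string contains "microorganism".

    The substring test follows the rule stated in the comment above;
    lowercasing is an added assumption to make the match case-insensitive.
    """
    return any("microorganism" in t.lower() for t in freebase_types)


# Hypothetical type strings, for illustration only.
example_types = ["/biology/organism_classification", "/biology/microorganism"]
is_bio = has_microorganism_type(example_types)
```

How the boolean is turned into a label for the Biology experiment depends on the reader you write for the dataset.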

Here are the files: https://drive.google.com/drive/u/1/folders/1atg8MTNvu87Pw2TAUQxkLzNk7kk5lbJN

You need to write your own reader for those datasets (as I only have them in Java). After that, you should be able to run them with this package, given the mappings.

Let me know if you have any questions here, or send me an email.

hitercs commented 5 years ago

Hi,

Thanks, that's great. I have received 1) the mappings for the CoNLL/Ontonotes/MUC experiments and 2) the full Ontonotes_fine test set. But the BioNLPST16.zip file seems to be broken. Can you open it correctly on your side? If so, what tool do you use to open it? Finally, when you say you didn't process all the data into Python formats for Ontonotes_fine, does "the data" refer to wikilinks.min.embedding.pickle and target.min.embedding.pickle, while the Ontonotes_fine test data itself is the full version? Am I right? Thanks.

Slash0BZ commented 5 years ago

Updated the bioNLP file. Could you check again?
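In case it helps with checking, here is a small sketch that tests whether a downloaded archive is a readable zip, using only Python's standard `zipfile` module. The helper name is hypothetical, and the path in the usage comment is assumed to be wherever you saved the file:

```python
import zipfile


def is_readable_zip(source) -> bool:
    """Return True if `source` (a path or file-like object) is a zip archive
    whose members all pass the CRC check, and False otherwise."""
    try:
        with zipfile.ZipFile(source) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # The file is not a zip archive at all, or its directory is damaged.
        return False


# Usage (path is an assumption): is_readable_zip("BioNLPST16.zip")
```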

Regarding the unprocessed data: your understanding is correct. The test data is provided in full in the Google Drive link above. In the original repo, if you use the script to download the data, you only get about 30% (I forgot the exact number) of the data, with all the wikilinks and ELMo cached.

hitercs commented 5 years ago

The bioNLP file is fine now. That's great. Thank you very much!

Slash0BZ commented 5 years ago

Thanks. Closing for now.