iuria21 opened this issue 5 years ago
+1
I would also be interested in this.
@basque21, did you have any success with training a model on your own data?
Hi, no, sorry. I didn't get any results or answers here, so I tried other models...
Hi all,
If you use the same format as the aida file in our repo, that should work. Did you try that?
@NikosKolitsas can you please help these people? Thanks!
Hello, sorry for not answering earlier, but I have been working on other things for the last few years. In this work, Entity Recognition and Disambiguation are performed jointly, and the entity vectors play a crucial role in the process. That is, if you want to run this system in your own domain (which I guess has completely different entities from the ones that exist in Wikipedia), then you definitely need to create your own entity vectors. Instructions on how to do that can be found here. Another important part of the system is the probabilistic mention-entity map p(e|m), which I guess you also have to rebuild for your domain.

The format of the input files is the last and easiest thing, so you don't have to worry about it. In the folder preprocessing you can find code that handles a few different formats (Aida dataset, format, some other XML-based format and Gerbil) and converts all of them to a common simplified format. The new simplified format can be found in the folder ./data/new_datasets/, so I would recommend creating your dataset in this format, or converting it to this format directly.

In general, the code has implementation details that target the purpose of the paper, i.e. NER and ED on the available datasets with Wikipedia concepts, evaluation with Gerbil, and training optimizations with tfrecords; it was not designed with a plug-and-play mentality. Another thing you should take care of is the mapping from wiki-ids to neural network ids (wikid2nnid, i.e. the mapping from concept ids to entity vectors in your entity-embeddings array).
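For a custom domain, the two pieces mentioned above (the p(e|m) map and the id-to-embedding-row mapping) could be bootstrapped from your own labeled (mention, entity) pairs. The following is only a minimal sketch under stated assumptions: p(e|m) is estimated by simple relative frequency, the function names `build_p_e_m` and `build_id2nnid` are hypothetical (not the repo's code), and reserving row 0 for an unknown/padding entity is an assumption you should check against how the repo loads its embedding array.

```python
from collections import Counter, defaultdict


def build_p_e_m(pairs, top_k=30):
    """Estimate p(e|m) from (mention, entity_id) pairs by relative frequency.

    Returns {mention: [(entity_id, prob), ...]} sorted by descending prob and
    truncated to top_k candidates. Probabilities are relative to ALL counts for
    the mention, so after truncation they may sum to less than 1.
    """
    counts = defaultdict(Counter)
    for mention, entity_id in pairs:
        counts[mention][entity_id] += 1  # mention string used as-is (no normalization)
    p_e_m = {}
    for mention, ent_counts in counts.items():
        total = sum(ent_counts.values())
        p_e_m[mention] = [(e, c / total) for e, c in ent_counts.most_common(top_k)]
    return p_e_m


def build_id2nnid(entity_ids):
    """Assign each distinct entity id a row index in the embedding array.

    Assumption: row 0 is reserved for an unknown/padding entity, so real
    entities start at 1, in order of first appearance.
    """
    id2nnid = {}
    for e in entity_ids:
        if e not in id2nnid:
            id2nnid[e] = len(id2nnid) + 1
    return id2nnid


if __name__ == "__main__":
    # Hypothetical law-code entity ids, for illustration only.
    pairs = [("BGB", "LAW_1"), ("BGB", "LAW_1"), ("BGB", "LAW_2"), ("StGB", "LAW_3")]
    print(build_p_e_m(pairs)["BGB"])
    print(build_id2nnid(e for _, e in pairs))
```

Whatever ids you use as keys here must be the same ones that index your entity vectors; otherwise the candidate entities from p(e|m) will point at the wrong rows.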
Hi, first of all, thanks for your work. I have a short question; maybe you could help me:
I'm creating a new dataset. I have data with labeled named entities and a link for each entity. I could create a dataset like this (instead of a Wikipedia link, each entity has a link to a law code):
Can I train a model with this data, or do I need something else? There are some columns in aida_train.txt whose meaning I don't know. Also, do you think the entity embeddings will be useful in this case too? Thanks!!
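Following the suggestion to write a custom dataset directly in the simplified format from ./data/new_datasets/, a conversion for documents with labeled mention spans might look roughly like this. It is a sketch under assumptions: it assumes a one-token-per-line layout with marker lines for document and mention boundaries, and the marker strings (`DOCSTART_`, `MMSTART_`, `MMEND`, `DOCEND`) and the function name `to_marked_lines` are assumptions, not confirmed details of the repo. Compare them against the actual files before relying on this.

```python
def to_marked_lines(doc_id, tokens, mentions, doc_start="DOCSTART_",
                    mm_start="MMSTART_", mm_end="MMEND", doc_end="DOCEND"):
    """Render one document as token-per-line text with mention markers.

    tokens:   list of token strings
    mentions: list of (start, end, entity_id) token spans, end exclusive,
              non-overlapping
    Marker strings are ASSUMED defaults; check them against ./data/new_datasets/.
    """
    spans = sorted(mentions)
    prev_end = 0
    for start, end, _ in spans:
        # Reject empty, out-of-range, or overlapping spans.
        if not (prev_end <= start < end <= len(tokens)):
            raise ValueError(f"bad or overlapping span {(start, end)}")
        prev_end = end
    starts = {s: ent for s, _, ent in spans}
    ends = {e for _, e, _ in spans}

    lines = [doc_start + doc_id]
    for i in range(len(tokens) + 1):
        if i in ends:            # close a mention that ended before token i
            lines.append(mm_end)
        if i < len(tokens):
            if i in starts:      # open a mention starting at token i
                lines.append(mm_start + starts[i])
            lines.append(tokens[i])
    lines.append(doc_end)
    return lines


if __name__ == "__main__":
    # Hypothetical sentence and law-code id, for illustration only.
    toks = ["According", "to", "§", "823", "BGB", "liability", "arises"]
    print("\n".join(to_marked_lines("doc1", toks, [(2, 5, "LAW_BGB_823")])))
```

Closing a mention before opening the next one at the same index keeps adjacent mentions correctly nested in the output.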