Open arxrean opened 4 years ago
Hi @arxrean,
I would also like to use this study for my research. I figured out how to generate the training data. Under the data folder, there is a Python file called dataprocessing.py. It needs the complete Knowledge Base files: entities, relations, and triples (output.csv). With those, you can generate the train and test files.
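For anyone trying the same thing, here is a minimal sketch of a train/test split over a triples file. It assumes output.csv holds one head,relation,tail triple per row, which the thread does not confirm. The function names `read_triples` and `split_triples`, the 80/20 ratio, and the toy rows are all hypothetical, and the real dataprocessing.py may work differently.

```python
import csv
import io
import random


def read_triples(handle):
    """Read (head, relation, tail) rows from a CSV handle, skipping malformed rows."""
    return [tuple(row) for row in csv.reader(handle) if len(row) == 3]


def split_triples(rows, test_ratio=0.2, seed=0):
    """Shuffle the triples with a fixed seed and split them into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


# Made-up stand-in for output.csv (assumed layout, not the real KnowLife data).
sample = io.StringIO(
    "aspirin,treats,headache\n"
    "aspirin,causes,nausea\n"
    "ibuprofen,treats,fever\n"
    "ibuprofen,causes,nausea\n"
    "insulin,treats,diabetes\n"
)
triples = read_triples(sample)
train, test = split_triples(triples)
```

With a real file, `open("output.csv", newline="")` would replace the `io.StringIO` object.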
However, I couldn't proceed with the second step, training. It also needs entity and relation files, this time in .txt format. I wonder whether they are simply .txt versions of the entity and triple files under the data folder.
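If the .txt files do turn out to be plain-text versions of the CSV files, the conversion could look like the sketch below. The tab-separated layout is purely an assumption, the helper name `csv_to_tab_separated` is hypothetical, and the sample rows are made up; the training script may expect a different format.

```python
import csv
import io


def csv_to_tab_separated(csv_text):
    """Convert CSV text into tab-separated text, one record per line."""
    reader = csv.reader(io.StringIO(csv_text))
    lines = ["\t".join(row) for row in reader if row]
    return "\n".join(lines) + "\n"


# Made-up stand-in for entity2id.csv (assumed name,id layout).
entity_csv = "aspirin,0\nheadache,1\n"
entity_txt = csv_to_tab_separated(entity_csv)
```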
In the paper, the graph is trained with R-GCN. I couldn't figure out where this process is located.
Dear @cuilimeng, could you clarify these points for us? Kind regards.
Hi @isspek, thank you for your help! Yes, the knowledge base data is incomplete, so I could not run the code at the time.
I think they are the same because there is only one medical knowledge graph used.
I think the R-GCN part is located in TextRelationalGraphAttention.py (and also DETERRENT.py).
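To help recognize that part when reading the files, here is a minimal pure-Python sketch of one generic R-GCN layer, where each node adds a self-loop term to the per-relation mean of its transformed incoming neighbours and then applies ReLU. It is not taken from this repository: the function names, the toy graph, and the weights are hypothetical, and the attention-based variant in the actual code will differ.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One R-GCN layer.

    h      : dict mapping node -> feature vector
    edges  : list of (source, relation, target); messages flow source -> target
    W_rel  : dict mapping relation -> weight matrix
    W_self : weight matrix for the self-loop term
    """
    # Group incoming neighbours by (target, relation) for per-relation normalisation.
    neighbours = {}
    for src, rel, dst in edges:
        neighbours.setdefault((dst, rel), []).append(src)

    out = {}
    for node, feat in h.items():
        acc = matvec(W_self, feat)  # self-loop contribution
        for (dst, rel), srcs in neighbours.items():
            if dst != node:
                continue
            for src in srcs:
                msg = matvec(W_rel[rel], h[src])
                # Average the messages within each relation type.
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        out[node] = [max(0.0, a) for a in acc]  # ReLU
    return out


# Toy graph with made-up entities and weights.
h = {"aspirin": [1.0, 0.0], "headache": [0.0, 1.0]}
edges = [("aspirin", "treats", "headache")]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"treats": [[2.0, 0.0], [0.0, 2.0]]}
out = rgcn_layer(h, edges, W_rel, identity)
```

In the real code these weight matrices would be learned tensors, and one set of weights per relation type is what distinguishes R-GCN from a plain GCN.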
I suggest checking another GitHub project (https://github.com/esddse/GUpdater). Its basic code is very similar (text+GRU+RGCN) and it runs smoothly. You can then come back to this one for specific parts (e.g., positive/negative relations).
Best,
Hi @arxrean and @isspek,
The files entity2id.csv, output.csv and token2id.csv come from the medical knowledge graph KnowLife. The terms of use of this dataset prohibit us from distributing it to other parties, even for noncommercial or educational use. That is why the dataset is incomplete. Sorry for any inconvenience.
Limeng
@cuilimeng No problem! Thank you again for sharing the code.
@arxrean, thanks for your suggestion, I will check that code too. @cuilimeng thanks for the clarification and for the nice work!!
Does anyone know how to get access to KnowLife? The link has no useful information or contact details.
Thank you for sharing the code. However, it seems the training JSON file contains only one JSON record. Is there a guide for generating more training data?
Also, would it be possible to upload a pretrained model, so that we can run the testing process and get reasonable results?
Thank you!