DBR integration - Githubissues

Thank you for your attention to our work! I apologize for the late reply. I have been quite busy and forgot to check GitHub.

Regarding how the entities used by DBR are collected and constructed, we will provide the code for this as soon as possible.

Here is a brief description of the process:

Step 1: Prepare the knowledge base. We use UMLS together with the entities in the training split of the dataset.

Step 2: For each training sample used in instruction fine-tuning, we first run the phrase mining tool AutoPhrase (https://github.com/shangjingbo1226/AutoPhrase) to segment the text into a series of phrases. We then filter out some of these phrases using a vocabulary of common words.

Step 3: Use these phrases as queries to search the knowledge base from Step 1 for related entities, ranking candidates by semantic similarity computed with SapBERT (https://github.com/cambridgeltl/sapbert). The retrieved entities whose categories match those in the dataset serve as positive samples, and negative samples are also selected at a certain ratio.
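
Until the official code is released, the three steps above can be illustrated with a minimal Python sketch. This is not the actual implementation: the toy knowledge base, the `COMMON_WORDS` set, the `min_sim`, `top_k`, and `neg_ratio` parameters, and all function names are hypothetical. Character trigram cosine similarity stands in for SapBERT embeddings, the input phrases are assumed to come from AutoPhrase already, and drawing negatives at random from the remaining knowledge base entries is only one possible reading of "at a certain ratio".

```python
import math
import random
from collections import Counter

# Step 1 (toy stand-in): knowledge base of (entity, category) pairs.
# In the described setup this would be UMLS plus training-split entities.
KB = [
    ("aspirin", "Chemical"),
    ("ibuprofen", "Chemical"),
    ("headache", "Disease"),
    ("migraine", "Disease"),
    ("hospital", "Facility"),
]
COMMON_WORDS = {"the", "patient", "study"}   # hypothetical filter vocabulary
DATASET_CATEGORIES = {"Chemical", "Disease"}  # categories annotated in the dataset


def embed(text, n=3):
    """Character n-gram counts; a placeholder for a SapBERT embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_phrases(phrases):
    """Step 2 (second half): drop phrases found in the common-words vocabulary."""
    return [p for p in phrases if p.lower() not in COMMON_WORDS]


def retrieve(phrase, top_k=2, min_sim=0.1):
    """Step 3: return up to top_k KB entries most similar to the phrase."""
    query = embed(phrase)
    scored = [(cosine(query, embed(name)), (name, cat)) for name, cat in KB]
    scored = [item for item in scored if item[0] >= min_sim]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def build_samples(phrases, neg_ratio=1.0, seed=0):
    """Collect positives from retrieval, then sample negatives at neg_ratio."""
    rng = random.Random(seed)
    positives = []
    for phrase in filter_phrases(phrases):
        for entry in retrieve(phrase):
            # Keep only entities whose category appears in the dataset.
            if entry[1] in DATASET_CATEGORIES and entry not in positives:
                positives.append(entry)
    # Assumption: negatives are drawn at random from the remaining KB entries.
    pool = [entry for entry in KB if entry not in positives]
    n_neg = min(len(pool), round(neg_ratio * len(positives)))
    negatives = rng.sample(pool, n_neg)
    return positives, negatives


if __name__ == "__main__":
    pos, neg = build_samples(["the", "aspirin", "headaches"])
    print("positives:", pos)
    print("negatives:", neg)
```

In this toy run, the query "aspirin" also retrieves "hospital" through a shared trigram, but that entry is not kept as a positive because its category is not one of the dataset categories.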

Eulring / VANER

DBR integration #1