dmis-lab / biobert-pytorch

PyTorch Implementation of BioBERT
http://doi.org/10.1093/bioinformatics/btz682

How can I use BioBERT for Relation extraction and further finding the effective entities in a sentence? #17

Open Meghna-Goyal opened 3 years ago

Meghna-Goyal commented 3 years ago

Hi team,

Please find below an example of the problem that I am trying to solve.

Original Sentence: PD98059, a specific inhibitor of MEK, had little effect on the TNF-alpha-induced phosphorylation of Akt.

Input sentence (Masking the entities): DRUG, a specific inhibitor of GENE, had little effect on the GENE-induced phosphorylation of GENE.

Expected output:

Relation: Inhibitor
Effective Entities: DRUG and GENE (first occurrence)

Can you please let me know whether I can use BioBERT to do multi-label classification for relation extraction, and then find the effective entities in the sentence when it contains multiple occurrences of the same entity type?

Thanks and regards,
Meghna Goyal

SRL94 commented 2 years ago

Hi Meghna,

I have the same question. Have you figured it out?

Best regards,
Sirui

wonjininfo commented 2 years ago

Hi all, and apologies for the delay in responding, Meghna Goyal.

Last year, I worked on the multi-label RE task (DrugProt) using LMs and made our code available at https://github.com/dmis-lab/BioRE-drugprot-kuaz

You will need to write some code to pre-process your input data, as the pre-processing code is not available yet. (I hope to release it soon, but I have a list of things to do for my graduation these days. Apologies for this.)

To predict relation classes for plain text, you need to:

  1. Run NER tools to recognize named entities
  2. Pre-process your input data (post-NER) to match the format (check the pre-processed datasets in the BioRE repo)
  3. Use a trained model to predict relation classes for your input data
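The three steps above can be sketched roughly as follows. This is only an illustrative skeleton, not the code from the BioRE repo: `toy_ner`, `to_candidates`, `toy_classifier` and the label names are hypothetical stand-ins, and in practice you would plug in a real NER tool for step 1 and a trained model for step 3.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)

# Step 1: NER. A toy lexicon lookup standing in for a real NER tool (assumption).
TOY_LEXICON = {"PD98059": "DRUG", "MEK": "GENE", "TNF-alpha": "GENE", "Akt": "GENE"}


def toy_ner(text: str) -> List[Span]:
    spans = []
    for surface, label in TOY_LEXICON.items():
        start = text.find(surface)
        if start != -1:
            spans.append((start, start + len(surface), label))
    return sorted(spans)


# Step 2: pre-processing. Build one candidate per (DRUG, GENE) pair.
def to_candidates(text: str, spans: List[Span]):
    drugs = [s for s in spans if s[2] == "DRUG"]
    genes = [s for s in spans if s[2] == "GENE"]
    return [(text, d, g) for d in drugs for g in genes]


# Step 3: classification. A placeholder rule standing in for a trained model.
def toy_classifier(candidate) -> str:
    text, drug, gene = candidate
    between = text[min(drug[1], gene[1]):max(drug[0], gene[0])]
    return "INHIBITOR" if between.strip(" ,").endswith("inhibitor of") else "NONE"


def predict_relations(text, ner=toy_ner, preprocess=to_candidates,
                      classify=toy_classifier):
    """Run the three steps and return (drug, gene, label) triples."""
    results = []
    for cand in preprocess(text, ner(text)):
        _, drug, gene = cand
        results.append((text[drug[0]:drug[1]], text[gene[0]:gene[1]], classify(cand)))
    return results


TEXT = ("PD98059, a specific inhibitor of MEK, had little effect on "
        "the TNF-alpha-induced phosphorylation of Akt.")

for triple in predict_relations(TEXT):
    print(triple)
```

Because each candidate carries the spans of one specific pair, the predicted label also tells you which occurrence of an entity type takes part in the relation.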

Also, please note that in our participation in the BioCreative VII challenge (DrugProt), we wrapped entities with markers, and this performed better than masking entities (i.e. replacing entities with masks). A short description of our participation is available here. Figure 3 may be informative for your question. When there are multiple entities in a sentence, we made multiple datapoints, or samples, from the sentence, one per candidate entity pair.

For example,

DRUG, a specific inhibitor of GENE, had little effect on the TNF-induced phosphorylation of Akt.
DRUG, a specific inhibitor of MEK, had little effect on the GENE-induced phosphorylation of Akt.
DRUG, a specific inhibitor of MEK, had little effect on the TNF-induced phosphorylation of GENE.
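To make the idea concrete, here is a minimal sketch of generating one sample per DRUG/GENE pair, in either masked or marker-wrapped form. The `Entity` type, the function names and the `<DRUG> ... </DRUG>` marker strings are assumptions chosen for illustration; check the pre-processed datasets in the BioRE repo for the markers and format actually used.

```python
from itertools import product
from typing import List, NamedTuple


class Entity(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # e.g. "DRUG" or "GENE"


def rewrite(sentence: str, drug: Entity, gene: Entity, mode: str) -> str:
    """Return the sentence with one DRUG/GENE pair masked or wrapped in markers."""
    out = sentence
    # Edit from right to left so the earlier offsets stay valid.
    for ent in sorted((drug, gene), key=lambda e: e.start, reverse=True):
        surface = out[ent.start:ent.end]
        if mode == "mask":
            new = ent.label
        elif mode == "marker":
            # Hypothetical marker format, not necessarily the one in the repo.
            new = f"<{ent.label}> {surface} </{ent.label}>"
        else:
            raise ValueError(f"unknown mode: {mode}")
        out = out[:ent.start] + new + out[ent.end:]
    return out


def make_samples(sentence: str, entities: List[Entity], mode: str = "marker") -> List[str]:
    """Create one sample per (DRUG, GENE) pair in the sentence."""
    drugs = [e for e in entities if e.label == "DRUG"]
    genes = [e for e in entities if e.label == "GENE"]
    return [rewrite(sentence, d, g, mode) for d, g in product(drugs, genes)]


SENTENCE = ("PD98059, a specific inhibitor of MEK, had little effect on "
            "the TNF-alpha-induced phosphorylation of Akt.")


def find(surface: str, label: str) -> Entity:
    start = SENTENCE.index(surface)
    return Entity(start, start + len(surface), label)


ENTITIES = [find("PD98059", "DRUG"), find("MEK", "GENE"),
            find("TNF-alpha", "GENE"), find("Akt", "GENE")]

for sample in make_samples(SENTENCE, ENTITIES, mode="marker"):
    print(sample)
```

Each sample is then classified independently, so a sentence with one drug and three genes yields three predictions.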

Thank you for your interest in our work! Best, Wonjin

anuragpande1977 commented 6 months ago

Hi, I am looking for help with RE on biomedical text. I have used scispaCy with the bc5cdr model for NER on PubMed abstracts, recognizing CHEMICAL and DISEASE entities. The NER works quite well, but RE is outside scispaCy's scope. I created semantic-based graphs, but I need RE for the actual work. Any suggestions?