CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Question: CDE applied to drug repurposing #11

Closed tcaceresm closed 2 years ago

tcaceresm commented 2 years ago

Hi, I am very very new to bioinformatics and text mining. Please don't be rude. I am trying to extract the association of a drug and its target. For example: "The best results were observed for 1-methoxy-1-oxoalkan-2-yl salicylates which showed moderate or good activity against Botrytis cinerea and Rhizoctonia solani"

The expected output should be: Compound: 1-methoxy-1-oxoalkan-2-yl salicylates, Target: [Botrytis cinerea, Rhizoctonia solani], Bioactivity: [moderate, good]

Is this achievable with this toolkit?

So far I have been able to adapt the ChemDataExtractor toolkit to extract the chemical entities of an article and an associated value of interest, such as the IC50 or Ki, but as I said above, I have not been able to extract the target of the molecule.

Thanks in advance

ti250 commented 2 years ago

ChemDataExtractor is good for extracting compounds and numerical values, but extracting targets and bioactivity may be difficult. You may be able to get around this by writing some manual rules, but I think it will be quite difficult to label the right targets, and associate them in the correct way with the compound...

Also, the target is probably out of domain for our Named Entity Recognition system which would make it relatively unlikely that it would be picked up (and I assume manual rules wouldn't be ideal considering you may want to pick up targets you haven't thought about before)... Perhaps you could try training your own NER system for those parts, but I think it may be a big undertaking unless there already exist systems for this.
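To make the "manual rules" route concrete, here is a minimal sketch using only Python's standard `re` module, not ChemDataExtractor's own parser API. The pattern, the activity word list, and the function name `extract_activity` are all assumptions made for illustration. It only handles sentences shaped like the example in the question, which also shows why this approach tends to be brittle.

```python
import re

# Hypothetical list of qualitative activity words; extend as needed.
ACTIVITY = r"(?:weak|moderate|good|strong|high|low)"

# Hypothetical rule: "... for <compound> which showed <activity> activity against <targets>"
PATTERN = re.compile(
    r"for\s+(?P<compound>.+?)\s+which\s+showed\s+"
    r"(?P<activity>" + ACTIVITY + r"(?:\s+(?:or|and)\s+" + ACTIVITY + r")*)"
    r"\s+activity\s+against\s+(?P<targets>.+)"
)


def extract_activity(sentence):
    """Return compound, targets and bioactivity for one sentence, or None if the rule does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    # Split the target list on commas and "and", dropping a trailing full stop.
    raw_targets = match.group("targets").rstrip(". ")
    targets = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", raw_targets)
    return {
        "compound": match.group("compound"),
        "targets": [t for t in targets if t],
        "bioactivity": re.findall(ACTIVITY, match.group("activity")),
    }


sentence = (
    "The best results were observed for 1-methoxy-1-oxoalkan-2-yl salicylates "
    "which showed moderate or good activity against Botrytis cinerea and Rhizoctonia solani"
)
print(extract_activity(sentence))
```

Any sentence that phrases the relationship differently (passive voice, "inhibited", "was active towards") returns `None`, so each new phrasing needs its own rule.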

cjcourt commented 2 years ago

I agree with @ti250 on this one. You might be better off looking at pre-trained biomedical sequence classifiers, such as BioBERT (https://github.com/dmis-lab/biobert-pytorch/tree/master/relation-extraction).
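Relation-extraction classifiers of this kind are usually given one sentence per candidate entity pair, with the two entity mentions replaced by placeholder tags. As far as I know, the BioBERT relation-extraction datasets use tags such as `@GENE$` and `@DISEASE$`. The sketch below shows that preprocessing step in plain Python. The `@TARGET$` tag and the function names are hypothetical, and it assumes the compound and targets have already been located, for example by an NER step. A model would still need fine-tuning on labelled compound-target sentences.

```python
def mask_entities(sentence, compound, target,
                  compound_tag="@CHEMICAL$", target_tag="@TARGET$"):
    """Replace the first mention of each entity with a placeholder tag (tags are illustrative)."""
    if compound not in sentence or target not in sentence:
        raise ValueError("both entities must appear in the sentence")
    masked = sentence.replace(compound, compound_tag, 1)
    return masked.replace(target, target_tag, 1)


def candidate_pairs(sentence, compound, targets):
    """Build one masked sentence per (compound, target) pair for a classifier to score."""
    return [(target, mask_entities(sentence, compound, target)) for target in targets]


sentence = (
    "The best results were observed for 1-methoxy-1-oxoalkan-2-yl salicylates "
    "which showed moderate or good activity against Botrytis cinerea and Rhizoctonia solani"
)
for target, masked in candidate_pairs(
    sentence,
    "1-methoxy-1-oxoalkan-2-yl salicylates",
    ["Botrytis cinerea", "Rhizoctonia solani"],
):
    print(target, "->", masked)
```

Each masked sentence would then be classified on its own, so a sentence with one compound and two targets yields two inputs.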

tcaceresm commented 2 years ago

Thank you guys very much, I really appreciate your good answers!