facebookresearch / Clinical-Trial-Parser

Library for converting clinical trial eligibility criteria to a machine-readable format.
Apache License 2.0
163 stars, 58 forks

ie_parse removes some trials #12

Closed: bitmman closed this issue 3 years ago

bitmman commented 4 years ago

Hello,

I was using ie_parse to parse the eligibility criteria. After I input the data with 2034 trials, I got the output data with only 1394 trials. I used the same format as the example clinical_trials.csv file. So I'm wondering how these 640 trials were removed. Thanks in advance.

salkola commented 4 years ago

It is possible that for some trials no medical entities were extracted. If you list a few missing NCT IDs, I will explore in more detail.
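To list the missing NCT IDs, the input and output files can be compared by ID. The following is a minimal sketch, not part of the library. It assumes the trial ID sits in a column named "#nct_id" in both files, that the input is comma-separated, and that the output is tab-separated; all three are assumptions to adjust to the actual files. The function names are hypothetical.

```python
import csv
import io


def read_ids(handle, id_column="#nct_id", delimiter=","):
    """Return the set of non-empty trial IDs found in one column of a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"column {id_column!r} not found in header")
    ids = set()
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value:
            ids.add(value)
    return ids


def missing_ids(input_handle, output_handle, id_column="#nct_id",
                input_delimiter=",", output_delimiter="\t"):
    """Return the sorted IDs that appear in the input but not in the parser output."""
    wanted = read_ids(input_handle, id_column, input_delimiter)
    found = read_ids(output_handle, id_column, output_delimiter)
    return sorted(wanted - found)


if __name__ == "__main__":
    # In-memory stand-ins for the real files; the rows are made up for illustration.
    trials = io.StringIO("#nct_id,title\nNCT00000001,A\nNCT00000002,B\n")
    parsed = io.StringIO("#nct_id\tscore\nNCT00000001\t0.9\n")
    print(missing_ids(trials, parsed))
```

With real files, pass handles opened with open(path, newline="") instead of the StringIO objects, so that quoted multi-line fields are read correctly.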

bitmman commented 4 years ago

Hello salkola,

Thanks for your reply. Here are 10 of the missing NCT IDs: 'NCT02735707', 'NCT04254874', 'NCT04255017', 'NCT04255940', 'NCT04256395', 'NCT04259892', 'NCT04262921', 'NCT04270383', 'NCT04274322', 'NCT04275245'. I also ran the parser on just these 640 missing trials and got zero output.

salkola commented 4 years ago

For 3 trials, the IE parser extracted no relations. For example, NCT04255940 has only one eligibility criterion that is not recognized. Your trials happen to contain a small number of criteria, which the system is not able to parse.

To increase the yield, new items could be added to the custom concepts and synonyms, or the models could be re-trained with new data.

The attachment contains my results for your 10 example trials.

bitmman commented 4 years ago

Thanks for your explanation. I have a few questions about the file clinical_trials.csv. How did you generate that file? In your clinical_trials.csv file, why is the column 'eligibility_criteria' empty? Doesn't that affect the final results? I used the same format as yours, but I still get no output. I have attached the file clinical_trials.zip. These 640 trials include the 10 example trials you parsed, but my run produces no results. Thank you.

salkola commented 4 years ago

1) I modified ingest.sh to read the trials matching t1.nct_id = ANY(ARRAY['NCT02735707','NCT04254874','NCT04255017','NCT04255940','NCT04256395','NCT04259892','NCT04262921','NCT04270383','NCT04274322','NCT04275245']) into clinical_trials.csv.
2) The column eligibility_criteria in my clinical_trials.csv is not empty. It may look empty if you view the file using Excel or a similar tool. Note that the eligibility_criteria string begins with a new line.
3) Your file is genuinely missing the eligibility_criteria field.
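Whether the field is really empty can be checked without a spreadsheet tool. The following is a rough sketch using Python's standard csv module, which reads quoted fields that span several lines, so a criteria string that begins with a new line is still seen as non-empty. The column names "#nct_id" and "eligibility_criteria" and the function name are assumptions; adjust them to the header of the actual file.

```python
import csv
import io


def trials_without_criteria(handle, id_column="#nct_id",
                            criteria_column="eligibility_criteria"):
    """Return IDs of rows whose criteria field is missing or only whitespace."""
    reader = csv.DictReader(handle)
    empty = []
    for row in reader:
        # A field that starts with a newline is not empty once stripped.
        text = (row.get(criteria_column) or "").strip()
        if not text:
            empty.append((row.get(id_column) or "").strip())
    return empty


if __name__ == "__main__":
    # Made-up sample: the first row has multi-line criteria that start
    # with a newline, the second row has no criteria at all.
    sample = (
        "#nct_id,eligibility_criteria\n"
        'NCT00000001,"\nInclusion Criteria:\n- Age 18 or older"\n'
        "NCT00000002,\n"
    )
    print(trials_without_criteria(io.StringIO(sample)))
```

For a file on disk, open it with open(path, newline="") before passing the handle in. If every trial ID comes back, the field is missing throughout the file, which would be consistent with a run that produces no output.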

salkola commented 3 years ago

Resolved. Please reopen if you have additional questions.