facebookresearch / Clinical-Trial-Parser

Library for converting clinical trial eligibility criteria to a machine-readable format.
Apache License 2.0

Not getting output for other trials #2

Closed. sandeepsingh closed this issue 4 years ago

sandeepsingh commented 4 years ago

When I run cfg_parse.sh and ie_parse.sh on some other trials that I took from a ct.gov dump, I am not getting any output.

For cfg_parse.sh it says: Studies: 100000, Criteria: 0, Parsed criteria: 0, Relations: 0, Relations per criteria: NaN%

and this for ie_parse.sh:

I0511 06:45:23.350843 25130 mesh.go:85] [data/mesh/custom_mesh_concepts_p1.tsv data/mesh/custom_mesh_concepts_p2.tsv]: Nodes read: 116, New nodes: 72
Indexing ... indexed Descriptors: 26577 Concepts: 52999 Terms: 216493
I0511 06:45:26.860631 25130 main.go:240] Matching NER terms ...
I0511 06:45:26.860760 25130 main.go:298] Lines read: 1, Slots: 0, Unique slots: 0
I0511 06:45:26.860771 25130 main.go:299] 0 slots matched to 0 concepts
I0511 06:45:26.860787 25130 main.go:300] 0 slots not matched
I0511 06:45:26.860811 25130 main.go:307] 00:00:18.057

Can you help with this? What am I doing wrong on the newer trials? I have kept the data format as given in the sample input file.

salkola commented 4 years ago

Did you use aact.sh (creates a postgres db of all trials) and something like ingest.sh (reads trials from the postgres db to a file) to create your input file? cfg_parse.sh and ie_parse.sh expect an input file that looks like clinical_trials.csv. If you attach the beginning of your input file (say the first 10 trials) to this issue, I will take a look.

ingest.sh is provided as a reference for creating input files. It should be a simple task to modify the script according to your interests and needs, for example to select clinical trials by their study conditions or overall status.
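A quick way to test whether an input file "looks like clinical_trials.csv" is to inspect its header line before running the parsers. The following is only a sketch: the expected column order is taken from the maintainer's later comment in this thread, and `check_header` and `EXPECTED_COLUMNS` are hypothetical names, not part of the library.

```python
import csv
import io

# Assumed column order, taken from the maintainer's later comment in this thread.
EXPECTED_COLUMNS = ["nct_id", "title", "has_us_facility", "conditions", "eligibility_criteria"]


def check_header(first_line: str) -> list:
    """Return a list of problems found in the header line (empty if none)."""
    problems = []
    # Parse the header as CSV; a pipe-separated header collapses into one field.
    fields = next(csv.reader(io.StringIO(first_line)), [])
    # Ignore surrounding whitespace and the leading '#' on the first column name.
    names = [f.strip().lstrip("#").strip() for f in fields]
    if len(names) == 1 and "|" in names[0]:
        problems.append("header looks pipe-separated, not comma-separated")
    elif names != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns or order: {names}")
    return problems


if __name__ == "__main__":
    print(check_header("nct_id | eligibility_criteria | title | has_us_facility | conditions"))
```

An empty list means the header matches the assumed layout; anything else points at the delimiter or the column order.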

sandeepsingh commented 4 years ago

Thanks @salkola. There was a connection issue while downloading the dump with aact.sh. I had previously downloaded the ct.gov dump, so I used that to create my input file. I have attached the test data that I am giving as input.

nct_id | eligibility_criteria | title | has_us_facility | conditions
NCT1234 | Inclusion Criteria: - Patients must have a body surface area (BSA) >= 0.53 m^2 | None | FALSE | lung cancer
NCT1234 | Inclusion Criteria: - Histologically confirmed unresectable or metastatic translocation morphology renal cell carcinoma diagnosed using World Health Organization (WHO)-defined criteria. Patients may be newly diagnosed or have received prior cancer therapy | None | FALSE | lung cancer
NCT1234 | Inclusion Criteria: - Patients must have had histologic verification of the malignancy | None | FALSE | lung cancer
NCT1234 | Inclusion Criteria: - Patients must have measurable disease, documented by clinical, radiographic, or histologic criteria as defined by Response Evaluation Criteria in Solid Tumors (RECIST) version (v)1.1 | None | FALSE | lung cancer

salkola commented 4 years ago

The issue is that your input is a pipe-separated file when it should be a CSV file, and the columns should be ordered as #nct_id,title,has_us_facility,conditions,eligibility_criteria. Moving the eligibility_criteria column to the end should fix the ordering issue. We could add a feature where columns are identified by name instead of position.
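The fix described above can be sketched as a small conversion script that reads the pipe-separated rows, looks columns up by name, and writes them out as CSV in the expected order. This is an illustration, not the library's own tooling: `pipe_to_csv` and `TARGET_COLUMNS` are hypothetical names, and the leading `#` on the header simply mirrors the column list quoted in the comment.

```python
import csv
import io

# Target column order, as stated in the comment above.
TARGET_COLUMNS = ["nct_id", "title", "has_us_facility", "conditions", "eligibility_criteria"]


def pipe_to_csv(pipe_text: str) -> str:
    """Convert pipe-separated trial rows to CSV, reordering columns by name."""
    lines = [ln for ln in pipe_text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is a header naming every column.
    header = [name.strip().lstrip("#") for name in lines[0].split("|")]
    missing = [col for col in TARGET_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header written with a leading '#', mirroring the order quoted in the thread.
    writer.writerow(["#" + TARGET_COLUMNS[0]] + TARGET_COLUMNS[1:])
    for line in lines[1:]:
        values = [v.strip() for v in line.split("|")]
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}: {line!r}")
        row = dict(zip(header, values))
        # The csv writer quotes any field that contains a comma.
        writer.writerow([row[col] for col in TARGET_COLUMNS])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "nct_id | eligibility_criteria | title | has_us_facility | conditions\n"
        "NCT1234 | Inclusion Criteria: - Patients must have had histologic "
        "verification of the malignancy | None | FALSE | lung cancer\n"
    )
    print(pipe_to_csv(sample), end="")
```

Splitting on every `|` assumes the criteria text never contains a literal pipe; a row that does will fail the field-count check instead of being silently misaligned.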

sandeepsingh commented 4 years ago

Yes, I fixed it after looking at ingest.sh. Thanks, it's working now.

salkola commented 4 years ago

Great!