
facebookresearch / Clinical-Trial-Parser

Library for converting clinical trial eligibility criteria to a machine-readable format.

Apache License 2.0

166 stars, 61 forks

Evaluation dataset #15

Closed: zfx0726 closed this issue 3 years ago

zfx0726 commented 3 years ago

Thanks again for all your work on this parser; it's been extremely helpful for making use of clinicaltrials.gov data. In the Results section of your paper, you share the parser's performance on the 10-trial golden set created by Yuan et al. for Criteria2Query. Is that raw golden data available somewhere in this repo, or could you point to where it lives? I see some processed ancillary data here, but I was looking for the raw data if that's available. Thanks!

YitongTseo commented 3 years ago

Hello! I'm very happy to hear that the parser is serving your use case! The ancillary .tsv data file you linked (arXiv submission) includes the raw input data, line by line, under the column original_text. To get the full raw input for a trial, look up its NCT_ID, which is also linked in the .tsv file, on clinicaltrials.gov (e.g., NCT00097734, NCT00174525, NCT00594516). Finally, you can compare our extractions with Criteria2Query's evaluation directly, using the data hosted on their GitHub.
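As a rough illustration of the lookup described above, here is a minimal Python sketch that groups the original_text lines by trial and builds a clinicaltrials.gov address for each ID. Only the original_text column name comes from this thread; the ID column name `nct_id`, the helper names, the URL pattern, and the sample rows are assumptions made up for the example, so check them against the actual .tsv header before relying on this.

```python
import csv
import io
from collections import defaultdict


def group_criteria_by_trial(tsv_file, id_column="nct_id", text_column="original_text"):
    """Collect the raw criterion lines for each trial from a tab-separated file.

    `id_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    lines_by_trial = defaultdict(list)
    for row in reader:
        lines_by_trial[row[id_column]].append(row[text_column])
    return dict(lines_by_trial)


def trial_url(nct_id):
    """Build a study page address (assumed URL pattern, not taken from the thread)."""
    return f"https://clinicaltrials.gov/study/{nct_id}"


# Placeholder rows for demonstration only; the criterion text is invented,
# not copied from the ancillary file.
sample = io.StringIO(
    "nct_id\toriginal_text\n"
    "NCT00097734\tplaceholder criterion A\n"
    "NCT00097734\tplaceholder criterion B\n"
    "NCT00174525\tplaceholder criterion C\n"
)

grouped = group_criteria_by_trial(sample)
for nct_id, criteria in grouped.items():
    print(nct_id, len(criteria), trial_url(nct_id))
```

With the real file, the in-memory sample would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded .tsv.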

I hope that answers your question!

zfx0726 commented 3 years ago

Thanks for the quick response! I just edited my original comment to clarify: the raw data I was looking for was specifically the raw golden data set created by Yuan et al. It sounds like that's hosted on Criteria2Query's GitHub, so I'll take a look through that!