zfx0726 closed this issue 3 years ago.
Hello! I'm very happy to hear that the parser is serving your use case! In the ancillary .tsv data file you linked (arXiv submission), we include the raw input data line by line under the column original_text. To get the full raw input data, you can look up each trial's NCT_ID, which is also included in the .tsv file, on clinicaltrials.gov (e.g., NCT00097734, NCT00174525, NCT00594516). Finally, you can compare our extractions directly against Criteria2Query's evaluation using the data hosted on their GitHub.
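The lookup described above can be sketched in Python. This is a minimal, hypothetical sketch, not part of the parser: it assumes the .tsv has a header row with an `original_text` column (named in this thread) and a trial-ID column called `NCT_ID` (an assumed name), and that study pages live at `https://clinicaltrials.gov/study/<NCT ID>` (an assumed URL pattern). The sample rows are placeholders, not real trial data.

```python
import csv
import io
from collections import defaultdict

# "original_text" is the column named in the thread; "NCT_ID" is an ASSUMED
# column name for the trial identifier. Adjust both to match the actual file.
TEXT_COL = "original_text"
ID_COL = "NCT_ID"


def group_original_text(tsv_file, text_col=TEXT_COL, id_col=ID_COL):
    """Collect the original_text lines for each trial, keyed by NCT ID, in file order."""
    lines_by_trial = defaultdict(list)
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        lines_by_trial[row[id_col]].append(row[text_col])
    return dict(lines_by_trial)


def trial_url(nct_id):
    """Build a clinicaltrials.gov study page URL (the URL pattern is an assumption)."""
    return f"https://clinicaltrials.gov/study/{nct_id}"


if __name__ == "__main__":
    # Placeholder rows standing in for the real ancillary .tsv file.
    sample = io.StringIO(
        "NCT_ID\toriginal_text\n"
        "NCT00097734\tcriterion line 1\n"
        "NCT00097734\tcriterion line 2\n"
        "NCT00174525\tcriterion line 1\n"
    )
    grouped = group_original_text(sample)
    for nct_id, lines in grouped.items():
        # Print each trial ID, where to find its full record, and how many lines the .tsv holds.
        print(nct_id, trial_url(nct_id), len(lines))
```

With the real file, you would pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample, then open each printed URL to see the trial's full eligibility criteria.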
I hope that answers your question!
Thanks for the quick response! I just edited my original comment to clarify: the raw data I was looking for was specifically the raw golden data set created by Yuan et al. It sounds like that's hosted on Criteria2Query's GitHub, so I'll take a look through that!
Thanks again for all your work on this parser; it's been extremely helpful for making use of clinicaltrials.gov data. In the Results section of your paper, you report the parser's performance on the 10-trial golden set created by Yuan et al. for Criteria2Query. Is that raw golden data available somewhere in this repo, or could you point me to where it lives? I see some processed ancillary data here, but I was looking for the raw data if it's available. Thanks!