adarshp opened this issue 6 years ago
Thank you!
@hickst: when you're back, can you please take a look?
No problem. For the first two documents in the list, the malformed entries seem to come from a section called 'List of abbreviations used':
Also, I realized I didn't include the header row in the MalformedEntries.txt file earlier. Here is an updated version:
Here is the TSV output from the REACH web service (http://agathon.sista.arizona.edu:8080/odinweb/uploader) for the paper with PMID 1198222.
While looking through some REACH output exported in the CMU format, I came across entries whose fields are mixed up. I’ve attached a (non-exhaustive) list of them to this post. They make up only a small fraction of the data, but I thought it worth bringing to your attention nonetheless. The file ‘MalformedEntries.txt’ contains the entries with wrong values in the fields “Database Name”, “PosReg Type”, and “NegReg Type”. I'm not sure whether the bug is in the reader or the exporter part of the codebase, but I'll try to take a crack at fixing it.
MalformedEntries.txt
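To triage a file like this, a row-level check could flag entries whose fields look swapped. This is a minimal sketch, assuming the export is tab-separated with a header row naming the columns “Database Name”, “PosReg Type”, and “NegReg Type”. The allowed-value sets in `ALLOWED_VALUES` and the function name `find_malformed_rows` are hypothetical placeholders, not part of REACH or the CMU format; the real legal values would need to be filled in.

```python
import csv
import io
from typing import Dict, List, Set

# Hypothetical whitelists: the actual legal values for these CMU-format
# columns are not given in the issue and must be replaced with real ones.
ALLOWED_VALUES: Dict[str, Set[str]] = {
    "Database Name": {"uniprot", "pubchem", "pfam"},
    "PosReg Type": {"", "activation", "increase amount"},
    "NegReg Type": {"", "inhibition", "decrease amount"},
}


def find_malformed_rows(tsv_text: str,
                        allowed: Dict[str, Set[str]] = ALLOWED_VALUES) -> List[int]:
    """Return 1-based data-row numbers whose checked fields hold unexpected values.

    Relies on a header row, since csv.DictReader takes column names from it.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad: List[int] = []
    for row_num, row in enumerate(reader, start=1):
        for field, legal in allowed.items():
            # A missing column yields None; treat it as an empty string.
            value = (row.get(field) or "").strip().lower()
            if value not in legal:
                bad.append(row_num)
                break  # one bad field is enough to flag the row
    return bad
```

For example, a row with `activation` under “Database Name” and `uniprot` under “PosReg Type” would be flagged, while a row with those two values in the opposite columns would pass. Comparing lowercased values keeps the check tolerant of capitalization differences between the reader and the exporter.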