dhimmel closed this issue 1 week ago
Just wanted to note that Figure 1 is a nice graphical representation of these relationships
The dataset would offer more fields and access to all 1900 relations.
Thank you for your valuable input!
The requested table is now available. You can download it from the following link:
🔗 Consolidated Relations Dataset
The table includes the following columns:
Feel free to reach out if you have any further suggestions or questions.
Thanks @EsmaeilNourani. Very helpful.
I see the data was added in https://github.com/EsmaeilNourani/LSF_Disease_RE/commit/9895a5e0ffc6d2eddc349204750fc702a95790da. Would it make sense to also commit the code that generates this data?
The contents of Consolidated_Relations_Dataset.tsv look good to me. Nicely executed.
It would be ideal to archive this table in the Zenodo deposit so it lives with the rest of the data release.
Once code is available to generate this dataset and the data is on Zenodo, I will close this issue to denote that the request has been resolved. Same applies to https://github.com/EsmaeilNourani/lifestylefactors-annotation-docs/issues/2.
FYI the full text of my review is online here. Thanks for posting a preprint such that it is possible for me to immediately share my review. Cheers.
Consolidated_Relations_Dataset.tsv is on Zenodo and the contents look good. The code to generate it is at helpers/create_consolidated_relations.py. Thanks @EsmaeilNourani
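For anyone who wants to do a quick check of the downloaded table, here is a minimal sketch using only the Python standard library. It assumes the file is tab-separated with a header row; the function name `summarize_tsv` is my own and is not part of the repository.

```python
import csv


def summarize_tsv(path):
    """Return (column_names, record_count) for a tab-separated file.

    Assumes the first line is a header row, which is an assumption
    about the file layout rather than something verified here.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        record_count = sum(1 for _ in reader)
        column_names = reader.fieldnames
    return column_names, record_count


if __name__ == "__main__":
    columns, count = summarize_tsv("Consolidated_Relations_Dataset.tsv")
    print("columns:", columns)
    print("records:", count)
```

If the record count matches the 1900 relations reported in the preprint, that is a reasonable first sign that the table is complete.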
Greetings, I'm reviewing the preprint LSD600: the first corpus of biomedical abstracts annotated with lifestyle–disease relations. I am excited to look at the 1900 disease to lifestyle factor relationships as per:
From the Zenodo deposit, I downloaded LSD600.tar.gz and began looking at the .txt files with abstracts and the .ann files, which appear to be in BRAT standoff format and contain annotated named entities and relations. Many of the early .ann files contained only entities and no relationships, but I stumbled upon 34621627.ann, which contains both. Snippet below:
This preamble is for orientation and to demonstrate that it's not easy for the reader to manually inspect all 1900 relations.
My suggestion is to create a consolidated dataset with one record per manually curated relation across all 600 abstracts. This would not replace the txt/ann files but be a useful access point for users to immediately inspect the data. Suggested fields are:
JSON, TSV, or Excel would all be reasonable format choices for this. Column names are greatly appreciated!
This would help me get a much better appreciation for the types and quality of relationships annotated by LSD600.
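To make the suggestion concrete, here is a rough sketch of how one record per relation could be pulled out of a BRAT standoff .ann file. It is not the authors' code. The column names, the function names, and the sample annotation (entity types, relation type, and the placeholder PMID) are all invented for illustration and are not taken from LSD600. It assumes text-bound annotations look like `T1<TAB>Type start end<TAB>text` and binary relations look like `R1<TAB>Type Arg1:T1 Arg2:T2`.

```python
import csv
import io

# Hypothetical column names, chosen only for this sketch.
COLUMNS = [
    "pmid", "relation_id", "relation_type",
    "arg1_id", "arg1_type", "arg1_text",
    "arg2_id", "arg2_type", "arg2_text",
]


def parse_ann(ann_text):
    """Split BRAT standoff text into entities and binary relations."""
    entities = {}   # entity id -> (entity type, covered text)
    relations = []  # (relation id, relation type, Arg1 id, Arg2 id)
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        ann_id = parts[0]
        if ann_id.startswith("T") and len(parts) >= 3:
            # The second field is "Type start end" (offsets may be
            # discontinuous, e.g. "0 5;10 15"); only the type is kept.
            entity_type = parts[1].split(" ", 1)[0]
            entities[ann_id] = (entity_type, parts[2])
        elif ann_id.startswith("R") and len(parts) >= 2:
            rel_type, *args = parts[1].split()
            roles = dict(arg.split(":", 1) for arg in args)
            relations.append(
                (ann_id, rel_type, roles.get("Arg1"), roles.get("Arg2"))
            )
    return entities, relations


def consolidate(pmid, ann_text):
    """Return one dict per relation, with both arguments resolved."""
    entities, relations = parse_ann(ann_text)
    rows = []
    for rel_id, rel_type, arg1, arg2 in relations:
        arg1_type, arg1_text = entities.get(arg1, ("", ""))
        arg2_type, arg2_text = entities.get(arg2, ("", ""))
        rows.append({
            "pmid": pmid, "relation_id": rel_id, "relation_type": rel_type,
            "arg1_id": arg1, "arg1_type": arg1_type, "arg1_text": arg1_text,
            "arg2_id": arg2, "arg2_type": arg2_type, "arg2_text": arg2_text,
        })
    return rows


def to_tsv(rows):
    """Serialize consolidated rows as TSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample annotation, NOT copied from any LSD600 file.
SAMPLE_ANN = (
    "T1\tLifestyle_factor 0 7\tSmoking\n"
    "T2\tDisease 18 29\tlung cancer\n"
    "R1\tCauses Arg1:T1 Arg2:T2\n"
)

if __name__ == "__main__":
    print(to_tsv(consolidate("00000000", SAMPLE_ANN)))
```

Looping `consolidate` over every .ann file in the archive, with the PMID taken from the file name, and concatenating the rows would give the single table I have in mind.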