Open: d33bs opened this issue 5 months ago
Hi @d33bs thank you for pointing this out. The use of `SMPDB_pubmed_IDs.csv` is indeed not ideal from the standpoints of reproducibility and transparency. I can get you a copy of that file if you like (reach out to me by email and I will set it up). Your suggestion of better documenting what this file is, and how it can be obtained, is a good one; we will add that info to the RTX-KG2 documentation.
Thank you @saramsey!
This issue highlights a need to provide a description and provenance for a file used to run the full RTX-KG2 pipeline, `SMPDB_pubmed_IDs.csv`. This file appears to be required for a full workflow run of RTX-KG2. I believe it is referenced in the RTX-KG2 article under Table 6, row 4, and in the Acknowledgements section as "We thank David Wishart and Carin Li for providing a download link for the SMPDB PubMed annotations ...". The file could benefit from being added to the list of data sources, including how it was generated (any additional data sources or code) and how it may be requested or permitted for use (for example, whether any specific licensing applies). Apologies in advance if I misunderstand the nature of this data as it pertains to RTX-KG2.