PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.

ids stored as floats in beataml_transcriptomics #255

Closed ymahlich closed 4 days ago

ymahlich commented 1 week ago

improve_sample_id and entrez_id are stored as floats in beataml_transcriptomics.csv.gz (see the sample lines below). This is inconsistent with, for example, entrez_id in the genes table and improve_sample_id in the experiments table, both of which are integers.

This behavior is present in dataset v0.1.4 on Figshare (https://figshare.com/articles/dataset/CODERData0_1_4/26409316) and v0.1.40 on PyPI (https://pypi.org/project/coderdata/).

I did not check other files besides beataml_transcriptomics, beataml_experiments & genes.

Sample lines from beataml_transcriptomics.csv.gz:

improve_sample_id,transcriptomics,entrez_id,source,study
3334.0,0.4719481112214393,7105.0,synapse,BeatAML
3334.0,0.0,64102.0,synapse,BeatAML
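
A minimal sketch of how the float ids could be detected and cast back to integers with pandas, using the sample lines above as input. The helper name cast_ids_to_int and the choice of the nullable Int64 dtype are assumptions for illustration; they are not taken from the coderdata build scripts, and the cause of the float storage has not been confirmed in this thread.

```python
import io

import pandas as pd

# Sample rows copied from this issue; the real input would be
# beataml_transcriptomics.csv.gz.
SAMPLE = """improve_sample_id,transcriptomics,entrez_id,source,study
3334.0,0.4719481112214393,7105.0,synapse,BeatAML
3334.0,0.0,64102.0,synapse,BeatAML
"""

ID_COLUMNS = ["improve_sample_id", "entrez_id"]


def cast_ids_to_int(df, id_columns=ID_COLUMNS):
    """Return a copy of df with the id columns cast to nullable Int64.

    Hypothetical helper: Int64 (capital I) is used so that missing ids,
    if any exist, stay missing instead of forcing the column to float.
    """
    out = df.copy()
    for col in id_columns:
        # Refuse to cast if any present value has a fractional part.
        non_null = out[col].dropna()
        if not (non_null % 1 == 0).all():
            raise ValueError(f"{col} contains non-integer values")
        out[col] = out[col].astype("Int64")
    return out


raw = pd.read_csv(io.StringIO(SAMPLE))   # ids are inferred as float64 here
fixed = cast_ids_to_int(raw)             # ids are Int64 after the cast
csv_text = fixed.to_csv(index=False)     # ids are written without ".0"
```

Passing compression="gzip" to read_csv and to_csv would apply the same round trip to the compressed file.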

jjacobson95 commented 1 week ago

Full list of files to be updated from the latest versions. MPNST will have to be double-checked, as it is still getting updates.