PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.
BSD 3-Clause "New" or "Revised" License

MPNST datasets' "improve_sample_id" not aligning with the coderdata version 0.1.26 #163

Open moonchangin opened 2 weeks ago

moonchangin commented 2 weeks ago

Issue: Misaligned improve_sample_id in MPNST Databases

Files:

Description:

There appears to be a misalignment between the improve_sample_id fields in two MPNST database files. This discrepancy wasn't present in the initial MPNST data version I created, but it shows up in the recent data (version 0.1.26).

Details:

Expected Behavior:

Both files should have aligned improve_sample_id values for consistent data analysis and integrity.

Actual Behavior:

The improve_sample_id sets from the transcriptomic and experiment files do not match, indicating potential issues in data alignment or entry.

Steps to Reproduce:

  1. Load the data from both files.
  2. Extract and compare the unique improve_sample_id sets as shown above.
  3. Observe the discrepancy between the two sets.
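A minimal sketch of the comparison described in these steps, using only the Python standard library. The helper names (`unique_ids`, `compare_sample_ids`) and the toy CSV contents are hypothetical and only illustrate the check; they are not the actual MPNST files or part of the coderdata API. The only thing taken from this issue is the `improve_sample_id` column name.

```python
import csv
import io


def unique_ids(csv_text, column="improve_sample_id"):
    """Return the set of non-empty values found in `column` of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row[column]}


def compare_sample_ids(transcriptomics_csv, experiments_csv):
    """Report which sample IDs are shared and which appear in only one file."""
    transcriptomics_ids = unique_ids(transcriptomics_csv)
    experiments_ids = unique_ids(experiments_csv)
    return {
        "only_in_transcriptomics": sorted(transcriptomics_ids - experiments_ids),
        "only_in_experiments": sorted(experiments_ids - transcriptomics_ids),
        "shared": sorted(transcriptomics_ids & experiments_ids),
    }


# Toy stand-ins for the two files (made-up IDs, not real data).
transcriptomics_csv = "improve_sample_id,value\n1,0.5\n2,0.7\n3,0.1\n3,0.2\n"
experiments_csv = "improve_sample_id,dose\n3,10\n4,10\n5,20\n"

report = compare_sample_ids(transcriptomics_csv, experiments_csv)
for key, ids in report.items():
    print(f"{key}: {ids}")
```

If the two files were aligned, both `only_in_...` lists would be empty. With the real data, the CSV strings would be replaced by the contents of the transcriptomics and experiments files.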

Additional Context:

This alignment issue may affect data analysis and integrity, requiring a thorough investigation and correction of the sample IDs in the relevant databases.

sgosline commented 2 weeks ago

Jeremy, can you please confirm that no other sample files have issues, then rebuild the MPNST data per the build_locally script? You shouldn't have to rebuild everything.

jjacobson95 commented 2 weeks ago

It looks like there may be some other issues with the hcmi and broad_sanger sample files. I'll create separate issues with more information.