MPNST datasets' "improve_sample_id" not aligning with the coderdata version 0.1.26

moonchangin commented 2 weeks ago

Issue: Misaligned `improve_sample_id` in MPNST Databases

Files:

mpnst_experiments.tsv
mpnst_transcriptomics.csv.gz

Description:

There appears to be a misalignment between the improve_sample_id fields in two MPNST database files. This discrepancy wasn't present in the initial MPNST data version I created, but has been noticeable recently (data is from version 0.1.26).

Details:

Gene Expression File unique Sample IDs (mpnst_gene_exp['improve_sample_id']):

{5124, 5125, 5126, 5127, 5128, 5129, 5132, 5133, 5134, 5135, 5136, 5137, 5138, 5139, 5140, 5141, 5142, 5143, 5154, 5155, 5156, 5157, 5158, 5159, 5162, 5163, 5164, 5165, 5166, 5167, 5168, 5169, 5170, 5171, 5172}

Experiments File unique Sample IDs (mpnst_experiments['improve_sample_id']):
```
{5152, 5153, 5144, 5145, 5146, 5147, 5148, 5149, 5150, 5151}
```

Expected Behavior:

Both files should have aligned improve_sample_id values for consistent data analysis and integrity.

Actual Behavior:

The improve_sample_id sets from the transcriptomics and experiments files do not match. The two sets listed above share no IDs at all, which points to a problem in how sample IDs were assigned or joined when the data was built.

Steps to Reproduce:

1. Load the data from both files.
2. Extract and compare the unique improve_sample_id sets as shown above.
3. Observe the discrepancy between the two sets.
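
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for this report: `load_ids` and `compare_sample_ids` are hypothetical helper names, and the file paths in the usage comment assume local copies of the two files named in this issue. The ID values are copied from the sets listed under Details.

```python
def load_ids(path, sep=","):
    """Hypothetical helper: read only the improve_sample_id column of a file.

    pandas is imported lazily so the comparison below works without it.
    """
    import pandas as pd

    column = pd.read_csv(path, sep=sep, usecols=["improve_sample_id"])
    return set(column["improve_sample_id"].dropna().unique())


def compare_sample_ids(left_ids, right_ids):
    """Return the shared and file-specific IDs of two ID collections."""
    left, right = set(left_ids), set(right_ids)
    return {
        "shared": left & right,
        "only_left": left - right,
        "only_right": right - left,
    }


# IDs as reported in this issue (transcriptomics vs. experiments).
gene_exp_ids = (
    set(range(5124, 5130))
    | set(range(5132, 5144))
    | set(range(5154, 5160))
    | set(range(5162, 5173))
)
experiments_ids = set(range(5144, 5154))

report = compare_sample_ids(gene_exp_ids, experiments_ids)
print("shared IDs:", sorted(report["shared"]))
print("only in transcriptomics:", len(report["only_left"]))
print("only in experiments:", len(report["only_right"]))

# With local copies of the files (paths are an assumption):
# gene_exp_ids = load_ids("mpnst_transcriptomics.csv.gz")
# experiments_ids = load_ids("mpnst_experiments.tsv", sep="\t")
```

Using set differences rather than a plain equality check shows which IDs are missing from each side, which should help when tracing where the mapping went wrong.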

Additional Context:

This alignment issue may affect data analysis and integrity, requiring a thorough investigation and correction of the sample IDs in the relevant databases.

sgosline commented 2 weeks ago

Jeremy, can you please confirm that no other sample files have issues, then rebuild the MPNST data per the build_locally script? You shouldn't have to rebuild everything.

jjacobson95 commented 2 weeks ago

It looks like there may be some other issues with the hcmi and broad_sanger sample files. I'll create separate issues with more information.

PNNL-CompBio / coderdata