Closed jjacobson95 closed 2 months ago
Jeremy, are you able to fix this and push a new version?
I can't repro it on my end, so please go ahead and fix it so you can unblock yourself.
Haven't worked on the MPNST dataset before so it might take a bit to track down, but yes, will do.
OK. The other option is to wait until I can get to it, in which case you can hold off on charging until I fix it.
pip install coderdata
coderdata download --prefix mpnst
cat mpnst_samples.csv | cut -f 8 -d , | sort | uniq
gunzip mpnst_transcriptomics.csv.gz; cat mpnst_transcriptomics.csv | cut -f 3 -d , | sort | uniq
The samples overlap.
I can work on this or continue with the cross analysis for the other comparisons. I think (and hope) this was the only data-related issue, but there are other bugs I'm working through in the transfer learning code.
Just let me know what I should prioritize.
No overlap here from what I can see:
cat mpnst_transcriptomics.csv | cut -f 3 -d , |sort | uniq
cat mpnst_experiments.tsv | cut -f 2 | sort | uniq
Yes, the transcriptomics was performed on the tumor samples, the drug data on the PDX-MT samples, so you have to match patient data by common name.
Ah I see, so this is the correct behavior then. There is no information or code on how this was handled in the transfer learning pipeline so I'll work on building this mapping into the code.
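A minimal sketch of what that mapping could look like, assuming the samples file has `common_name` and `improve_sample_id` columns. The column names are assumptions, the rows below are made up for illustration, and `pair_by_common_name` is a hypothetical helper, not something in the existing pipeline. It pairs each sample that has transcriptomics with every sample from the same patient that has drug response data:

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for mpnst_samples.csv (column names are assumptions).
samples_csv = io.StringIO(
    "common_name,improve_sample_id\n"
    "patient_A,100\n"
    "patient_A,200\n"
    "patient_B,101\n"
    "patient_B,201\n"
    "patient_C,102\n"
)

# Hypothetical ID sets, as they would be read from the transcriptomics
# and experiments files respectively.
rna_ids = {"100", "101", "102"}
exp_ids = {"200", "201"}


def pair_by_common_name(sample_rows, rna_ids, exp_ids):
    """Pair transcriptomics sample IDs with experiment sample IDs per patient."""
    by_name = defaultdict(lambda: {"rna": set(), "exp": set()})
    for row in sample_rows:
        sample_id = row["improve_sample_id"]
        if sample_id in rna_ids:
            by_name[row["common_name"]]["rna"].add(sample_id)
        if sample_id in exp_ids:
            by_name[row["common_name"]]["exp"].add(sample_id)
    # Keep only patients that have both data types.
    return sorted(
        (name, rna, exp)
        for name, ids in by_name.items()
        for rna in ids["rna"]
        for exp in ids["exp"]
    )


pairs = pair_by_common_name(csv.DictReader(samples_csv), rna_ids, exp_ids)
for name, rna, exp in pairs:
    print(name, rna, exp)
```

Patients with transcriptomics but no drug data (like `patient_C` here) drop out of the result, so it is worth logging how many are lost before the train/test split.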
I am currently blocked on part of the transfer learning analysis with the MPNST Data. The train/test split is failing as the input dataframe is empty due to the following issue.
There are no overlapping improve_sample_ids between mpnst_transcriptomics and mpnst_experiments, i.e., no experiments/drugs map to transcriptomics.
This can be reproduced by pulling the latest data (0.1.40) and checking the intersection between these data types.
This issue does not appear in the other datasets.
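For reference, the intersection check can be sketched in Python roughly as follows. This is a sketch only: the `improve_sample_id` column name is an assumption, `sample_ids` is a hypothetical helper, and the in-memory rows are made-up stand-ins for the real files.

```python
import csv
import io


def sample_ids(handle, column="improve_sample_id", delimiter=","):
    """Collect the distinct non-empty values of one column from a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[column] for row in reader if row.get(column)}


# Made-up stand-ins for mpnst_transcriptomics.csv and mpnst_experiments.tsv.
transcriptomics = io.StringIO(
    "entrez_id,transcriptomics,improve_sample_id\n"
    "1,0.5,100\n"
    "2,1.2,101\n"
)
experiments = io.StringIO(
    "source\timprove_sample_id\timprove_drug_id\n"
    "demo\t200\tdrug_1\n"
    "demo\t201\tdrug_2\n"
)

rna_ids = sample_ids(transcriptomics)
exp_ids = sample_ids(experiments, delimiter="\t")
shared = rna_ids & exp_ids
print(f"{len(shared)} shared improve_sample_id values")
```

To run it against the downloaded data, pass real file handles instead, e.g. `open("mpnst_transcriptomics.csv", newline="")` and `open("mpnst_experiments.tsv", newline="")`.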