Closed by nkoussa 1 month ago.
This shouldn't pass validation; we'll take a look!
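A validation step that rejects unexpected labels could look roughly like the sketch below. It is only an illustration: it assumes the allowed vocabulary is the five lowercase labels used elsewhere in this thread (`amp`, `deep del`, `het loss`, `gain`, `diploid`), and the names `ALLOWED_COPY_CALLS`, `find_invalid_copy_calls` and `validate_copy_calls` are hypothetical, not part of coderdata's actual validation code.

```python
import pandas as pd

# Hypothetical controlled vocabulary: the lowercase labels used in this thread.
# This is an assumption for illustration, not coderdata's published schema.
ALLOWED_COPY_CALLS = {"deep del", "het loss", "diploid", "gain", "amp"}


def find_invalid_copy_calls(df: pd.DataFrame, column: str = "copy_call") -> list:
    """Return the sorted distinct values in `column` that are not in the allowed set."""
    values = set(df[column].dropna().unique())
    return sorted(values - ALLOWED_COPY_CALLS)


def validate_copy_calls(df: pd.DataFrame, column: str = "copy_call") -> None:
    """Raise ValueError if any copy call falls outside the allowed vocabulary."""
    bad = find_invalid_copy_calls(df, column)
    if bad:
        raise ValueError(f"Unexpected values in {column!r}: {bad}")
```

With the mixed labels reported below, `validate_copy_calls` would raise and list the capitalized Sanger-style values, so the inconsistency would be caught before release.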
Oohh, I remember this. The capitalized terms come from the Sanger Institute, and I wanted to retain them rather than lose them, but you're right that they differ from the other datasets' labels. I will map them as follows:

- Amplification -> amp
- Deletion -> deep del
- Loss -> het loss
- Gain -> gain
- Neutral -> diploid
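Until the mapping lands upstream, the same relabeling can be applied on the user side with pandas. This is a minimal sketch assuming the labels live in a `copy_call` column, as in the snippet in the original report; `SANGER_TO_STANDARD` and `normalize_copy_calls` are hypothetical names, and the dictionary simply encodes the mapping listed above.

```python
import pandas as pd

# The Sanger-style -> lowercase mapping from the comment above.
SANGER_TO_STANDARD = {
    "Amplification": "amp",
    "Deletion": "deep del",
    "Loss": "het loss",
    "Gain": "gain",
    "Neutral": "diploid",
}


def normalize_copy_calls(df: pd.DataFrame, column: str = "copy_call") -> pd.DataFrame:
    """Return a copy of df with Sanger-style labels mapped; other values are left as-is."""
    out = df.copy()
    # Series.replace with a dict substitutes exact matches only, so labels that are
    # already lowercase (e.g. "gain") pass through unchanged.
    out[column] = out[column].replace(SANGER_TO_STANDARD)
    return out
```

For example, `df_long = normalize_copy_calls(dataset.copy_number)` should leave `df_long.copy_call.unique()` with only the five lowercase labels. Working on a copy keeps the original frame untouched in case the raw Sanger labels are still needed.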
It seems like there are inconsistencies in the naming of the copy calls between datasets.
If I run:

```python
import coderdata

dataset = coderdata.join_datasets("broad_sanger", "beataml", "hcmi")
df_long = dataset.copy_number
df_long.copy_call.unique()
```
I get:

```
array(['amp', 'deep del', 'gain', 'diploid', 'het loss', 'Neutral', 'Loss',
       'Gain', 'Deletion', 'Amplification'], dtype=object)
```
Not a big deal to handle, but just wanted to flag that for you guys!