getzlab / deTiN

DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
BSD 3-Clause "New" or "Revised" License

Validation data on SRA #29

Closed probalica19 closed 3 years ago

probalica19 commented 3 years ago

Hi,

I would like to do some benchmarking with the validation data you have generated and shared through SRA PRJNA422575.

Could you please provide more information on what exactly the Library Names mean?

Thank you very much in advance and thank you for this amazing data!

probalica

amarotaylor commented 3 years ago

Hi probalica,

HCC refers to the cell line. TiN refers to tumor in normal. The reason for the different names is that some of these (HCC-prefixed) were generated as part of a benchmarking study done prior to the development of DeTiN, and some of them (TiN-prefixed) were created specifically for DeTiN benchmarking. I believe 70_30 corresponds to 70% tumor and 30% normal, though I am no longer at the Broad and can't access this data / my old compute env to check. TiN libraries are more clearly named: TiN_12_5 corresponds to a sample that was mixed to be 12.5% tumor and 87.5% normal.
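For anyone scripting against the SRA metadata, the naming convention described above could be decoded roughly as follows. This is only a sketch under stated assumptions: the function name `mixture_from_library_name` is made up for illustration, the full library name `HCC1143_70_30` is a hypothetical example (only the `70_30` suffix is mentioned in this thread), and the tumor_normal reading of that suffix is the tentative interpretation given above, not a confirmed one.

```python
import re


def mixture_from_library_name(name):
    """Guess (tumor_pct, normal_pct) from a library name; return None if unrecognized.

    Hypothetical helper: it encodes the naming convention as described in
    this thread, not an official specification of the SRA library names.
    """
    # TiN-prefixed names: the digits give the tumor percentage, with the
    # second underscore acting as a decimal point (TiN_12_5 -> 12.5% tumor).
    m = re.fullmatch(r"TiN_(\d+)(?:_(\d+))?", name)
    if m:
        whole, frac = m.group(1), m.group(2) or "0"
        tumor = float(f"{whole}.{frac}")
        return tumor, 100.0 - tumor

    # HCC-prefixed names: ASSUMPTION that a trailing "A_B" pair means
    # A% tumor and B% normal, accepted only when the pair sums to 100.
    m = re.search(r"_(\d+)_(\d+)$", name)
    if name.startswith("HCC") and m:
        tumor, normal = float(m.group(1)), float(m.group(2))
        if tumor + normal == 100.0:
            return tumor, normal

    return None
```

Names that match neither pattern return `None` rather than a guess, so unexpected entries in the Library Name column are easy to spot and check by hand.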

For all samples, purity is expected to be 100%, in that the specific mixture directly corresponds to the DNA. However, as was noted in the manuscript, this cell line is multi-clonal, such that the "purity" for a specific subclone (as is the case in natural samples) won't be exactly matched, and some clones are shared between tumor and normal lineages (making some events difficult to completely characterize). Hope that helps.

Best, Amaro

probalica19 commented 3 years ago

Dear Amaro,

Thank you very much for your answer, this was very helpful.

I still have one more question. In the paper it is mentioned that there were two different simulated datasets, in silico and in vitro. As I understood, in silico corresponds to downsampling and mixing tumor and normal BAM files from HCC1143, while in vitro corresponds to mixing cell lines in laboratory conditions and sequencing such mixtures.
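To make the in silico case concrete, here is a minimal sketch of the arithmetic behind mixing two BAM files to a chosen tumor-in-normal fraction at a chosen depth. It illustrates the general downsample-and-merge idea only and is not taken from the paper; the function name, parameter names, and the depth values in the usage note are all hypothetical.

```python
def subsample_fractions(tin, target_depth, tumor_depth, normal_depth):
    """Fractions of reads to keep from each BAM for a simulated contaminated normal.

    tin          -- desired tumor-in-normal fraction, between 0 and 1
    target_depth -- mean coverage wanted in the mixed sample
    tumor_depth  -- mean coverage of the pure tumor BAM
    normal_depth -- mean coverage of the pure normal BAM
    """
    if not 0.0 <= tin <= 1.0:
        raise ValueError("tin must be between 0 and 1")

    # The tumor BAM must contribute tin * target_depth of coverage and the
    # normal BAM the remainder; dividing by each BAM's own depth gives the
    # fraction of its reads to keep.
    tumor_keep = tin * target_depth / tumor_depth
    normal_keep = (1.0 - tin) * target_depth / normal_depth

    if tumor_keep > 1.0 or normal_keep > 1.0:
        raise ValueError("input BAMs are too shallow for the requested mixture")
    return tumor_keep, normal_keep
```

For example, with made-up inputs of 100x tumor and 100x normal, a 12.5% TiN mixture at 80x would keep about 10% of the tumor reads and 70% of the normal reads. The two fractions would then be passed to a read subsampling tool before merging the results.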

Could you please explain which of these two experiments the data in PRJNA422575 corresponds to? Is it the same for the samples named 'TiN' and 'HCC' mentioned in the previous comments?

Thank you very much in advance!

Kind regards, probalica