UBC-Stat-ML / nowellpack

Blang library for cancer genomics

Compatibility with 10X CNV output #22

Open kieranrcampbell opened 6 years ago

kieranrcampbell commented 6 years ago

Once the 10X single-cell CNV system is widely adopted, we should ideally support its output formats directly, to make things as easy as possible for users. We will also need some heuristic for aligning breakpoints across cells if the 10X software doesn't produce aligned breakpoints itself.

Andrew McPherson has a joint HMM that segments all the cells simultaneously, which seemed to partially solve the alignment issue.
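To make the "heuristic alignment of breakpoints" idea concrete, here is a minimal sketch in Python. It is not the joint HMM mentioned above and not anything 10X or nowellpack provides: it greedily clusters breakpoints from all cells that fall within a tolerance of each other, then snaps each cell's breakpoints to the cluster medians. The function names, the `tol` parameter, and the input layout (one list of positions per cell, single chromosome) are all assumptions made for illustration.

```python
from typing import Dict, List


def consensus_breakpoints(per_cell: Dict[str, List[int]], tol: int) -> List[int]:
    """Greedily group breakpoints (in bp, one chromosome) that lie within
    `tol` bp of the first position in the current group, and return the
    median position of each group."""
    positions = sorted(p for bps in per_cell.values() for p in bps)
    clusters: List[List[int]] = []
    for p in positions:
        if clusters and p - clusters[-1][0] <= tol:
            clusters[-1].append(p)  # close enough: same consensus breakpoint
        else:
            clusters.append([p])    # too far: start a new group
    # Positions were appended in sorted order, so the middle element is a median.
    return [c[len(c) // 2] for c in clusters]


def align_breakpoints(per_cell: Dict[str, List[int]], tol: int) -> Dict[str, List[int]]:
    """Snap every cell's breakpoints to the nearest consensus position."""
    consensus = consensus_breakpoints(per_cell, tol)
    aligned: Dict[str, List[int]] = {}
    for cell, bps in per_cell.items():
        snapped = {min(consensus, key=lambda c: abs(c - p)) for p in bps}
        aligned[cell] = sorted(snapped)
    return aligned


# Hypothetical per-cell breakpoints, for illustration only.
cells = {
    "cellA": [1_000_000, 5_020_000],
    "cellB": [1_040_000, 5_000_000],
    "cellC": [980_000],
}
aligned = align_breakpoints(cells, tol=100_000)
```

A greedy pass like this is sensitive to the choice of `tol` and ignores read-count evidence, which is presumably why a joint segmentation across cells is the more principled route.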

alexandrebouchard commented 6 years ago

Hey Kieran, if you send me the data you found during the meeting today, I can have a look at adding an option for their data formats.

kieranrcampbell commented 6 years ago

Sure. I need to download them first, and they are not small (2 GB?), so I might pull them straight onto Azure. Did we get anywhere with Daniel on this? If it's going to take very long, I think we can just use my Azure VM, which would work well since it also has the ov2295 and sa501 data, both of which we have to run through.

On Fri, Aug 17, 2018 at 20:22 Alexandre Bouchard notifications@github.com wrote:

Hey Kieran, if you send me data you found during the meeting today I can have a look at adding an option for their data formats.


alexandrebouchard commented 6 years ago

As Kieran mentioned today, 10X has bins that are 4x coarser, so more events are likely to be merged together.
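A toy illustration of why coarser bins hide events, as a sketch only: it merges every 4 consecutive bins of a copy-number profile by majority vote and counts the change points before and after. The majority-vote rule, the function names, and the example profile are assumptions for illustration; this is not how the 10X software derives its bins.

```python
from collections import Counter
from typing import List


def coarsen(states: List[int], factor: int = 4) -> List[int]:
    """Merge each run of `factor` consecutive bins into one coarse bin,
    keeping the most common copy number in that run."""
    coarse = []
    for start in range(0, len(states), factor):
        block = states[start:start + factor]
        coarse.append(Counter(block).most_common(1)[0][0])
    return coarse


def count_changes(states: List[int]) -> int:
    """Number of positions where the copy number differs from the previous bin."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical fine-resolution profile: a one-bin gain, then a loss.
fine = [2, 2, 2, 2,  2, 3, 2, 2,  2, 2, 2, 2,  1, 1, 1, 1]
coarse = coarsen(fine, factor=4)

fine_changes = count_changes(fine)
coarse_changes = count_changes(coarse)
```

In this example the one-bin gain disappears at the coarser resolution, so only the change into the loss remains visible.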

kieranrcampbell commented 6 years ago

Reference for this:

For an approximately diploid human sample we recommend a sequencing depth of 750,000 read-pairs per cell. At this depth, the metric median effective reads per 1Mbp is between 350-400, and we expect to be able to detect single cell copy number events in the size range 1-2 megabases (and upwards) with high sensitivity and positive predictive value. In groups of 10 or more cells we expect to be able to detect copy number events in the 100-200 kilobase (and upwards) with high sensitivity and positive predictive value.
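For a rough sense of what the quoted depth means per bin, the arithmetic can be sketched as below. The only figures taken from the quote are the 350-400 median effective reads per 1 Mbp; the bin sizes are hypothetical values chosen for illustration, and the calculation assumes reads are spread uniformly along the genome.

```python
def expected_reads_per_bin(reads_per_mbp: float, bin_size_bp: int) -> float:
    """Expected effective reads in one bin, assuming uniform coverage."""
    return reads_per_mbp * bin_size_bp / 1_000_000


# Hypothetical bin sizes (bp); the read densities are the range from the quote.
for bin_size in (20_000, 100_000, 1_000_000):
    low = expected_reads_per_bin(350, bin_size)
    high = expected_reads_per_bin(400, bin_size)
    print(f"{bin_size:>9} bp bin: {low:g} to {high:g} effective reads")
```

Under these assumptions a single cell has only a handful of reads in a bin of a few tens of kilobases, which is consistent with the quote's point that sub-megabase events need groups of cells to be detected reliably.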

alexandrebouchard commented 6 years ago

Thanks!