berman-lab / ymap

YMAP - Yeast Mapping Analysis Pipeline : An online pipeline for the analysis of yeast genomic datasets.
MIT License

Auto-detection of baseline ploidy #52

Open vladimirg opened 7 years ago

vladimirg commented 7 years ago

Currently, to assign allelic ratios correctly (and to assign colors in the SNP diagrams created with a hapmap), the user needs to provide the correct baseline ploidy. This tells Ymap which ratios the SNPs can fall into. For example, in a diploid segment, the only possible SNP ratios are 2:0, 1:1, and 0:2. However, if the segment is triploid, the actual ratios are 3:0, 2:1, 1:2, and 0:3. If such a segment is analyzed as diploid, only the homozygous SNPs can be correctly assigned a color; the rest will be either strictly incorrect or appear smeared.
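To make the arithmetic concrete, here is a minimal sketch that enumerates the ratios a SNP can take for a given ploidy. The function names are hypothetical and are not taken from the Ymap code base.

```python
from fractions import Fraction


def expected_allele_fractions(ploidy):
    """Fractions of the second allele possible in a segment with `ploidy` copies."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [Fraction(k, ploidy) for k in range(ploidy + 1)]


def ratio_labels(ploidy):
    """The same ratios written as 'A:B' copy counts."""
    return [f"{ploidy - k}:{k}" for k in range(ploidy + 1)]


print(ratio_labels(2))  # ['2:0', '1:1', '0:2']
print(ratio_labels(3))  # ['3:0', '2:1', '1:2', '0:3']
```

The heterozygous fractions for ploidy 3 (1/3 and 2/3) do not coincide with the diploid 1/2, which is why a wrong baseline only gets the homozygous SNPs right.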

Often the user can only guess the baseline ploidy. Sometimes they assume it incorrectly and don't check the allele ratio diagrams to see whether the dataset needs to be re-run. So it may be a good idea to introduce automatic detection of the baseline ploidy.

The working suggestion is this: take all segments whose copy number (CNV) matches the baseline ploidy. Find the peaks of the allele ratios within them. Match the ratios those peaks represent to a ploidy, and use that as the baseline ploidy in the rest of the analysis. So, for example, if we see a peak at 0 and a peak at 0.75, we can assume that the baseline ploidy is 4, since 0.75 corresponds to a 3:1 ratio (this was actually observed in a strain assumed to be haploid).
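A rough sketch of the two steps (find peaks, then match them to a ploidy) might look like the following. Everything here is an assumption for illustration: the function names, the histogram-based peak finder, the bin count, the tolerance, and the rule of picking the smallest ploidy that explains every peak. It is not how Ymap currently works.

```python
def find_peaks(ratios, bins=20, min_share=0.1):
    """Histogram allele ratios in [0, 1]; return the mean ratio of each
    local-maximum bin that holds at least `min_share` of all SNPs."""
    counts = [0] * bins
    sums = [0.0] * bins
    for r in ratios:
        i = min(int(r * bins), bins - 1)  # a ratio of exactly 1.0 goes in the last bin
        counts[i] += 1
        sums[i] += r
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < bins - 1 else 0
        if c >= min_share * len(ratios) and c >= left and c > right:
            peaks.append(sums[i] / c)
    return peaks


def infer_ploidy(peaks, max_ploidy=4, tol=0.05):
    """Smallest ploidy whose expected fractions k/ploidy explain every peak,
    or None if no ploidy up to `max_ploidy` fits."""
    for ploidy in range(1, max_ploidy + 1):
        expected = [k / ploidy for k in range(ploidy + 1)]
        if all(min(abs(p - e) for e in expected) <= tol for p in peaks):
            return ploidy
    return None


print(infer_ploidy([0.0, 0.75]))  # 4
```

One caveat of this sketch: if the only peaks are at 0 and 1 (all SNPs homozygous), every ploidy fits equally well, and the function simply returns 1.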

@darrenabbey , what do you think?

darrenabbey commented 7 years ago

Some terminology may need to be clarified. "Baseline Ploidy" as originally written is used in figure generation only, while "Experimental Ploidy" (I forget the exact use; I'll have to check later) is used to scale normalized CNV data and thus determine the theoretical SNP ratio peaks.

vladimirg commented 7 years ago

Ah, you're right! But it doesn't change the proposal, and we'd love to hear your thoughts about it.

darrenabbey commented 7 years ago

Yeah. I didn't want to go ahead until I was sure we were speaking the same language. I'll get you a fuller response once I get home.

darrenabbey commented 7 years ago

I don't like the idea of automatically overriding the user-supplied experimental ploidy. However, such an analysis could be very helpful: the results could be highlighted in the user interface as possibly requiring re-analysis, because the evidence points to a different overall ploidy than the one indicated.
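As a sketch of that "flag, don't override" behavior: the analysis keeps the user's value and only attaches a warning when the detected ploidy disagrees. The function name and the returned dictionary layout are hypothetical.

```python
def check_ploidy(user_ploidy, detected_ploidy):
    """Keep the user's experimental ploidy; warn if detection disagrees.

    `detected_ploidy` may be None when detection was inconclusive.
    """
    warning = None
    if detected_ploidy is not None and detected_ploidy != user_ploidy:
        warning = (
            f"Allele ratio peaks suggest ploidy {detected_ploidy}, "
            f"but ploidy {user_ploidy} was entered; consider re-running."
        )
    # The user's value is always the one used for the analysis.
    return {"ploidy_used": user_ploidy, "warning": warning}


print(check_ploidy(1, 4)["warning"])
```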

vladimirg commented 7 years ago

We were thinking about having a combo box with two options: "auto-detect" and "manual". In "manual" mode, the user enters the ploidies as before, and the code runs as before. In "auto-detect" mode, the experimental ploidy is auto-detected and also used as the baseline for the figure.
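In code, the two modes could be resolved roughly like this. The function and parameter names are made up for illustration, and the sketch assumes the detected ploidy has already been computed elsewhere.

```python
def resolve_ploidies(mode, user_experimental=None, user_baseline=None, detected=None):
    """Return (experimental_ploidy, baseline_ploidy) for the chosen mode."""
    if mode == "manual":
        # Unchanged behavior: use exactly what the user entered.
        return user_experimental, user_baseline
    if mode == "auto-detect":
        if detected is None:
            raise ValueError("auto-detect was selected but no ploidy was detected")
        # The detected value serves as both the experimental and the baseline ploidy.
        return detected, detected
    raise ValueError(f"unknown mode: {mode!r}")


print(resolve_ploidies("auto-detect", detected=4))  # (4, 4)
```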

If the auto-detection works well, it seems to me that it should be the default. So the question is, can our proposed strategy be expected to work?