FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

SNPsplit for a hybrid cross of two different strains #53

Closed: Hemantcnaik closed this issue 2 years ago

Hemantcnaik commented 2 years ago

Hello,

Thanks once again for the tool. I have been using SNPsplit for a while, but I have a question. I have hybrid cross data, CAST vs 129S1. Can SNPsplit be used for this cross?

FelixKrueger commented 2 years ago

Yes, definitely. The genome preparation should be something like this:

SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ

Wait a while, then index the N-masked genome with your favourite aligner, and let's go!
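To make "N-masked genome" a bit more concrete: the idea is that every position where the strains differ is replaced by N in the reference, so neither allele matches the reference base better than the other during alignment. The Python sketch below is only a toy illustration, not SNPsplit's code; the function name `n_mask`, the use of 1-based positions and the toy sequence are all hypothetical.

```python
from typing import Iterable


def n_mask(sequence: str, snp_positions: Iterable[int]) -> str:
    """Return a copy of `sequence` with each SNP position replaced by 'N'.

    Positions are 1-based (as in VCF files). Toy illustration only,
    not how SNPsplit_genome_preparation is implemented.
    """
    bases = list(sequence)
    for pos in snp_positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"SNP position {pos} is outside a sequence of length {len(bases)}")
        bases[pos - 1] = "N"  # mask the base so neither allele is favoured
    return "".join(bases)


if __name__ == "__main__":
    toy_chromosome = "ACGTACGTAC"  # hypothetical reference sequence
    toy_snps = [2, 7]  # hypothetical 1-based SNP positions
    print(n_mask(toy_chromosome, toy_snps))  # ANGTACNTAC
```

The masked FASTA files are what you then index with your aligner of choice.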

Hemantcnaik commented 2 years ago

Thank you for the quick reply. Do we need to modify anything in the SNP file we get? Since it's a hybrid cross, do we have to consider heterozygous SNPs?

FelixKrueger commented 2 years ago

If you take the SNP file provided by the Mouse Genomes Project, you can prepare a dual hybrid genome in a two-step process:

This way you do not need to consider heterozygous SNPs.
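Why heterozygous SNPs need not be considered: the Mouse Genomes Project strains are inbred, so each strain is essentially homozygous, and an F1 between two such strains is heterozygous exactly at the positions where the two parents carry different alleles. The Python sketch below illustrates that idea; it is not SNPsplit's implementation, and the function name and input dictionaries (one allele per strain per position, assumed to be already parsed from the VCF) are hypothetical.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def informative_sites(strain1: Dict[Site, str], strain2: Dict[Site, str]) -> Dict[Site, Tuple[str, str]]:
    """Return {site: (strain1 allele, strain2 allele)} where the alleles differ.

    Each input maps a site to the homozygous allele of one inbred strain;
    both are assumed to cover the same sites (use the reference base where
    a strain has no SNP). In an F1 of the two strains, exactly these sites
    are heterozygous. Sites where both strains share an allele, even one that
    differs from GRCm38, cannot tell the parental genomes apart and are dropped.
    """
    sites = {}
    for site in sorted(strain1.keys() & strain2.keys()):
        allele1, allele2 = strain1[site], strain2[site]
        if allele1 != allele2:
            sites[site] = (allele1, allele2)
    return sites


if __name__ == "__main__":
    # Hypothetical alleles at three positions on chromosome 1
    s129 = {("1", 100): "A", ("1", 200): "G", ("1", 300): "T"}
    cast = {("1", 100): "C", ("1", 200): "G", ("1", 300): "A"}
    print(informative_sites(s129, cast))  # {('1', 100): ('A', 'C'), ('1', 300): ('T', 'A')}
```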

Hemantcnaik commented 2 years ago

Thanks, but my data is not a dual hybrid: the F1 is CAST vs 129S1. Do I still have to do the same? I also have a question: how does SNPsplit handle reads that contain multiple SNPs?

FelixKrueger commented 2 years ago

In the terminology we use, a hybrid strain is anything crossed with the strain of the standard reference genome sequence, C57BL/6 (also called Black6). A dual hybrid is any hybrid strain that has two genetic backgrounds different from Black6.

As you've got 129S1 as one background, and CAST as the other background, this is what SNPsplit would call a dual hybrid.

If a read contains more than one SNP, SNPsplit evaluates whether all SNPs in the read belong to one or the other strain and sorts it accordingly. In the rare case that a read contains SNPs from both strains, the read is considered conflicting and is suppressed by default (you can report such reads with a flag if needed, but this isn't recommended).
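The sorting rule described above can be sketched as follows. This is a simplified Python illustration, not SNPsplit's source: the `Assignment` names and the per-SNP labels "g1"/"g2" are hypothetical, and ignoring read bases that match neither allele (for example sequencing errors) is an assumption of the sketch.

```python
from enum import Enum
from typing import Iterable


class Assignment(Enum):
    UNASSIGNED = "unassignable"
    GENOME1 = "genome 1"
    GENOME2 = "genome 2"
    CONFLICTING = "conflicting"


def classify_read(snp_calls: Iterable[str]) -> Assignment:
    """Classify a read from the alleles it carries at known SNP positions.

    Each element is "g1" if the read base matches strain 1 and "g2" if it
    matches strain 2; any other value is ignored (assumption of this sketch).
    """
    seen = {call for call in snp_calls if call in ("g1", "g2")}
    if not seen:
        return Assignment.UNASSIGNED  # no informative SNP in the read
    if seen == {"g1"}:
        return Assignment.GENOME1  # every SNP supports strain 1
    if seen == {"g2"}:
        return Assignment.GENOME2  # every SNP supports strain 2
    return Assignment.CONFLICTING  # SNPs from both strains in one read


if __name__ == "__main__":
    for calls in (["g1", "g1"], ["g2"], ["g1", "g2"], []):
        print(calls, "->", classify_read(calls).value)
```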

Hemantcnaik commented 2 years ago

Thanks for the clarification. As you mentioned, for the dual hybrid (CAST vs 129S1), should I use the commands below for genome preparation? And which file should I use for SNPsplit, since these two steps give me two files containing SNPs?

  1. SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ

Then I renamed the N-masked genome folder, combined all the *.fa files using cat, and used the result as the reference:

  2. SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/Nmasked_129S1/ --strain 129S1_SvImJ --strain2 CAST_EiJ

What about the comment you made earlier: what would the issue be if I use it? Could you please clarify? I have already generated the genome using the following command and ran SNPsplit with the all_129S1_SvImJ_SNPs_CAST_EiJ_reference.based_on_GRCm38.txt file:

SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ

Thank you

FelixKrueger commented 2 years ago

No, don't do that. Just use the command:

SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ

and then use all_CAST_EiJ_SNPs_129S1_SvImJ_reference.based_on_GRCm38.txt for SNPsplit.

There is no need to rename or concatenate anything; SNPsplit takes care of that. You only need to index the N-masked dual hybrid genome, and that's it.
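If you want to inspect the SNP file you pass to SNPsplit (such as all_CAST_EiJ_SNPs_129S1_SvImJ_reference.based_on_GRCm38.txt), a small parser can help. ASSUMPTION: the sketch below expects five tab-separated columns (ID, chromosome, position, SNP value, Ref/SNP) and skips any line whose position field is not a number, such as a header. Please check the column layout against your own file before relying on it. The function name and toy values are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_snp_table(handle: TextIO) -> Dict[Tuple[str, int], Tuple[str, str]]:
    """Parse an SNPsplit-style SNP table into {(chromosome, position): (ref, snp)}.

    ASSUMED layout (verify against your file), tab-separated:
    ID, Chr, Position, SNP value, Ref/SNP
    """
    snps = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5 or not row[2].isdigit():
            continue  # skip blank lines, headers or malformed rows
        chrom, pos, ref_snp = row[1], int(row[2]), row[4]
        ref, snp = ref_snp.split("/")
        snps[(chrom, pos)] = (ref, snp)
    return snps


if __name__ == "__main__":
    # Hypothetical example: a header line plus one SNP
    toy = "ID\tChr\tPosition\tSNP value\tRef/SNP\n1\t1\t100\t1\tG/T\n"
    print(read_snp_table(io.StringIO(toy)))  # {('1', 100): ('G', 'T')}
```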

Hemantcnaik commented 2 years ago

Okay, thanks.

22268179 reads were unassignable (83.55%)
2238117 reads were specific for genome 1 (8.40%)
2142823 reads were specific for genome 2 (8.04%)

I am getting this result in almost all of my samples. Is it too low, or is it expected? Could you please advise?
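One way to read these numbers is to look at the overall allele-specific fraction and at the balance between the two genomes. The Python sketch below does that arithmetic; the function name is hypothetical, and it assumes the three categories listed above make up the whole denominator (if the report lists further categories, such as conflicting reads, add them to the total).

```python
def summarize_split(unassignable: int, genome1: int, genome2: int) -> dict:
    """Return the allele-specific fraction and the genome 1 share of assigned reads.

    Assumes the three categories are the full denominator; add any further
    categories from the SNPsplit report (e.g. conflicting reads) to `total`.
    """
    total = unassignable + genome1 + genome2
    assigned = genome1 + genome2
    if total == 0 or assigned == 0:
        raise ValueError("need at least one allele-specific read")
    return {
        "total": total,
        "allele_specific_fraction": assigned / total,
        "genome1_share": genome1 / assigned,  # ~0.5 means both alleles equally represented
    }


if __name__ == "__main__":
    stats = summarize_split(22268179, 2238117, 2142823)
    print(f"allele-specific: {stats['allele_specific_fraction']:.1%}")
    print(f"genome 1 share of assigned reads: {stats['genome1_share']:.1%}")
```

With the counts above this gives roughly 16.4% allele-specific reads, split about 51:49 between genome 1 and genome 2. Since a read can only be assigned if it overlaps at least one informative SNP, the allele-specific fraction is limited by how many reads cover such positions.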

FelixKrueger commented 2 years ago

The number of allele-specific reads depends very much on:

I've seen better, but I've also seen worse...