Hemantcnaik closed this issue 2 years ago
Yes, definitely. The genome preparation should be something like this:
SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ
Wait a while, then index the N-masked genome with your favourite aligner, and let's go!
Thank you for the quick reply. Does the SNP file we get need to be modified in any way? Since it's a hybrid cross, do we have to consider heterozygous SNPs?
If you take the SNP file provided by the Mouse Genomes Project, you can generate a dual hybrid genome in a two-step process:
This way you do not need to consider heterozygous SNPs.
Thanks, but my data is not a dual hybrid; the F1 is CAST vs 129S1. Do I still have to do the same? I also have one question: how does SNPsplit handle reads that contain multiple SNPs?
In the terminology we use, a hybrid strain would be anything crossed back to the standard genome reference sequence (Black6, i.e. C57BL/6). A dual hybrid would be any hybrid strain that has 2 genetic backgrounds different from Black6.
As you've got 129S1 as one background, and CAST as the other background, this is what SNPsplit would call a dual hybrid.
If a read contains more than 1 SNP, SNPsplit evaluates whether all SNPs in the read belong to one strain or the other and sorts the read accordingly. In the rare case that a read contains SNPs for both strains, the read is considered conflicting and is suppressed by default (you can report such reads with a flag if needed, but this isn't recommended).
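To make the sorting rule above concrete, here is a minimal Python sketch. It is not SNPsplit's actual code: the function name `classify_read` and the labels `"G1"`/`"G2"` are hypothetical, and it assumes each SNP position covered by a read has already been called as matching genome 1, genome 2, or neither.

```python
from typing import Iterable


def classify_read(snp_calls: Iterable[str]) -> str:
    """Sort one read based on the per-SNP calls it contains.

    snp_calls holds one entry per SNP position covered by the read:
    "G1" if the base matches genome 1, "G2" if it matches genome 2.
    Any other value (hypothetical "neither" calls) is ignored here.
    """
    informative = {call for call in snp_calls if call in ("G1", "G2")}
    if not informative:
        # No SNP position covered, or none that matched either strain.
        return "unassignable"
    if informative == {"G1"}:
        return "genome1"
    if informative == {"G2"}:
        return "genome2"
    # SNPs for both strains in the same read.
    return "conflicting"


for calls in (["G1", "G1"], ["G2"], ["G1", "G2"], []):
    print(calls, "->", classify_read(calls))
```

A read with several SNPs is only assigned when all of them agree; a single disagreeing SNP makes it conflicting.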
Thanks for the clarification. As you mentioned, for the dual hybrid (CAST vs 129S1) I have to use the commands below for genome preparation, right? And which file do I have to use for SNPsplit? These two steps will give me two files containing SNPs.
SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ
SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/Nmasked_129S1/ --strain 129S1_SvImJ --strain2 CAST_EiJ
What about the command in your earlier comment? Could you please clarify what the issue would be if I use it? I have already generated the genome using that command and ran SNPsplit with the all_129S1_SvImJ_SNPs_CAST_EiJ_reference.based_on_GRCm38.txt file:
SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ
Thank you
No, don't do that. Just use the command:
SNPsplit_genome_preparation --vcf mgp.v5.merged.snps_all.dbSNP142.vcf.gz --reference /.../Genomes/Mouse/GRCm38/ --strain 129S1_SvImJ --strain2 CAST_EiJ
and then use all_CAST_EiJ_SNPs_129S1_SvImJ_reference.based_on_GRCm38.txt for SNPsplit.
There is no need to rename or concatenate anything; SNPsplit takes care of that. You only need to index the N-masked dual hybrid genome, and that's it.
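For anyone wondering what "N-masked" means in practice: each SNP position in the sequence is replaced with N, so that the aligner does not favour either allele. A toy Python sketch of the idea, with a made-up sequence and made-up positions (the helper `n_mask` is hypothetical and is not part of SNPsplit, which does this for you during genome preparation):

```python
from typing import Iterable


def n_mask(sequence: str, snp_positions: Iterable[int]) -> str:
    """Return the sequence with each 1-based SNP position replaced by 'N'."""
    bases = list(sequence)
    for pos in snp_positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        bases[pos - 1] = "N"  # convert the 1-based position to a 0-based index
    return "".join(bases)


# Toy example: mask positions 2 and 5 of an 8-base sequence.
print(n_mask("ACGTACGT", [2, 5]))
```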
Okay thanks
22268179 reads were unassignable (83.55%)
2238117 reads were specific for genome 1 (8.40%)
2142823 reads were specific for genome 2 (8.04%)
I am getting this result in almost all of my samples. Is this too low, or is it expected? Can you please advise?
The number of allele-specific reads depends very much on:
I've seen better, but I've also seen worse...
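To put the numbers quoted above in context, the allele-specific fraction can be worked out directly. A minimal Python sketch, assuming only the three categories quoted in this thread are counted (the function and key names are my own, and a full SNPsplit report may contain further categories, so the percentages can differ slightly from the reported ones):

```python
def allele_specific_summary(unassignable: int, genome1: int, genome2: int) -> dict:
    """Summarise how many reads could be assigned to either genome."""
    assigned = genome1 + genome2
    total = unassignable + assigned
    return {
        "allele_specific_fraction": assigned / total,
        "genome1_share_of_assigned": genome1 / assigned,
    }


# Read counts copied from the report quoted in this thread.
summary = allele_specific_summary(
    unassignable=22268179, genome1=2238117, genome2=2142823
)
print(f"allele-specific reads: {summary['allele_specific_fraction']:.1%}")
print(f"genome 1 share of assigned reads: {summary['genome1_share_of_assigned']:.1%}")
```

On these counts roughly 16% of the reads are allele-specific, and they are split close to evenly between the two genomes.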
Hello,
Thanks once again for the tool. It's been a while since I started using SNPsplit, and I have a question. I have hybrid cross data (CAST vs 129S1). Can SNPsplit be used for this cross?