Closed GangLiTarheel closed 1 year ago
Hi @GangLiTarheel
SNPsplit is intended to be used with alignments to an N-masked genome, and you do not seem to have aligned the data to such a genome:
N-containing reads: 0
You need to start with this step: http://felixkrueger.github.io/SNPsplit/genome_prep/genome_preparation/
then subsequent steps should also work.
Hi Felix, Thank you for your quick responses! I just double-checked that I am aligning to an N-masked genome.
Could the STAR parameters affect SNPsplit? The only parameter that might make a difference is:
--outSAMstrandField intronMotif:
This option adds the XS strand attribute to spliced alignments, with the strand derived from the intron motif.
My current STAR aligning process:
STAR --runThreadN $core --outSAMstrandField intronMotif --genomeDir $index --readFilesCommand zcat --readFilesIn $input_folder/$sample*gz --outFileNamePrefix $STAR_output_folder/$sample --genomeLoad LoadAndKeep --outReadsUnmapped Fastx
Sorry for the slow response, I am now back...
Regarding the alignment options for STAR, here are some comments about it: http://felixkrueger.github.io/SNPsplit/SNPsplit/specific_comments/#star. Specifically, it mentions that you need to instruct STAR to output the MD:Z: field (e.g. with: --outSAMattributes NH HI NM MD), which I could not see in the example you posted above. I wonder if that is the reason?
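One quick way to test this hypothesis is to look at the alignments directly. The following is a minimal sketch, not part of SNPsplit or STAR: `count_md_tags` is a hypothetical helper name, and it assumes SAM text on standard input (for example from `samtools view`).

```shell
# Count SAM records with and without an MD:Z: tag.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
# count_md_tags is a hypothetical helper, not a SNPsplit or STAR command.
count_md_tags() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    {
      found = 0
      # optional tags start at column 12 of a SAM record
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^MD:Z:/) { found = 1; break }
      }
      if (found) with_md++; else without_md++
    }
    END { printf "with_MD=%d without_MD=%d\n", with_md, without_md }
  '
}
```

Usage, assuming `$bam` points at one of the STAR output files: `samtools view "$bam" | head -n 10000 | count_md_tags`. If `with_MD` is 0, the alignments were written without the MD:Z: field and STAR probably needs to be rerun with the `--outSAMattributes` setting mentioned above.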
Did you use the SNPsplit genome preparation to write out N-masked sequences, and did you use those N-masked.fa files for the indexing process? There should be two reports, a SNP filtering report (e.g. JF1_MsJ_SNP_filtering_report.txt) and a genome preparation report (e.g. JF1_MsJ_genome_preparation_report.txt), where you should see if everything worked well:
SNP position summary for strain JF1_MsJ (based on genome build GRCm39)
===========================================================================
Positions read in total: 83212162
21416624 SNP were homozygous. Of these:
20013481 SNP were homozygous and passed high confidence filters and were thus included into the JF1_MsJ genome
Not included into JF1_MsJ genome:
58142404 had the same sequence as the reference
0 had no clearly defined alternative base
3653134 Calls were neither 0/0 (same as reference) or 1/1, 2/2, 3/3 (homozygous SNP)
1403143 were homozygous but the filtering call was low confidence
Printed a single list of all SNPs to >all_SNPs_JF1_MsJ_GRCm39.txt.gz<...
1561209 positions on chromosome 1 were changed to 'N'
1122032 positions on chromosome 10 were changed to 'N'
959608 positions on chromosome 11 were changed to 'N'
851377 positions on chromosome 12 were changed to 'N'
946731 positions on chromosome 13 were changed to 'N'
840234 positions on chromosome 14 were changed to 'N'
904815 positions on chromosome 15 were changed to 'N'
822652 positions on chromosome 16 were changed to 'N'
753564 positions on chromosome 17 were changed to 'N'
790122 positions on chromosome 18 were changed to 'N'
525431 positions on chromosome 19 were changed to 'N'
1374275 positions on chromosome 2 were changed to 'N'
1322050 positions on chromosome 3 were changed to 'N'
1207025 positions on chromosome 4 were changed to 'N'
1270735 positions on chromosome 5 were changed to 'N'
1181995 positions on chromosome 6 were changed to 'N'
1105527 positions on chromosome 7 were changed to 'N'
681900 positions on chromosome 8 were changed to 'N'
984742 positions on chromosome 9 were changed to 'N'
807457 positions on chromosome X were changed to 'N'
Summary
20013481 Ns were newly introduced into the N-masked genome for strain JF1_MsJ in total
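A simple consistency check on such a report is to add up the per-chromosome counts and compare them with the total in the summary line. The sketch below is only an illustration: `check_n_total` is a hypothetical helper, and it assumes the report lines look exactly like the ones quoted above (count in the first column).

```shell
# Compare the per-chromosome N counts with the reported total.
# Reads a genome preparation report on stdin.
# check_n_total is a hypothetical helper, not part of SNPsplit.
check_n_total() {
  awk '
    # per-chromosome lines: "<count> positions on chromosome <name> were changed to ..."
    /positions on chromosome .* were changed to/ { sum += $1 }
    # summary line: "<count> Ns were newly introduced into the N-masked genome ..."
    /Ns were newly introduced/ { total = $1; have_total = 1 }
    END {
      if (!have_total) { print "no summary line found"; exit 2 }
      if (sum == total) { printf "OK: %d\n", sum }
      else { printf "MISMATCH: per-chromosome sum %d, reported total %d\n", sum, total; exit 1 }
    }
  '
}
```

Usage, assuming the lines above are stored in the genome preparation report: `check_n_total < JF1_MsJ_genome_preparation_report.txt`. For the numbers quoted here, the 20 per-chromosome counts do add up to the reported 20013481.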
Hi Felix, you are absolutely right about the MD string, now it's working! To make it work, I added the following to my STAR command:
--outSAMattributes NH HI NM MD
You can close it now! Thank you for your help!
Excellent! Good luck!
Hi, I am trying to run SNPsplit on single-cell RNA-seq data from F1 hybrid mice. In our scRNA-seq data, one read of each pair carries only the barcode information, and the other read carries the actual sequence used for alignment. See some example bam info:
I think I should use the single-end mode of SNPsplit.
SNPsplit \
  --snp_file "${snp}" \
  -o "${outdir}" \
  --single_end \
  --conflicting
However, it seems to fail to assign any reads or to detect any N-containing reads. Do I have to adjust other parameters to run SNPsplit on this kind of data?