FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

Using SNPsplit for single cell data #69

Closed. taolincj closed this issue 1 year ago

taolincj commented 1 year ago

Given that cellranger is based on STAR, an aligner supported by SNPsplit, I tried to use SNPsplit v0.5.0 to process 10X scRNA-seq data. First, I built an N-masked genome using SNPsplit_genome_preparation. Then I ran the cellranger pipeline (mkref and count) in cellranger v7.0.0 with the N-masked genome, which produced a BAM file. I tried to split this BAM file using SNPsplit v0.5.0; however, no reads were assigned to genome1 or genome2.

I am not sure whether what I did was right. Can SNPsplit handle 10X scRNA-seq data? Do you have any suggestions or comments?

Thank you!

Lin

FelixKrueger commented 1 year ago

That is a very interesting question! I have never tried this myself, but I would assume that the STAR-mapped data produced by CellRanger could in theory be split allele-specifically. There are some considerations, though:

Just as a heads up, I will be on leave for some time now and will most likely not respond to any queries until my return. Wishing you good luck in the meantime!

taolincj commented 1 year ago

Thank you very much for your prompt response!

Following your suggestion, I tried to analyse my F1 scRNA-seq data using STAR V2.7.10. I am glad to tell you that the BAM file produced by STAR could be split by SNPsplit, although only a small fraction of reads was assigned to genomes 1 and 2 (4.23% and 3.86% in my data). Unfortunately, these BAM files from STAR or SNPsplit could not be further processed by STAR. I get the following error:

EXITING because of FATAL ERROR in reads input: short read sequence line: 0 Read Name=@ Read Sequence="" DEF_readNameLengthMax=50000 DEF_readSeqLengthMax=650

Aug 20 21:45:43 ...... FATAL ERROR, exiting

Alex suggests that this error points to a problem with the formatting of the FASTQ files (https://github.com/alexdobin/STAR/issues/493), but I haven't solved it yet. This problem may be beyond the scope of SNPsplit. Anyway, I am happy to share my experience. Any suggestions or comments are welcome.
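For anyone hitting the same STAR message (a read whose name is just "@" and whose sequence is empty), a quick way to narrow it down might be to scan the input for malformed records before handing it to STAR. The following is only a minimal sketch under assumptions: it expects plain 4-line FASTQ records, and the function names `open_fastq` and `find_bad_fastq_records` are made up for illustration and are not part of STAR or SNPsplit.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_bad_fastq_records(handle, max_report=10):
    """Return (record_number, reason) tuples for malformed 4-line FASTQ records.

    Reading stops once at least `max_report` problems have been collected.
    Stray blank lines are reported as malformed records, not skipped.
    """
    problems = []
    record = 0
    while len(problems) < max_report:
        header = handle.readline()
        if not header:
            break  # clean end of file
        record += 1
        name = header.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        # A usable header is "@" followed by at least one character.
        if not name.startswith("@") or len(name) < 2:
            problems.append((record, "missing read name"))
        if not seq:
            problems.append((record, "empty sequence"))
        if not plus.startswith("+"):
            problems.append((record, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((record, "sequence/quality length mismatch"))
    return problems
```

It could be used as `with open_fastq("reads.fastq.gz") as fh: print(find_bad_fastq_records(fh))`, where the file name is a placeholder. If the input given to STAR is not FASTQ at all (for example a BAM file), a check like this would flag almost every record, which is itself a useful hint.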

Lin

FelixKrueger commented 1 year ago

Good news. The relatively low rate of allele-specific reads can be explained by the fact that 10X is single-end sequencing and, in theory, only covers the 3' end of transcripts...

I am currently on holiday, but I might reply in more detail upon my return. I'd be happy to hear how you are solving your formatting issue (not quite sure what the problem is as SNPsplit doesn't produce any FastQ files at all...). Cheers, Felix



taolincj commented 1 year ago

I'm sorry to disturb your holiday. I am glad to tell you that I have solved this problem by adjusting some parameters in STAR. The mapping rates for my 10X data are now ~9% and ~10%. I believe this result is reasonable and useful. Thanks for your great tool!
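For single-cell data it may also help to look at these rates per cell, not only for the whole BAM file. The sketch below rests on two assumptions that should be checked against your own files: that the allele call is stored in an `XX:Z:` tag (with values such as G1, G2, UA or CF) in the allele-flagged output, and that the aligner wrote the cell barcode into a `CB:Z:` tag. The function names are hypothetical. It reads SAM text lines, for example from `samtools view`.

```python
from collections import Counter, defaultdict


def tally_alleles(sam_lines):
    """Count allele calls per cell barcode from SAM text lines.

    Assumes an XX:Z: tag holds the allele call and a CB:Z: tag holds the
    cell barcode; reads without an XX tag are skipped.
    """
    per_cell = defaultdict(Counter)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tags = {}
        for field in fields[11:]:
            parts = field.split(":", 2)  # TAG:TYPE:VALUE
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        allele = tags.get("XX")
        if allele is None:
            continue
        cell = tags.get("CB", "NO_BARCODE")
        per_cell[cell][allele] += 1
    return per_cell


def allele_fractions(counts):
    """Turn one cell's allele counts into fractions of its tagged reads."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}
```

Cells with very few tagged reads will give noisy fractions, so a minimum read count per cell is probably sensible before interpreting them.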

Have a good time.

Lin