VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0

issues with generating splice junction file #60

Open ckong1806 opened 1 year ago

ckong1806 commented 1 year ago

I have 3' scRNA-seq data from 10X Genomics. I ran the data through cellranger, so I have the possorted BAM output file. I tried to run regtools on the BAM file, but the BED output file has ? in the strand column. Regtools wasn't able to detect strandedness, since cellranger uses STAR for alignment and STAR doesn't infer strandedness. From the Sierra vignette, it seems you also used cellranger output. Can you share how you were able to run regtools on your dataset?

SebastianMHJohn commented 1 year ago

Hi Sierra team, I am having the exact same issue and would appreciate any information. Best wishes

rj-patrick commented 1 year ago

How have you run RegTools? Unless there's been an update, setting the -s parameter to 1 should work for 10x data. See the commands we use at: https://github.com/VCCRI/Sierra/wiki/Sierra-Vignette#splice-junctions-file

SebastianMHJohn commented 1 year ago

Hi, I did it exactly as described in https://github.com/VCCRI/Sierra/wiki/Sierra-Vignette#splice-junctions-file. Best wishes

ckong1806 commented 1 year ago

Hi, I ran the code exactly as in the vignette, but when I opened the regtools output file, the strand column was filled with only question marks. Regtools also no longer accepts -s 1 as a parameter; it has to be either RF (first strand) or FR (second strand). However, according to the 10X Genomics website, the STAR alignment does not include any strand info, which I think may be why the regtools output has ? in the strand column. Did your regtools output file from 10X data look different in the strand column? Was it all - or + there?

rj-patrick commented 1 year ago

Thanks, I wasn't aware that RegTools had made that change; I'll update the wiki. Specifying -s RF should work: I was able to re-run it on an old BAM file, and it appeared to replicate the old junctions file output. The strandedness refers to the library prep strategy rather than the STAR alignment, so for 10X you can use RF.
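
For anyone scripting this step, here is a minimal sketch of how the updated call might be assembled in Python. It assumes the -s RF setting discussed above and the RegTools -o option for the output file; the function name and the file names are hypothetical, and the BAM is assumed to be indexed. It only builds and prints the command, it does not run RegTools.

```python
import shlex
from typing import List


def build_regtools_command(bam_path: str, bed_path: str, strand: str = "RF") -> List[str]:
    """Build a `regtools junctions extract` command as an argument list.

    Only the two strand values named in this thread are accepted here;
    RegTools itself may support other values.
    """
    if strand not in ("RF", "FR"):
        raise ValueError(f"expected 'RF' or 'FR' for -s, got {strand!r}")
    return [
        "regtools", "junctions", "extract",
        "-s", strand,    # strandedness of the library prep (RF suggested for 10X above)
        "-o", bed_path,  # junctions BED file to write
        bam_path,        # indexed BAM, e.g. the cellranger possorted BAM
    ]


# Hypothetical file names, for illustration only.
cmd = build_regtools_command("possorted_genome_bam.bam", "junctions.bed")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) if RegTools is on the PATH.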

ckong1806 commented 1 year ago

I tried using -s RF, and the regtools output file now has - in the strand column for some rows, but many still have ?.

How do you recommend I proceed? Should I filter out the ? rows?

rj-patrick commented 1 year ago

Sorry for the slow response getting back to you. You should be fine to use that junction file for FindPeaks as it is, we don't utilise the strand column.
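
If you still want to see how many junctions ended up with an unresolved strand, a small sketch like the one below can tally the strand column. It assumes the junctions file is tab-separated BED with the strand in the sixth column; the function name and the file name in the usage comment are hypothetical. It only inspects the file and does not filter anything.

```python
from collections import Counter
from typing import Iterable


def count_strands(bed_lines: Iterable[str]) -> Counter:
    """Tally the strand column (6th tab-separated field) of BED junction lines."""
    counts: Counter = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # too few columns to carry a strand value
        counts[fields[5]] += 1
    return counts


# Usage with a hypothetical file name:
# with open("junctions.bed") as handle:
#     print(count_strands(handle))
```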