Closed diekhans closed 11 months ago
It's not trying to create bed but STAR junction format. Looking into why that doesn't get read properly...
I am so silly, thinking it produced BED, given the name and the documentation. I shouldn't be so naive!
It does "guess" (another ticket) that is a STAR a file, but then discards them all because of something to do with strand.
Jeltje @.***> writes:
It's not trying to create bed but STAR junction format. Looking into why that doesn't get read properly...
-- Reply to this email directly or view it on GitHub: https://github.com/BrooksLabUCSC/flair/issues/272#issuecomment-1692270815 You are receiving this because you authored the thread.
Message ID: @.***> It's not trying to create bed but STAR junction format. Looking into why that doesn't get read properly...
-- Reply to this email directly, [1]view it on GitHub, or [2]unsubscribe. You are receiving this because you authored the thread. Message ID: @.***>
References
Ah yes, the program name does not claim bed, it is the doc and output file name *_junctions.bed - Junctions in BED format
The reason the result the star junction files are ignored is in ssCorrect.py is in addOtherJuncs(); Here the code examines the strand
if starOffset:
if strand == "1": strand = "+"
elif strand == "2": strand = "-"
else:
continue
and all of the junctions_from_sam output has a strand of `0'
:-(
The root cause of this problem is that the pipeline that aligned the short reads did not output the XS tag:
STAR --runThreadN 8 --genomeDir genomeDir --readFilesIn H02C_8986_GATCAG_read1.fastq.gz H02C_8986_GATCAG_read2.fastq.gz --readFilesCommand pigz -p4 -dc --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMattributes NH HI AS NM MD --outSAMunmapped Within --outSAMattrRGline ID:H02C_8986_GATCAG PU:H02C_8986_GATCAG SM:H0CRep1 PL:ILLUMINA CN:CRG --outFilterType BySJout --outFilterMultimapNmax 10 --outFilterMismatchNmax 999 -
so junctions_from_sam creates a not-really-a-bed file with no strand and no error and then ssCorrect ignores them all.
Should be fixed.
junctions_from_sam creates bed file like:
note, these lack strand.
ssCorrect.py then hallucinates STAR junction files because the name is '0' and silently rejects them. In the end, no splice junctions are added as evidence.
This fact that no splice junctions are added is lost in the stream of unhelpful progress messages.
Essentially, if the documentation is followed, short reads junctions don't work