Open dominikburri opened 4 years ago
Hi Dominik,
Thanks for your detailed issue.
We have been having this discussion of integrating paired-end data processing and since both of us (authors) weren't very experienced with this kind of data, we decided not to manage it.
I don't have that much time to deal with this now but your opinion about the topic is interesting. At first glance, my intuition about the source of the problem is the same as yours.
I'm still not used to work with paired-end data but, if you are, what do you think about artificially extending the first mate on its strand up to the beginning of the second mate? Thus we could consider this as a single-end read and count its genomic features. Doing this, we would describe what was really present in the sample at the beginning (the fragment marked by the paired-end reads).
Cheers, Mathieu
Hi Mathieu,
I would need to think a bit more about this question, but artificially generating coverage does not sound right to me.
The procedure I suggested that uses STAR to generate .bedgraph
coverage tracks works. It also has the advantage that coverage on normalised reads (reads per million) can be generated.
You could simply recommend this procedure in your documentation.
On another note: I noticed that you can supply multiple samples with
alfa -g quick_start --bedgraph sample1.out.plus.bg sample1.out.minus.bg sample1 sample2.out.plus.bg sample2.out.minus.bg sample2 -s reverse
and it generates plots with both samples. However, the results of sample2 are exactly the same as for sample1. Maybe you could state more explicitly that only one sample can be used when using --bedgraph
option? Or return an error if multiple samples are supplied.
Best, Dominik
Hi Dominik,
Thanks for your suggestions, I'll take care of that when I have time to deal with that. I'll notify here.
Cheers, Mathieu
When working with stranded, paired end RNA-seq, ALFA does not correctly report reads on categories.
Example
I copied the supplied
quick_start.chr_len.txt
,quick_start.gtf
andquick_start.bam
into a new directory.I adjusted the supplied
quick_start.bam
for a stranded paired end RNA-seq (firstread antisense) by:paired_end.bam
. Such a result can be expected when using STAR for mapping:I genereted alfa_index as described:
alfa -a quick_start.gtf --chr_len quick_start.chr_len.txt
Output with ALFA from .bam
running
alfa -g quick_start --bam paired_end.bam paired_end -s reverse
produces a plot where half of the reads are considered on the opposite strand: ALFA_plots.Categories.pdfGenerating .bedgraph from STAR
When I generate .bedgraph on my own with
STAR --runMode inputAlignmentsFromBAM --inputBAMfile paired_end.bam --outWigType bedGraph --outWigStrand Stranded --outWigNorm None
I get 4 files, of which 2 I need.With this example I get an empty Signal.UniqueMultiple.str1.out.bg which I have to populate with "chr1\t100\t101\t0", so I do not get an error when running ALFA.
Also, I need to rename the files:
Output ALFA from .bedgraph
Now running ALFA from bedgraph with
alfa -g quick_start --bedgraph Signal.UniqueMultiple.out.plus.bg Signal.UniqueMultiple.out.minus.bg paired_end_STAR -s reverse
gives me as result the correct annotation: ALFA_plots.Categories.pdfProblems
alfa.py
when generating bedgraph from bam the parameters for reverse are set such that the minus strand is called plus and vice versa. Why?.bedgraph
is tedious because manual renaming of files is necessary (requirement to have .plus. and .minus. in the filename).Possible source of problem
ALFA uses basically
bedtools genomecov
with option-strand
to generate.bedgraph
. It could be that bedtools does ignore the SAM flags related to mates (bits: 0x40 and 0x80). Therefore read pairs are separated into different strands instead of keeping them on one strand.