Closed BioinfGuru closed 10 months ago
Correct.
Unfortunately, what is going on here is that the featureCounts program has changed its behavior via an ill-informed decision.
Starting with a specific version, one needs to pass both the -p and --countReadPairs flags, but before that version, only -p was needed for the same behavior, and the presence of the second flag raised an error.
Even as of last year, the installation process installed the older version of featureCounts, so I chose the first form. Now it seems the updated version gets installed, so the second, two-flag form will be required.
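For anyone who has to support both old and new installs, here is a minimal sketch that picks the flags by checking whether the usage text mentions --countReadPairs. The function name pair_flags is made up for illustration, and the sketch assumes that featureCounts prints its usage text when run without arguments and that newer versions list --countReadPairs there; please verify on your own install.

```shell
# Hypothetical helper: read featureCounts usage text on stdin and print
# the paired-end flags that this installation appears to need.
pair_flags() {
    if grep -q -- '--countReadPairs'; then
        # Newer behavior: -p only declares paired-end input,
        # --countReadPairs asks for fragments to be counted.
        echo "-p --countReadPairs"
    else
        # Older behavior: -p alone counts fragments.
        echo "-p"
    fi
}

# Possible usage (assumption: running featureCounts with no arguments prints usage):
# FLAGS=$(featureCounts 2>&1 | pair_flags)
# cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts $FLAGS -a refs/features.gff -o counts.txt
```

Probing the usage text avoids hard-coding a version number, which seems safer than comparing version strings.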
I will test the various installation methods and will make the change soon.
I made the changes in the book, thanks for reporting and reminding me to make this change.
Nice job noticing it.
You are well on your way to bioinfoguru-ness; the bioinformatics world is full of inconsistencies!
I just realized that I did not explain the effect itself:
When we run single-end sequencing, each read corresponds to one transcript fragment.
When we run paired-end sequencing, each transcript fragment produces two reads.
Note that in the second case two measurements come from a single fragment. Hence, with paired-end sequencing at the same coverage, only half as many independent transcript fragments will be sampled. This is why the counts are half as large when read pairs are counted instead of individual reads.
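One way to see the factor of two on your own data is to sum a sample column from two featureCounts runs, one with and one without --countReadPairs. The sketch below assumes the usual featureCounts table layout (a leading '#' comment line, a header row starting with Geneid, and the first sample in column 7); the function name total_counts and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: sum the first sample column (column 7) of a
# featureCounts table, skipping the '#' comment line and the header row.
total_counts() {
    awk -F '\t' '!/^#/ && $1 != "Geneid" { sum += $7 } END { print sum + 0 }' "$1"
}

# Possible usage (file names are placeholders for two separate runs):
# total_counts counts_reads.txt    # run with -p only on a newer install
# total_counts counts_pairs.txt    # run with -p --countReadPairs
```

If the explanation above holds, the first total should be roughly double the second.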
This is to say that paired-end sequencing is disadvantageous in any situation where we are counting reads since we lose half the data. We might gain more mapping accuracy - though that is debatable - but the net effect is losing half the coverage and we lose a lot of statistical power. So in general paired-end RNA-Seq is not advisable.
The only time paired-end RNA-Seq makes sense is when we are assembling transcripts; for all other cases it leads to coverage loss.
Thanks for that, cheers.
So for well-annotated genomes, there's really no need for paired-end, especially if the goal is differential expression. Makes perfect sense. Thanks.
Hi,
First, apologies if this issue has already been raised, I haven't found it.
In the section "How to count features", the commands should include --countReadPairs.
When I ran the commands as shown in the book, all counts were approximately double those shown in the example results image.
So the following command:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p -a refs/features.gff -o counts.txt
Should be:
cat ids.txt | parallel -j 1 echo "bam/{}.bam" | xargs featureCounts -p --countReadPairs -a refs/features.gff -o counts.txt
I don't understand enough about featureCounts. Could anyone explain why --countReadPairs has this effect?
Thanks for the great book
Regards, Bioinfguru