Open Dries-B opened 10 months ago
Hi Dries, I haven't looked at this rule specifically in a long time. I think that when I first wrote it I had been working with metagenomic samples where a lot of reads mapped to more than one location and I didn't want to disregard these reads as it was likely that they were 'real' due to highly similar genomes in the samples.
You're right about the -B
flag, it may have been added to increase the stringency somewhat.
Regarding -O
: yes but also I think for a metagenomic setting we may in fact want to count reads mapping to entire contigs instead of the features defined in the prodigal GFF file. Then give features on contigs the abundances of reads mapped to entire contigs. This should be handled differently for metatranscriptomics though.
I agree that the settings for this rule could be made more configurable by the user. Easiest may be to simply add a featureCounts:
section to the config file where all flags (except perhaps -p
can be configured):
featureCounts:
settings: "--primary -O"
rule featurecount:
input:
gff=results+"/annotation/{assembly}/final_contigs.features.gff",
bam=results+"/assembly/{assembly}/mapping/{sample}_{unit}_{seq_type}"+POSTPROCESS+".bam"
output:
results+"/assembly/{assembly}/mapping/{sample}_{unit}_{seq_type}.fc.tsv",
results+"/assembly/{assembly}/mapping/{sample}_{unit}_{seq_type}.fc.tsv.summary"
log:
results+"/assembly/{assembly}/mapping/{sample}_{unit}_{seq_type}.fc.log"
threads: 4
params:
tmpdir=config["paths"]["temp"],
p=lambda wildcards: "-p" if wildcards.seq_type == "pe" else "",
setting=config["featureCounts"]["settings"],
resources:
runtime=lambda wildcards, attempt: attempt**2*30
conda:
"../envs/quantify.yml"
shell:
"""
mkdir -p {params.tmpdir}
featureCounts -a {input.gff} -o {output[0]} -t CDS -g gene_id {params.p} \
{params.setting} -T {threads} --tmpDir {params.tmpdir} {input.bam} > {log} 2>&1
"""
Hi John,
Thanks for your reply! I should add that my questions are not necessarily proposals for improvements but rather 'assessments' of your opinion regarding changes in my repository, as well as sharing my understanding of the featureCounts
documentation.
[Regarding
-M
] I think that when I first wrote it I had been working with metagenomic samples where a lot of reads mapped to more than one location and I didn't want to disregard these reads as it was likely that they were 'real' due to highly similar genomes in the samples.
Okay, I understand your reasoning there, although I think -M
could risk duplicating counts compared to --primary
.
I agree that the settings for this rule could be made more configurable by the user. Easiest may be to simply add a featureCounts: section to the config file where all flags (except perhaps -p can be configured)
Indeed this might improve the easy of use. This is not really necessary for my own use however, as I can modify the source code while looking at it.
Regarding -O: yes but also I think for a metagenomic setting we may in fact want to count reads mapping to entire contigs instead of the features defined in the prodigal GFF file. Then give features on contigs the abundances of reads mapped to entire contigs. This should be handled differently for metatranscriptomics though.
Okay, but for such a contig analyses, don't other simpler tools exist? (I myself am specifically interested in the Prodigal output.)
Okay, but for such a contig analyses, don't other simpler tools exist? (I myself am specifically interested in the Prodigal output.)
Sure, something like pileup.sh
for instance.
Currently the
rule featurecount
looks as follows:I have some questions about this:
-M
counts multi-mapping reads, but wouldn't it make more sense to use--primary
instead, which only counts primary alignments?-B
only counts read pairs that have both ends aligned, but won't this risk losing the reads at the ends of contigs?-O
?-f
for using feature-level instead of meta-feature level does not make a difference when analyzing Prodigal output.)-p
for counting fragments instead of reads, because otherwise single-read fragments would count less than paired-read fragments.)threads: 1
could be used, because this rule took very little time for me when running.)