AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
I noticed unexpected silent bug when using a custom --host for host removal. In summary, even if I provided --host (or --reference) parameters, nextflow/amr++ was not reporting errors, but also not using the correct host for removal. Additionally, it appears that providing a --host_index is mandatory (not optional) when using the host removal functions. As a result, this might mean that the user thinks the pipeline is removing their custom host, but in reality isn't. This can impact interpretation of the data.
I included some reproducible examples below to explain.
I did find a solution (see below), but I just wanted to let you know of this potential hidden bug and provide some suggestions & contributions:
My suggestions are to:
1) Consolidate the documents to remove discrepencies in how-to use the
program (e.g. --host vs usage of --reference)
2) Clarify the use of --host_index either in the params.config file or in the documentation (example here)
3) Modify the code to use --host_index and --index to properly read in and use the null index parameter?
I will send a pull request for my suggestion 1. I am unsure how to address 2 (see what I tried below) I attempted to look into 3 but didn't solve it yet.
Please let me know if I can contribute/test things for this codebase.
Best,
Patricia
--
All these tests are performed using the test data provided in
AMRplusplus/data Nextflow: N E X T F L O W \~ version 23.10.1
The test genomes I used are the default chr21 in /data/host and the bovine and chicken reference genome from NCBI.
I first tested how using --host or --reference influenced the result, to resolve the discrepency in the doc linked above.
Once again the same values are reported. More importantly, the same number of reads mapped + unmapped are reported as when we used the chr21 and the bovine genome, which is unexpected. No errors are reported by Nextflow/AMR++.
In my next series of tests, I'm straight up deleting chr21.fasta from the /data folder and seeing if the command works. Nextflow is now properly reporting an error, and tells me that's it's expecting some index and it's not working! Upon further inspection, I found in theparams.config code these lines telling us to also use the --host_index genome, however this is not reported in the usage.md and configuration.md files.
However, none were giving me the expected results. Nextflow couldn't read the null parameter properly and AMR++ was telling me I needed an index file (so the loop about if param = null --> bwa index doesn't automatically go into effect)
Alternative solution
I don't know if I can/have time to fully troubleshoot the code at the moment, but if I do I will send a pull request. For now, I am running bwa prior to running the pipeline, and simply providing BOTH custom --host_index and --host parameters.
conda activate bwa
# if not done already: conda install -c bioconda bwa
bwa index /staging/ptran5/GCF_002263795.3_ARS-UCD2.0_genomic.fna.gz
conda deactivate
Nextflow amr++ command using custom host but with default sample (S1, S2 and S3) data:
Foremost, thank you for making this pipeline!
I noticed unexpected silent bug when using a custom
--host
for host removal. In summary, even if I provided--host
(or--reference
) parameters, nextflow/amr++ was not reporting errors, but also not using the correct host for removal. Additionally, it appears that providing a--host_index
is mandatory (not optional) when using the host removal functions. As a result, this might mean that the user thinks the pipeline is removing their custom host, but in reality isn't. This can impact interpretation of the data.I included some reproducible examples below to explain.
I did find a solution (see below), but I just wanted to let you know of this potential hidden bug and provide some suggestions & contributions:
My suggestions are to:
1) Consolidate the documents to remove discrepencies in how-to use the program (e.g. --host vs usage of --reference)
2) Clarify the use of
--host_index
either in theparams.config
file or in the documentation (example here)3) Modify the code to use --host_index and --index to properly read in and use the
null
index parameter?I will send a pull request for my suggestion 1. I am unsure how to address 2 (see what I tried below) I attempted to look into 3 but didn't solve it yet.
Please let me know if I can contribute/test things for this codebase.
Best,
Patricia
--
All these tests are performed using the test data provided in
AMRplusplus/data
Nextflow: N E X T F L O W \~ version 23.10.1 The test genomes I used are the default chr21 in/data/host
and the bovine and chicken reference genome from NCBI.I first tested how using --host or --reference influenced the result, to resolve the discrepency in the doc linked above.
Control:
Removing bovine genome host using
--host
Removing bovine genome host using
--reference
Results
All worked without printing any errors, but the outcomes are unexpected: all report the same number of mapped & unmapped reads.
Second strategy: changing the host
To test whether this was caused somehow by the genome of choice, I tested using the chicken genome as a host.
Results
Once again the same values are reported. More importantly, the same number of reads mapped + unmapped are reported as when we used the chr21 and the bovine genome, which is unexpected. No errors are reported by Nextflow/AMR++.
Third test:
To test whether this was due to the
--pipeline rm_host
function, I decided to run--pipeline standard_AMR
instead.Results
Unfortunately, we get the same numbers:
Fourth test:
In my next series of tests, I'm straight up deleting chr21.fasta from the
/data
folder and seeing if the command works. Nextflow is now properly reporting an error, and tells me that's it's expecting some index and it's not working! Upon further inspection, I found in theparams.config
code these lines telling us to also use the--host_index
genome, however this is not reported in theusage.md
andconfiguration.md
files.To put that in practice, I tested:
However, none were giving me the expected results. Nextflow couldn't read the
null
parameter properly and AMR++ was telling me I needed an index file (so the loop about if param = null --> bwa index doesn't automatically go into effect)Alternative solution
I don't know if I can/have time to fully troubleshoot the code at the moment, but if I do I will send a pull request. For now, I am running
bwa
prior to running the pipeline, and simply providing BOTH custom--host_index
and--host
parameters.Nextflow amr++ command using custom host but with default sample (S1, S2 and S3) data:
Successful results: