RIVM-bioinformatics / AmpliGone

A tool to accurately remove primer sequences from NGS reads in an amplicon experiment
https://rivm-bioinformatics.github.io/AmpliGone/
GNU Affero General Public License v3.0

Support for multiple sequences in the input reference FASTA file #69

Closed: BertBog closed this issue 1 year ago

BertBog commented 1 year ago

Dear,

I have encountered an issue when using AmpliGone for Influenza A datasets. The input reference FASTA file contains 8 separate segments, which results in the following error when I run AmpliGone (v1.2.1):

    File "/usr/local/bin/lmod/AmpliGone/1.2.1/venv/lib/python3.9/site-packages/AmpliGone/AmpliGone.py", line 260, in main
      primer_df = TP_PrimerLists.result()
    File "/usr/lib/python3.9/concurrent/futures/_base.py", line 446, in result
      return self.__get_result()
    File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
      raise self._exception
    File "/usr/lib/python3.9/concurrent/futures/thread.py", line 58, in run
      result = self.fn(*self.args, **self.kwargs)
    File "/usr/local/bin/lmod/AmpliGone/1.2.1/venv/lib/python3.9/site-packages/AmpliGone/fasta2bed.py", line 69, in MakeCoordinateLists
      return pd.DataFrame(
    File "/usr/local/bin/lmod/AmpliGone/1.2.1/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 774, in __init__
      data = list(data)
    File "/usr/local/bin/lmod/AmpliGone/1.2.1/venv/lib/python3.9/site-packages/AmpliGone/fasta2bed.py", line 94, in CoordListGen
      ref_file = SeqIO.read(referencefile, "fasta")
    File "/usr/local/bin/lmod/AmpliGone/1.2.1/venv/lib/python3.9/site-packages/Bio/SeqIO/__init__.py", line 659, in read
      raise ValueError("More than one record found in handle")
    ValueError: More than one record found in handle

The command that I used:

    ampligone --reference influenza_a-H3N2.fasta --primers primers.influenza_A.fasta --input sequences.fastq --output sequences_clipped.fastq --threads 4 --amplicon-type fragmented --error-rate 0.1

Would it be possible to resolve this issue? I have obtained great results for SARS-CoV-2 with this tool.

Best regards, Bert

florianzwagemaker commented 1 year ago

Dear Bert,

Thank you for your issue submission. Glad to hear our tool was useful for your SARS-CoV-2 analysis.

The issue that you're describing is a known problem with the current implementation of AmpliGone. I've added it to our backlog to have it solved in the next release (1.4.0), but I currently can't provide a time estimate for this.

We're currently working around this ourselves by simply processing all Influenza segments individually. I do, however, think that processing all segments at once, as you're describing, is within the scope of this project, and it will be added as soon as I can get to it.
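For anyone who wants to try the per-segment workaround in the meantime, here is a minimal sketch of how a multi-record reference FASTA could be split into one file per segment. It uses only the Python standard library; the function name `split_fasta` and the output naming scheme are made up for illustration and are not part of AmpliGone. It assumes record IDs are unique (duplicates would overwrite each other).

```python
import re
from pathlib import Path


def split_fasta(multi_fasta: Path, out_dir: Path) -> list[Path]:
    """Write each record of a multi-record FASTA file to its own file.

    Hypothetical helper for illustration only; not part of AmpliGone.
    Returns the paths of the written files in input order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    handle = None
    try:
        with multi_fasta.open() as src:
            for line in src:
                if line.startswith(">"):
                    # A new header starts a new record: close the previous file.
                    if handle is not None:
                        handle.close()
                    # Name the file after the first token of the header,
                    # replacing characters that are awkward in file names.
                    parts = line[1:].split()
                    record_id = parts[0] if parts else f"record_{len(written) + 1}"
                    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                    path = out_dir / f"{safe_id}.fasta"
                    handle = path.open("w")
                    written.append(path)
                # Lines before the first header (if any) are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written


# Example usage (file names taken from the command earlier in this thread):
# segment_files = split_fasta(Path("influenza_a-H3N2.fasta"), Path("segments"))
```

Each resulting file contains a single record and could then be passed to `--reference` in a separate run. This only covers splitting the reference; how the reads and primers are divided per segment is not covered here.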

I hope in the meantime you can still continue with your Influenza analyses regardless of this issue.

Best regards, Florian

florianzwagemaker commented 1 year ago

Just a quick update: multi-reference support has been added in commit cd4f4138118482dea683fde4d201a6e4159b37bb

I'm still testing some things and making sure everything works as intended, but hopefully I can release this soon.

florianzwagemaker commented 1 year ago

@BertBog This has now been released in version 1.3.0. This version can already be installed through pip with:

    pip install AmpliGone==1.3.0

Installation through conda for this version should be available within the next 24 hours, as we have to wait for the review/merge from the bioconda team.

If you come across any issues with this new feature please let us know and good luck with your Influenza analyses!

BertBog commented 1 year ago

Thanks a lot for fixing this so quickly! I'll let you know if I experience any issues with the new functionality.