Closed wpearman1996 closed 3 weeks ago
Hello William,
Thanks for letting us know!
This error arises because CRABS expects an accession number at the end of the header when the BOLD header contains three | characters. However, this Pending (#####) entry seems to be an exception. This in turn seems to cause issues when invoking VSEARCH during --pairwise-global-alignment, probably because of the presence of (). In the new version, CRABS v1.0.6, I've updated the BOLD parsing during --import. It will now check whether the information is an actual accession number and provide a proper sequence ID if this is not the case.
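To illustrate the idea, here is a minimal sketch of such a check. It is not the actual CRABS code. The regular expression is an assumed, simplified pattern for GenBank-style accessions, the function name sequence_id_from_bold_header is hypothetical, and falling back to the first header field as the sequence ID is an assumption made for the example.

```python
import re

# Assumption: a simplified GenBank-style accession pattern (e.g. "MN123456",
# "U12345", optionally followed by ".1"). Real accession formats are more varied.
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}_?\d{5,8}(\.\d+)?$")


def sequence_id_from_bold_header(header: str) -> str:
    """Return a usable sequence ID for a pipe-delimited BOLD header.

    Hypothetical helper: if the header has four fields (three '|') and the
    last one looks like an accession, use it; otherwise fall back to the
    first field (assumed here to be the BOLD process ID).
    """
    fields = [field.strip() for field in header.lstrip(">").split("|")]
    if len(fields) == 4 and ACCESSION_RE.match(fields[-1]):
        return fields[-1]
    return fields[0]


# Illustrative headers only; they are not taken from a real BOLD download.
print(sequence_id_from_bold_header("ABCDE001-20|Genus species|COI-5P|MN123456"))
print(sequence_id_from_bold_header("ABCDE002-20|Genus species|COI-5P|Pending (#1234)"))
```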
You will need to re-import your original files that you downloaded from BOLD, but it should all work now :)
Please let me know if the issue persists.
Thanks, Gert-Jan
Hi GJ, I've been working on generating my databases, and when I was running
../crabs --pairwise-global-alignment --input merged_filter_derep_10k.txt --amplicons P5_insilico_derep_4aln.txt --output P5_aligned.txt --forward GGCATCACCATACTACTAACAGACCG --reverse GGATTAGGATGTAGACTTCTGGGTG --percent-identity 0.9 --coverage 0.9
I ended up with the following error:
| Function | Retrieve amplicons without primer-binding regions
| Import data | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
| Transform to fasta | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
| Pairwise alignment | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- 0:00:01
| Parse alignment data | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 100% 0:00:01 0:00:00
Traceback (most recent call last):
  File "/scale_wlg_nobackup/filesets/nobackup/XXXX/reference_database_creator/./crabs", line 847, in <module>
    crabs()
  File "/opt/nesi/CS400_centos7_bdw/Python/3.11.3-gimkl-2022a/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wpea186/.local/Python-3.11-gimkl-2022a/lib/python3.11/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/nesi/CS400_centos7_bdw/Python/3.11.3-gimkl-2022a/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nesi/CS400_centos7_bdw/Python/3.11.3-gimkl-2022a/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scale_wlg_nobackup/filesets/nobackup/XXXX/reference_database_creator/./crabs", line 640, in crabs
    amplicon_fasta_dict = extract_alignment_results(console, columns, align_temp_path, amplicon_fasta_dict, include_all_start_positions_, coverage_, forward_, reverse_, raw_fasta_dict)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scale_wlg_nobackup/filesets/nobackup/XXXX/reference_database_creator/function/crabs_functions.py", line 1256, in extract_alignment_results
    sequence = raw_fasta_dict[query]['sequence']
               ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
I tracked it down to a line in my merged database which contained "Pending (#8627) Acanthurus albimento" in the first column.
I checked, and this seems to originate from the BOLD database, where many sequences are formatted as follows:
I'm going to have a go at removing all text that contains "Pending" from the fasta file - so it reformats the names to something like "NTKAT1115-20|Knipowitschia mermere|COI-5P" - and will let you know how that works, but thought it might be helpful to let you know here too.
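A minimal sketch of that cleanup, assuming the FASTA headers are pipe-delimited and that the "Pending (#...)" text sits in its own field (I haven't confirmed this for every record). The function names strip_pending and clean_fasta are just made up for this example.

```python
def strip_pending(header: str) -> str:
    """Drop every pipe-delimited field that starts with 'Pending' from a header.

    Assumption: the 'Pending (#...)' text occupies a whole field of its own.
    """
    fields = header.rstrip("\n").split("|")
    kept = [field for field in fields if not field.strip().startswith("Pending")]
    return "|".join(kept)


def clean_fasta(lines):
    """Yield FASTA lines with 'Pending' fields removed from header lines only."""
    for line in lines:
        if line.startswith(">"):
            yield strip_pending(line)
        else:
            # Sequence lines are passed through unchanged (minus the newline).
            yield line.rstrip("\n")


# Illustrative record; the header layout is an assumption, not a real BOLD entry.
example = [">NTKAT1115-20|Knipowitschia mermere|COI-5P|Pending (#8627)\n", "ACGTACGT\n"]
for cleaned in clean_fasta(example):
    print(cleaned)
```

To apply it to a file, the same generator can be fed an open file handle and the results written line by line to a new file, which leaves the original download untouched.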
Incidentally, I'm doing alignments to a P5 primer set, so it makes sense that I've only encountered this error here thus far.
Cheers, William