genomic-medicine-sweden / jasen

Bacterial typing pipeline for clinical NGS data. Written in NextFlow, Python & Bash.
https://jasen.readthedocs.io/en/latest/
GNU General Public License v3.0

error_corr_assembly.pl fails with some fasta headers #241

Closed (samuell closed 10 months ago)

samuell commented 10 months ago

I got the following error in a pipeline:

$ <snip>/jasen/bin/error_corr_assembly.pl somefile.fasta somefile.vcf 
Use of uninitialized value in substr at <snip>/jasen/bin/error_corr_assembly.pl line 23.
substr outside of string at <snip>/jasen/bin/error_corr_assembly.pl line 23.

Debugging in the work folder revealed that the first header in the FASTA file looked like this:

>Contig_1_1015.5_Circ [topology=circular]

After changing it to:

>Contig_1_1015.5_Circ

... the command ran without errors.
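As a stopgap, the same manual edit can be scripted. Below is a minimal sketch in Python, assuming the only change needed is to cut each header at the first whitespace. It is not part of jasen, and the function name `strip_header_descriptions` is made up for illustration.

```python
def strip_header_descriptions(fasta_lines):
    """Yield FASTA lines with every header cut at the first whitespace.

    Sequence lines are passed through unchanged (minus the trailing newline).
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split off the ID from any free-text description that follows it.
            fields = line[1:].split(None, 1)
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


# The header from this issue, followed by a placeholder sequence line.
example = [">Contig_1_1015.5_Circ [topology=circular]\n", "ACGT\n"]
cleaned = list(strip_header_descriptions(example))
```

This only changes what the script sees as input; the script itself would still fail on any FASTA file that has not been cleaned this way.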

samuell commented 10 months ago

It turns out that the IDs in the VCF file do not include the space in the FASTA header or anything after it. The .pl script uses the full header as the ID, so it cannot find the contig named in the VCF.

Not sure if this should be fixed in the upstream tool that produces the VCF, or in error_corr_assembly.pl. What is the preferred behavior here?
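If the fix were made on the script side, one option would be to key sequences by the header text up to the first whitespace, so they match the IDs in the VCF. The sketch below illustrates the idea in Python. It is not the code in error_corr_assembly.pl; the helper names `fasta_id` and `read_fasta` and the VCF record are hypothetical.

```python
def fasta_id(header_line):
    """Return the sequence ID: the text after '>' up to the first whitespace."""
    fields = header_line[1:].split(None, 1)
    return fields[0] if fields else ""


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence."""
    chunks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Key by the short ID, not the full header with its description.
            current = fasta_id(line)
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


# Header taken from this issue; the sequence and VCF record are made up.
fasta = [">Contig_1_1015.5_Circ [topology=circular]", "ACGT", "TTGA"]
vcf_record = "Contig_1_1015.5_Circ\t3\t.\tG\tA\t.\t.\t."

chrom, pos, _variant_id, ref, alt = vcf_record.split("\t")[:5]
seqs = read_fasta(fasta)

# VCF positions are 1-based, so subtract 1 to index the Python string.
base = seqs[chrom][int(pos) - 1]
```

With the full header as the key, `seqs[chrom]` would raise a `KeyError` here, which is the Python analogue of the failed lookup behind the `substr` errors above.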