UPHL-BioNGS / Cecret

Reference-based consensus creation
MIT License

Combined fasta Issue #192

Closed whottel closed 12 months ago

whottel commented 1 year ago

Hello,

I was running Cecret on a collection of sequences, and it kept failing on the vadr step. I tracked down the error file in the Nextflow work directory: Error.txt

Checking the concatenated fasta file (converted to .txt here): ultimate_fasta.txt

There is a blank row under what seems to be the offending sequence, ">Consensus_VA-0266-IA-VH01284-230609__S85.consensus_threshold_0.6_quality_20". I am not sure why this occurs, since other samples on the same run had virtually empty fastq files and were excluded from this concatenated fasta file.
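For anyone who wants to check their own combined fasta for the same symptom, here is a minimal Python sketch that lists headers followed by no sequence. It is not part of Cecret or vadr; `find_empty_records` is a hypothetical helper name, and it assumes the whole file fits in memory as text.

```python
def find_empty_records(fasta_text):
    """Return the headers (without '>') of records that have no sequence."""
    empty = []
    header = None
    seq_len = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record; report it if it was empty.
            if header is not None and seq_len == 0:
                empty.append(header)
            header = line[1:]
            seq_len = 0
        elif header is not None:
            # Blank lines contribute zero length, so they do not count as sequence.
            seq_len += len(line)
    # The final record has no following header, so check it here.
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty
```

Reading the file with `open(path).read()` and passing the text to `find_empty_records` would return the names of any header-only records, such as the one described above.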

Strangely, when I run on a small subset that still includes this sequence, it no longer shows up in the concatenated fasta file and the pipeline runs without error. combined.txt

-Wes

erinyoung commented 1 year ago

It is strange that the empty fasta was incorporated into ultimate_fasta.fasta. Hmm... What version of Cecret are you using?

I may need to add an additional check on fasta size.
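As an illustration only, a check of this kind could look like the Python sketch below, which drops records shorter than a minimum length while concatenating. This is an assumption about the general idea, not the check that Cecret implements; `combine_fastas` and `min_length` are hypothetical names.

```python
def combine_fastas(fasta_texts, min_length=1):
    """Concatenate fasta texts, skipping records shorter than min_length."""
    kept = []
    for text in fasta_texts:
        header, chunks = None, []
        # The trailing ">" is a sentinel that flushes the last real record.
        for line in text.splitlines() + [">"]:
            line = line.strip()
            if line.startswith(">"):
                seq = "".join(chunks)
                # Keep the previous record only if it has enough sequence.
                if header is not None and len(seq) >= min_length:
                    kept.append(f"{header}\n{seq}\n")
                header, chunks = line, []
            elif line:
                chunks.append(line)
    return "".join(kept)
```

With the default `min_length=1`, a header followed only by a blank line is left out of the combined output, while records with at least one base are written with the sequence on a single line.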

whottel commented 1 year ago

This was run with v.3.7.20230620

erinyoung commented 1 year ago

Is it possible to share the fasta file that is causing issues? I'd like to include it in testing.

whottel commented 1 year ago

This is the fasta file that is produced by Cecret for that sample. (Converted to .txt to upload) VA-0266-IA-VH01284-230609__S85.consensus.fa.txt

erinyoung commented 1 year ago

Thank you! Hopefully I'll have a fix by COB tomorrow! (If not, it should be soon)

whottel commented 1 year ago

Awesome, thanks!

erinyoung commented 1 year ago

My apologies for taking so long. I think I have an additional fix.

Once https://github.com/UPHL-BioNGS/Cecret/pull/194 finishes testing, I'll be able to merge this into master and get another update out.

whottel commented 12 months ago

Thanks, it looks like the issue is resolved with version 3.7.20230711.