Closed jfnjdoh closed 5 months ago
Hey @jfnjdoh, thanks for the excellent documentation. We are just about to release the new v2.1.0 version of phoenix this week, and I added handling for when it can't determine the assembler. Are you able to run the v2.1.0-dev branch with -entry SCAFFOLDS and let me know if it gets past that step now?
I took a sample that I knew already worked and had `contig_1` as the first name, changed it to `contig_224`, and reran, and it worked fine on the dev version. Thanks for the fix.
Whoo hoo, I love an easy fix. Keep an eye out for the new release at the end of the week. If you want to be included in release emails, email HAISeq@cdc.gov and request to be added to the listserv. Happy sequencing!
When using the `SCAFFOLDS` entry point, Phoenix determines the assembler by looking at the name of the first entry in the assembly fasta, see here https://github.com/CDCgov/phoenix/blob/main/bin/rename_fasta_headers.py#L124. For flye, it assumes that the name of the first contig is always `contig_1`. However, sometimes it is not, and flye's developer said this is normal behavior (https://github.com/fenderglass/Flye/issues/667). Hence, when this happens, the pipeline fails at line 161, being unable to determine the assembler.

For now I've just been manually changing the name to `contig_1`, but that's a bad solution. A better one might be to match `contig_\d+`, though complications might ensue if names are similar between assemblers, but it seems that you'd be ok in this case as long as the c is case sensitive. Another option would be to add an `--assembler` parameter and skip those checks if it is specified.
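
Here is a rough Python sketch of both ideas, to make the suggestion concrete. It is not Phoenix's actual code: the function name `guess_assembler`, the pattern table, and the SKESA and SPAdes header patterns are assumptions for illustration. Only the flye pattern `contig_\d+` comes from the discussion above.

```python
import re

# Hypothetical pattern table. Only the flye entry is taken from this issue;
# the SKESA and SPAdes patterns are assumptions about typical header names.
ASSEMBLER_PATTERNS = {
    "flye": re.compile(r"contig_\d+"),  # lowercase c, any contig number
    "skesa": re.compile(r"Contig_\d+_\S+"),  # assumption: capital C plus coverage
    "spades": re.compile(r"NODE_\d+_length_\d+_cov_\S+"),  # assumption
}


def guess_assembler(first_header, assembler=None):
    """Return the assembler name for the first FASTA header, or None.

    If `assembler` is given (for example from a hypothetical --assembler
    parameter), the header checks are skipped entirely.
    """
    if assembler is not None:
        return assembler

    # Drop the leading '>' and keep only the sequence name (first token).
    tokens = first_header.lstrip(">").split()
    if not tokens:
        return None
    name = tokens[0]

    # re patterns are case sensitive by default, so 'contig_224' and
    # 'Contig_1_42.5' cannot be confused with each other.
    for candidate, pattern in ASSEMBLER_PATTERNS.items():
        if pattern.fullmatch(name):
            return candidate
    return None


if __name__ == "__main__":
    print(guess_assembler(">contig_224"))
    print(guess_assembler(">unknown_header", assembler="flye"))
```

Using `fullmatch` instead of `match` keeps a header that merely starts with `contig_` followed by extra fields from being classified as flye, which should limit collisions between assemblers with similar naming schemes.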