UPHL-BioNGS / Cecret

Reference-based consensus creation
MIT License

Extra lines in summary file with large number of samples #344

Open garfinjm opened 3 days ago

garfinjm commented 3 days ago

Hello again!

I'm encountering an issue where extra lines are added to the cecret_results.csv file when a large number (hundreds) of samples are run together.

Basically, I get two extra lines in my cecret_results.csv that look like:

name,name,,,p/f,,,,,,,,model,alerts,,,,,,,,,,,,,,,,,,,,,,v3.14.240610,artic 1.2.4
seq,seq,,,best,,,,,,,,seq,,,,,,,,,,,,,,,,,,,,,,,v3.14.240610,artic 1.2.4

This only seems to happen when I run large batches of samples together. I've tracked these extra lines back to the vadr.sqa file used during the cecret summary process (see: https://github.com/ncbi/vadr/issues/81).

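I think one possible fix would be to edit the line below, replacing grep -v "#-" with grep -v "^#" (as Eric suggests in the vadr issue). The current pattern only drops lines that contain "#-" (such as the dashed separator line), while "^#" drops every line that starts with "#".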

https://github.com/UPHL-BioNGS/Cecret/blob/afe58161be62c956e1999f1a655764b60035ac09/modules/cecret.nf#L122
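To make the difference between the two filters concrete, here is a minimal sketch run against a hypothetical, heavily simplified .sqa-style table. The file name example.sqa and its contents are made up for illustration (real vadr output has many more columns and different spacing), and it assumes every header line begins with "#" while only the dashed separator line contains "#-".

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for a vadr .sqa table:
# two column-name header lines, a dashed separator, then data rows.
# Real vadr output has more columns and different spacing.
cat > example.sqa <<'EOF'
#seq  seq    seq    best
#idx  name   len    p/f
#---  -----  -----  ----
1     s1     29903  PASS
2     s2     29850  FAIL
EOF

# Current filter: removes only lines containing "#-" (the separator),
# so the "#seq ..." and "#idx ..." header lines are kept.
echo "grep -v '#-':"
grep -v "#-" example.sqa

# Suggested filter: removes every line that starts with "#",
# leaving only the data rows.
echo "grep -v '^#':"
grep -v "^#" example.sqa
```

With the original pattern, both column-name header lines survive alongside the data rows; the anchored "^#" pattern leaves only the data rows.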

Let me know if you need any other info or want me to try something else on my setup, and thanks again for all your work on Cecret.

erinyoung commented 3 days ago

Oh SNAP!