PoonLab / covizu

Rapid analysis and visualization of coronavirus genome variation
https://filogeneti.ca/CoVizu/
MIT License

cron job died (2020-06-03) #62

Closed by ArtPoon 4 years ago

ArtPoon commented 4 years ago

In debug/Autobot.log:

Error in update_alignment: length of sequence hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01 does not match reference.

ArtPoon commented 4 years ago

Reproduced error manually on Langley:

art@Langley:/home/covid/covizu$ python scripts/update.py -ref data/NC_045512.fa data/GISAID-2020-06-03.fasta data/gisaid-aligned.fa 
Error in update_alignment: length of sequence hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01 does not match reference.

ArtPoon commented 4 years ago

Looks like our alignment file is corrupted with unaligned sequences:

art@Langley:/home/covid/covizu/data$ grep EPI_ISL_457576 gisaid-aligned.fa 
>hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01
art@Langley:/home/covid/covizu/data$ grep EPI_ISL_457576 GISAID-2020-06-03.fasta 
>hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01
art@Langley:/home/covid/covizu/data$ grep -a1 EPI_ISL_457576 gisaid-aligned.fa | tail -n1 | wc
      1       1   28542
art@Langley:/home/covid/covizu/data$ head -n2 gisaid-aligned.fa | tail -n1 | wc
      1       1   29904

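Aligned sequences should be 29903 nt long; the 29904 characters reported by wc include the trailing newline. By the same count, this record is only 28541 nt.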
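The grep and wc check above can also be done in Python, which counts nucleotides directly instead of characters plus the trailing newline. This is a minimal sketch, not code from the covizu repository: `read_fasta` and `compare_to_first` are hypothetical helpers written for illustration, and the sketch assumes the first record in the file has the correct aligned length.

```python
from typing import Iterator, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    Hypothetical stand-in for illustration; not covizu's iter_fasta.
    """
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def compare_to_first(handle: TextIO, accession: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (length of the first record, length of the record whose
    header contains `accession`, or None if no header matches)."""
    first_len, hit_len = None, None
    for header, seq in read_fasta(handle):
        if first_len is None:
            first_len = len(seq)  # assumed to be a correctly aligned record
        if accession in header:
            hit_len = len(seq)
    return first_len, hit_len
```

For example, `compare_to_first(open('gisaid-aligned.fa'), 'EPI_ISL_457576')` would return the two lengths side by side; a mismatch between them points to a truncated or unaligned record.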

ArtPoon commented 4 years ago

The good news is that only one entry is affected; it looks like a previous run was interrupted somehow:

>>> handle = open('gisaid-aligned.fa')
>>> for h, s in iter_fasta(handle):
...   if len(s) != 29903:
...     print(h)
... 
hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01

ArtPoon commented 4 years ago

Confirmed that this is the last entry in gisaid-aligned.fa. Manually deleted the record. Rerunning the updater.py script.
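The manual deletion could be scripted along these lines. This is only a sketch under stated assumptions, not what was actually run here: `read_fasta` and `drop_mismatched` are hypothetical helpers, each sequence is written back on a single line (as the alignment file appears to store them), and the cleaned file is written to a temporary file first and then swapped in with `os.replace`, so an interrupted cleanup cannot leave a second truncated file behind.

```python
import os
import tempfile


def read_fasta(handle):
    """Yield (header, sequence) pairs; hypothetical helper, not covizu's iter_fasta."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_mismatched(path, expected_len):
    """Rewrite the FASTA file at `path`, keeping only records whose
    sequence length equals `expected_len`. Returns the removed headers."""
    removed = []
    dir_name = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so os.replace stays on one filesystem.
    # Note: mkstemp creates the file with owner-only permissions.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for header, seq in read_fasta(src):
                if len(seq) == expected_len:
                    out.write(f">{header}\n{seq}\n")
                else:
                    removed.append(header)
        os.replace(tmp_path, path)  # swap in the cleaned file in one step
    except BaseException:
        os.unlink(tmp_path)  # discard the partial temp file on any failure
        raise
    return removed
```

With the numbers from this thread, a call such as `drop_mismatched('data/gisaid-aligned.fa', 29903)` would be expected to return only the NOTT-11152A header.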

ArtPoon commented 4 years ago

It's been 3 hours and we're about halfway through (1874 new records).