Closed ArtPoon closed 4 years ago
Reproduced error manually on Langley:
art@Langley:/home/covid/covizu$ python scripts/update.py -ref data/NC_045512.fa data/GISAID-2020-06-03.fasta data/gisaid-aligned.fa
Error in update_alignment: length of sequence hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01 does not match reference.
Looks like our alignment file is corrupted with unaligned sequences:
art@Langley:/home/covid/covizu/data$ grep EPI_ISL_457576 gisaid-aligned.fa
>hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01
art@Langley:/home/covid/covizu/data$ grep EPI_ISL_457576 GISAID-2020-06-03.fasta
>hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01
art@Langley:/home/covid/covizu/data$ grep -a1 EPI_ISL_457576 gisaid-aligned.fa | tail -n1 | wc
1 1 28542
art@Langley:/home/covid/covizu/data$ head -n2 gisaid-aligned.fa | tail -n1 | wc
1 1 29904
Aligned sequence lengths should be 29903 nt; wc reports 29904 characters because it also counts the trailing newline.
The good news is that it's only one entry -- looks like a previous run was interrupted somehow
>>> handle = open('gisaid-aligned.fa')
>>> for h, s in iter_fasta(handle):
...     if len(s) != 29903:
...         print(h)
...
hCoV-19/England/NOTT-11152A/2020|EPI_ISL_457576|2020-05-01
Confirmed this is the last entry in gisaid-aligned.fa. Manually deleted record.
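If this happens again, the manual deletion could be scripted. The following is only a rough sketch, not what was actually run here: `iter_fasta_records` is a hypothetical stand-in for the repo's `iter_fasta` (assumed to yield header/sequence pairs with the leading `>` stripped), `drop_truncated` is a made-up helper name, and 29903 is the reference length used in the check above.

```python
REF_LENGTH = 29903  # length of NC_045512 in nt, same value as the check above


def iter_fasta_records(handle):
    """Yield (header, sequence) tuples; hypothetical stand-in for iter_fasta."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record, which is the one an interrupted run truncates.
    if header is not None:
        yield header, "".join(chunks)


def drop_truncated(src, dst, ref_length=REF_LENGTH):
    """Copy records of the expected length to dst; return the dropped headers."""
    dropped = []
    for header, seq in iter_fasta_records(src):
        if len(seq) == ref_length:
            dst.write(f">{header}\n{seq}\n")
        else:
            dropped.append(header)
    return dropped
```

The idea would be to open gisaid-aligned.fa as `src` and a separate output file as `dst`, check the returned list of dropped headers, and only then replace the original, so that a second interruption cannot corrupt the alignment in place.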
Rerunning updater.py script.
It's been 3 hours and we're about halfway through (1874 new records).
In debug/Autobot.log