ArtPoon closed this issue 1 year ago
Could this be related to recent updates to the lineages.csv file?
Failed August 23 run:
Entering 'covizu/data/pango-designation'
Updating 2f12f2f..fcad365
Fast-forward
deduplicate_keeping_last.py | 38 ++
lineage_notes.txt | 15 +
lineages.csv | 737 ++++++++++++++++++++++++++++++++++++---
pango_designation/__init__.py | 2 +-
pango_designation/alias_key.json | 5 +-
5 files changed, 748 insertions(+), 49 deletions(-)
Failed August 26 run:
Entering 'covizu/data/pango-designation'
Updating fcad365..3c27f23
Fast-forward
lineage_notes.txt | 6 +
lineages.csv | 279 +++++++++++++++++++++++++++++++++++++--
pango_designation/alias_key.json | 4 +-
Maybe related to this commit? https://github.com/cov-lineages/pango-designation/commit/aa9e72bc76df05b1bdab28a94ba9d1c9c6bd3547 (blank line in middle of file)
And a similar error was patched on Aug 23: https://github.com/cov-lineages/pango-designation/commit/60d1cc75a27ebe5d92435762afe4ffddae3ad68b
We might need to write something to catch these edge cases
Current update got past this step, so we should be ok
Runs crashed again. Let's not close this until we can catch this edge case.
Screen lineages.csv file for empty lines and special characters
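A rough sketch of what such a screen could look like, run before the file is handed to the pipeline. This is not code from covizu: the function name `screen_lineages`, the allowed-character pattern, and the expected `taxon,lineage` header are all assumptions and would need to be checked against the real file.

```python
import re

# Assumption: taxon and lineage names only use these characters.
# Widen the pattern if legitimate names turn out to need more.
FIELD_PATTERN = re.compile(r"^[A-Za-z0-9_./|\-]+$")


def screen_lineages(lines, expected_header="taxon,lineage"):
    """Return (line_number, reason, raw_line) for every suspicious row.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if lineno == 1 and line == expected_header:
            continue  # header row is fine
        if not line.strip():
            problems.append((lineno, "empty line", raw))
            continue
        fields = line.split(",")
        if len(fields) != 2:
            reason = "expected 2 fields, found {}".format(len(fields))
            problems.append((lineno, reason, raw))
            continue
        # An empty field fails the pattern too, because it requires
        # at least one character.
        if not all(FIELD_PATTERN.match(f) for f in fields):
            problems.append((lineno, "empty field or unexpected character", raw))
    return problems
```

Something like `with open(path) as handle: problems = screen_lineages(handle)` would then let the run log the offending rows (or abort early) instead of crashing partway through.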
The line "strain userOrOld date Nextstrain_clade pango_lineage genbank_accession country Nextstrain_clade_usher pango_lineage_usher accession" in lineages.csv is causing this issue
diff --git a/covizu/treetime.py b/covizu/treetime.py
index 7e0c7e1..9a83521 100644
--- a/covizu/treetime.py
+++ b/covizu/treetime.py
@@ -287,8 +287,14 @@ if __name__ == '__main__':
             sys.exit()
         lineages = {}
         for line in handle:
-            taxon, lineage = line.strip().split(',')
-            lineages.update({taxon: lineage})
+            try:
+                taxon, lineage = line.strip().split(',')
+                if taxon and lineage:
+                    lineages.update({taxon: lineage})
+                else:
+                    cb.callback("Warning '{}': taxon or lineage is missing".format(line), level='WARN')
+            except:
+                cb.callback("Warning: There is an issue with the line '{}' in lineages.csv".format(line), level='WARN')
     cb.callback("Identifying lineage representative genomes")
     fasta = retrieve_genomes(by_lineage, known_seqs=lineages, ref_file=args.ref, earliest=args.earliest,
diff --git a/covizu/utils/batch_utils.py b/covizu/utils/batch_utils.py
index 6f54e08..27de99d 100644
--- a/covizu/utils/batch_utils.py
+++ b/covizu/utils/batch_utils.py
@@ -51,8 +51,16 @@ def build_timetree(by_lineage, args, callback=None):
             sys.exit()
         lineages = {}
         for line in handle:
-            taxon, lineage = line.strip().split(',')
-            lineages.update({taxon: lineage})
+            try:
+                taxon, lineage = line.strip().split(',')
+                if taxon and lineage:
+                    lineages.update({taxon: lineage})
+                else:
+                    if callback:
+                        callback("Warning '{}': taxon or lineage is missing".format(line), level='WARN')
+            except:
+                if callback:
+                    callback("Warning: There is an issue with the line '{}' in lineages.csv".format(line), level='WARN')
     if callback:
         callback("Identifying lineage representative genomes")
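Since the same loop now lives in both treetime.py and batch_utils.py, one option would be to pull it into a shared helper. The sketch below is only a suggestion and not part of the patch: `parse_lineages` is a hypothetical name, it assumes the header row has already been consumed, and it assumes the callback takes `(msg, level=...)` as in the diff. It catches `ValueError` specifically (what the tuple unpacking raises when a row does not split into exactly two fields) rather than using a bare `except`, so unrelated errors still surface.

```python
def parse_lineages(handle, callback=None):
    """Build a {taxon: lineage} dict from 'taxon,lineage' rows.

    `handle` is any iterable of lines with the header already consumed.
    Malformed rows are reported through `callback` (if given) and skipped.
    """
    lineages = {}
    for line in handle:
        row = line.strip()
        try:
            taxon, lineage = row.split(',')
        except ValueError:
            # Raised when the row has fewer or more than two fields,
            # which includes blank lines.
            if callback:
                callback("Warning: There is an issue with the line '{}' "
                         "in lineages.csv".format(row), level='WARN')
            continue
        if taxon and lineage:
            lineages[taxon] = lineage
        elif callback:
            callback("Warning '{}': taxon or lineage is missing".format(row),
                     level='WARN')
    return lineages
```

In treetime.py this would be called as `lineages = parse_lineages(handle, callback=cb.callback)`, and in batch_utils.py as `lineages = parse_lineages(handle, callback=callback)`.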
No jobs currently running on the cluster.
Seems to have stalled on parsing PANGO lineages?