LANL-Bioinformatics / PanGIA

PanGIA is broken: throws an internal `KeyError` when running test command #11

Open xapple opened 1 year ago

xapple commented 1 year ago

I installed PanGIA by cloning this repository and then downloading these two files:

$ curl -O https://edge-dl.lanl.gov/PanGIA/database/PanGIA_20190830_taxonomy.tar.gz
$ curl -O https://edge-dl.lanl.gov/PanGIA/database/PanGIA_20190830_NCBI_genomes_refseq89_BAV.fa.mmi.tar.gz

$ tar xzf PanGIA_20190830_taxonomy.tar.gz
$ tar xzf PanGIA_20190830_NCBI_genomes_refseq89_BAV.fa.mmi.tar.gz

Next, I ran the following command to test whether PanGIA could classify a set of artificially generated reads:

(pangia) xapple@server ~ $ ~/programs/pangia/bin/pangia.py --threads 4 --database ~/databases/pangia/PanGIA/NCBI_genomes_refseq89_BAV.fa --mode report --outdir ~/runs/pangia_test/ --readmapper minimap2 --prefix sample --input ~/runs/pangia_test/reads_fwd.fastq.gz ~/runs/pangia_test/reads_rev.fastq.gz

The run completes read mapping and SAM processing, but then aborts with a `KeyError` inside `taxonomyRollUp` during the "Rolling up taxonomies" step:

[00:00:00] Starting PanGIA 1.0.0-RC6.1
[00:00:00] Temporary directory '~/runs/pangia_test//sample_tmp' found. Deleting directory...
[00:04:53] Arguments and dependencies checked:
[00:04:53]     Input reads       : ['~/runs/pangia_test/reads_fwd.fastq.gz', '~/runs/pangia_test/reads_rev.fastq.gz']
[00:04:53]     Input SAM file    : ~/runs/pangia_test//sample.pangia.sam
[00:04:53]     Input background  : None
[00:04:53]     Save background   : None
[00:04:53]     Scoring method    : standalone
[00:04:53]     Scoring parameter : 0.5:0.99
[00:04:53]     Database          : ['~/databases/pangia/PanGIA/NCBI_genomes_refseq89_BAV.fa.mmi']
[00:04:53]     Abundance         : DEPTH_COV
[00:04:53]     Output path       : ~/runs/pangia_test/
[00:04:53]     Prefix            : sample
[00:04:53]     Mode              : report
[00:04:53]     Specific taxid    : None
[00:04:53]     Threads           : 4
[00:04:53]     First #refs in XA : 30
[00:04:53]     Extra NM in XA    : 1
[00:04:53]     Minimal score     : 0
[00:04:53]     Minimal RSNB      : 2.5
[00:04:53]     Minimal reads     : 10
[00:04:53]     Minimal linear len: 200
[00:04:53]     Minimal genome cov: 0.004
[00:04:53]     Minimal depth (DC): 0.01
[00:04:53]     Minimal RSDCnr    : 0.0009
[00:04:53]     Aligner option    : -A1 -B2 -k 40 -m 60 -x sr -p 1 -N 30
[00:04:53]     Aligner seed len  : 40
[00:04:53]     Aligner min score : 60
[00:04:53]     Aligner path      : ~/mambaforge/envs/pangia/bin/minimap2
[00:04:53]     Samtools path     : ~/mambaforge/envs/pangia/bin/samtools
[00:04:53] Loading taxonomy information...
[00:05:00] Done.
[00:05:00] Loading pathogen information...
[00:05:00] Done. 2817 pathogens loaded.
[00:05:00] Loading taxonomic uniqueness information...
[00:05:00] Done. 31177 taxonomic uniqueness loaded.
[00:05:00] Loading sizes of genomes...
[00:05:55] Done. 1061 target and 0 host genome(s) loaded.
[00:05:55] Running read-mapping...
[00:05:55] Mapping to ~/databases/pangia/PanGIA/NCBI_genomes_refseq89_BAV.fa.mmi...
[00:06:53] Done mapping reads to the database(s).
[00:06:53] Merging SAM files...
[00:06:55] Logfile saved to ~/runs/pangia_test//sample.pangia.log.
[00:06:55] Done. Mapped SAM file saved to ~/runs/pangia_test//sample.pangia.sam.
[00:06:55] Total number of input reads: 400013
[00:06:55] Total number of mapped reads: 186478
[00:06:55] Total number of host reads: 0 (0.00%)
[00:06:55] Total number of ignored reads (cross superkingdom): 349 (0.19%)
[00:06:55] Processing SAM file...
[00:06:55] Parsing SAM files with 4 subprocesses...
[00:06:59] Merging results...
[00:06:59] Done.
[00:06:59] Calculating linear length...
[00:07:02] Done processing SAM file, 184670 alignment(s).
[00:07:02] Rolling up taxonomies...
[00:07:02] 17 strain(s) mapped.
Traceback (most recent call last):
  File "~/programs/pangia/bin/pangia.py", line 2320, in <module>
    res_rollup = taxonomyRollUp(res, patho_meta, mapped_r_cnt, argvs.minRsnb, argvs.minReads, argvs.minLen, argvs.minCov, argvs.minDc)
  File "~/programs/pangia/bin/pangia.py", line 1199, in taxonomyRollUp
    genome_size[taxid]
KeyError: '1582156.1'
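
The traceback points to a lookup in `genome_size` for an ID (`1582156.1`) that has no entry there. To narrow this down, one could list every mapped strain ID that is missing from the genome-size table before the roll-up runs. The snippet below is only a minimal sketch of that check: the dictionary contents, the list of mapped strains, and the helper name `find_missing_sizes` are hypothetical placeholders, not taken from `pangia.py`.

```python
# Hypothetical stand-ins: the real contents of genome_size and the real
# list of mapped strains inside pangia.py are not shown in the log above.
genome_size = {"1280.1": 2_800_000, "562.3": 5_100_000}
mapped_strains = ["1280.1", "1582156.1", "562.3"]


def find_missing_sizes(strain_ids, sizes):
    """Return the strain IDs that have no entry in the sizes mapping."""
    return [sid for sid in strain_ids if sid not in sizes]


missing = find_missing_sizes(mapped_strains, genome_size)
print("Strain IDs without a genome size:", missing)
```

If a check like this reports IDs against the real data, it would suggest that the mapping index and the genome-size table loaded at startup do not cover the same set of genomes, although that is an assumption to verify rather than a confirmed cause.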