arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

rgi main: problem with an assembled genome sequence #109

Closed. walshaw closed this issue 4 years ago.

walshaw commented 4 years ago

I recently installed RGI via bioconda. I can't find a version command for rgi itself, but I'm guessing this must be version 5.1.0.

I tried out RGI on a third-party genome sequence from NCBI, the Mycobacterium tuberculosis H37Rv assembly (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/195/955/GCF_000195955.2_ASM19595v2/GCF_000195955.2_ASM19595v2_genomic.fna.gz):

$ rgi main --input_sequence GCF_000195955.2_ASM19595v2_genomic.fna \
   --output_file MtuberculosisH37Rv_thrds32_blast --input_type contig --data chromosome \
   --local --clean --num_threads 32

Stderr shows:

Traceback (most recent call last):
  File "($HOME)/miniconda3/envs/card_rgi/bin/rgi", line 4, in <module>
    MainBase()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/MainBase.py", line 81, in __init__
    getattr(self, args.command)()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/MainBase.py", line 86, in main
    self.main_run(args)
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/MainBase.py", line 120, in main_run
    rgi_obj.run()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/RGI.py", line 197, in run
    self.create_databases()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/RGI.py", line 191, in create_databases
    db_obj.build_databases()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/Database.py", line 25, in build_databases
    self.write_fasta_from_json_rna()
  File "($HOME)/miniconda3/envs/card_rgi/lib/python3.6/site-packages/app/Database.py", line 166, in write_fasta_from_json_rna
    snpList = [j[i]['model_param']['snp']['param_value'][k] for k in j[i]['model_param']['snp']['param_value']]
KeyError: 'snp'

Background:

I've been using the README.rst as reference.

Prior to the above, I'd run the following commands (plus some for downloads/unpacking, omitted here). I'd loaded the wildcard and kmer datasets because I intend to use rgi bwt on unassembled metagenomes later. Would that cause a problem when applying it to assembled data?

$ rgi load --card_json ../CARD_2020-03-02/card.json --local

$ rgi card_annotation -i ../CARD_2020-03-02/card.json > card_annotation.log 2>&1

$ rgi load -i ../CARD_2020-03-02/card.json --card_annotation card_database_v3.0.8.fasta --local

$ rgi wildcard_annotation -i wildcard_uncompressed \
    --card_json ../CARD_2020-03-02/card.json -v 3.0.6 \
    > wildcard_annotation_try2.log 2>&1

$ rgi load --wildcard_annotation wildcard_database_v3.0.6.fasta \
  --wildcard_index wildcard_uncompressed/index-for-model-sequences.txt \
  --card_annotation card_database_v3.0.8.fasta --local

$ rgi load --kmer_database wildcard_uncompressed/61_kmer_db.json \
  --amr_kmers wildcard_uncompressed/all_amr_61mers.txt \
  --kmer_size 61 --local --debug > kmer_load.61.log 2>&1


BTW, regarding rgi wildcard_annotation, there is a minor issue with the card-prevalence.tar.bz2 file: not only is the tar archive compressed but, unlike the other distributed tarballs, I think each file within it is also gzipped. Unless those files are uncompressed first, rgi wildcard_annotation gives an error. I don't think this is mentioned in README.rst.
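A minimal Python sketch of the two-stage unpacking described above, assuming (as reported, not confirmed by the maintainers) that card-prevalence.tar.bz2 contains individually gzipped files. The function name `extract_prevalence` is hypothetical, and the target directory simply mirrors the `wildcard_uncompressed` name used in the commands earlier; it is not an RGI requirement.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def extract_prevalence(archive, dest):
    """Unpack a .tar.bz2 and then gunzip every *.gz member inside it.

    Hypothetical helper: assumes the inner files are individually gzipped.
    Returns the sorted list of decompressed file paths.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    # Stage 1: unpack the bzip2-compressed tar archive.
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)

    # Stage 2: gunzip each inner file. Collect the list first so we do not
    # modify the directory tree while iterating over it.
    unpacked = []
    for gz_path in sorted(dest.rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # "x.fasta.gz" -> "x.fasta"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # drop the compressed copy once decompressed
        unpacked.append(out_path)
    return sorted(unpacked)
```

Usage would be something like `extract_prevalence("card-prevalence.tar.bz2", "wildcard_uncompressed")`, after which the directory can be passed to `rgi wildcard_annotation -i`.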

One other thing, which may or may not be relevant:

$ rgi database -v --local --all
card_canonical: 3.0.8 | card_variants: None | kmer_sizes: 61

I was expecting a version number to appear in card_variants?

OS: Debian (Jessie).

huynhw1 commented 4 years ago

Database.py.zip

Hey @walshaw ,

This is an issue with the CARD JSON: model 3745 (for tuberculosis) has no 'snp' entry under 'model_param', so RGI raises the KeyError you see. We will look further into this entry.
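To check whether other entries share the same problem, a small scan of card.json can list models whose `model_param` lacks an `snp` key. This is a sketch under assumptions: it treats any entry whose `model_type` contains "variant" as one that should carry SNPs, and skips non-dict top-level values (metadata such as `_version`). The function name `models_missing_snp` is hypothetical and not part of RGI.

```python
import json


def models_missing_snp(card_json_path):
    """Return (model_id, model_name) pairs for variant models with no 'snp' param.

    Hypothetical helper. Assumption: variant models are identified by the
    substring "variant" in their model_type field.
    """
    with open(card_json_path) as fh:
        card = json.load(fh)

    missing = []
    for key, model in card.items():
        # Skip metadata keys (e.g. "_version") whose values are not model dicts.
        if not isinstance(model, dict) or "model_type" not in model:
            continue
        if "variant" not in model["model_type"].lower():
            continue
        # model_param may be absent or empty; either way there is no 'snp'.
        params = model.get("model_param") or {}
        if "snp" not in params:
            missing.append((key, model.get("model_name", "?")))
    return missing
```

Running it on `../CARD_2020-03-02/card.json` would be expected to list model 3745 if the diagnosis above is right.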

If tuberculosis is not a concern for you, this can be ignored. However, as your traceback shows, the error aborts the whole run, so attached to this message is an ad-hoc fix that wraps the problem code in a try/except block. The error will then be bypassed so that you can get your results.
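The idea behind the workaround can be illustrated with a sketch of the failing list comprehension from Database.py (line 166 in the traceback) rewritten to catch the missing key. This is not the contents of the attached Database.py.zip; the helper name `snp_list_for_model` and the logging call are hypothetical, chosen only to show a try/except that skips an entry instead of aborting.

```python
import logging

logger = logging.getLogger(__name__)


def snp_list_for_model(model_id, model):
    """Return the SNP values for one CARD model entry, or [] if it has none.

    Hypothetical helper mirroring:
        [j[i]['model_param']['snp']['param_value'][k]
         for k in j[i]['model_param']['snp']['param_value']]
    but catching the KeyError raised when 'snp' (or a parent key) is missing.
    """
    try:
        param_value = model["model_param"]["snp"]["param_value"]
    except KeyError:
        logger.warning("Skipping model %s: no 'snp' entry in model_param", model_id)
        return []
    return [param_value[k] for k in param_value]
```

Inside a loop over the JSON (`for i in j: snpList = snp_list_for_model(i, j[i])`), a model such as 3745 would then yield an empty list and the database build would continue.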

Please download this zip, unzip it, and place Database.py in the app folder within your rgi-5.1.0 directory, replacing the existing file. Then reinstall RGI from the command line.

To do this, change into the rgi-5.1.0 directory and run:

pip install .

or

pip3 install .

whichever your system provides.

Try running your data again and let me know how it goes. It ran successfully on my end.

Cheers.


William Huynh, CARD Curator-Developer

raphenya commented 4 years ago

@walshaw we pushed this commit to resolve this issue. We are working on releasing a new version for RGI. In the meantime please download and install the latest commit from this repository. Cheers.