GDKO / AvP

Automatic evaluation of HGTs
GNU General Public License v3.0

TypeError: can only concatenate str (not "list") to str #2

Closed: CongLiu37 closed this issue 1 year ago

CongLiu37 commented 1 year ago

Hello,

Thank you for your quick response. Here is another error:

Traceback (most recent call last):
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/avp", line 6, in <module>
    main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/interface.py", line 29, in main
    prepare.main()
  File "/home/c/c-liu/miniconda3/envs/avp/AvP/depot/prepare.py", line 219, in main
    blastdbcmd_command = 'blastdbcmd -db '+ config_opts["nr_db_path"] + ' -dbtype prot -entry_batch ' +  extract_id_path + ' -target_only -outfmt ">%a@%T\n%s" -logfile ' + setnrlog_path + ' -out ' + setnrfa_path
TypeError: can only concatenate str (not "list") to str

And here is my config file:

---
max_threads: 64

# DB path
nr_db_path: [/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr]

## Algorithm options
# prepare
ai_cutoff: 0
percent_identity: 100
cutoffextend: 20    # when toi hit is found, we take this hit + n hits
trimal: false
min_num_hits: 4   # select queries with at least that many blast hits
percentage_similar_hits: 0.7  # group queries based on this
mode: nr    # use nr for nr database, use sp for swissprot database
# detect, classify, evaluate
fastml: true  # Use fasttree instead of IQTree
node_support: 0  # nodes below that number will collapse
complex_per_toi: 20   # if H/(H+T) smaller than this then node is considered T
complex_per_hgt: 80   # if H/(H+T) greater than this then node is considered H
complex_per_node: 90  # if node contains percent number of this category, it is assigned

# Program specific options
mafft_options: '--anysymbol --auto'
trimal_options: '-automated1'

#IQ-Tree
iqmodel: '-mset WAG,LG,JTT -AICc -mrate E,I,G,R'
ufbootstrap: 1000
iq_threads: 4

Also, I'd like to confirm what "database" means in the config file. Does AvP expect the nr FASTA sequences as the database? I also do not understand why "blastdbcmd" appears in the error message. It would be great if AvP could take a pre-computed BLAST/DIAMOND database and run the similarity search with DIAMOND. On my school's HPC, individual users do not have enough storage to keep the nr FASTA file, but the administrators maintain pre-computed nr databases for BLAST/DIAMOND.

Sincerely,

Cong Liu

GDKO commented 1 year ago

Hi Cong,

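You should remove the brackets from the nr path:

nr_db_path: /apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr

In YAML, square brackets denote a list, so with the brackets the value is loaded as a one-element list instead of a string, and prepare.py then fails when it concatenates it into the blastdbcmd command.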

I have also made this clearer in the config file.
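The failure can be reproduced without AvP. The sketch below writes by hand what the two versions of the config line turn into (so no YAML parser is needed) and wraps the blastdbcmd call from the traceback in a hypothetical helper, `build_blastdbcmd_args`, which is not part of AvP. It rejects a non-string path with a clearer message and returns an argument list instead of a shell string. Treat it as an illustration, not as AvP's code.

```python
# Reproduces the TypeError from issue #2 and sketches a clearer check.
# build_blastdbcmd_args is a hypothetical helper, not part of AvP.

def build_blastdbcmd_args(config_opts, extract_id_path, setnrlog_path, setnrfa_path):
    """Return the blastdbcmd call from prepare.py as an argument list."""
    db = config_opts["nr_db_path"]
    if not isinstance(db, str):
        # "nr_db_path: [/path/to/nr]" is a YAML list, so it arrives as a list.
        raise TypeError(
            "nr_db_path must be a path string, got %s; "
            "remove the square brackets in the config file" % type(db).__name__
        )
    return [
        "blastdbcmd", "-db", db, "-dbtype", "prot",
        "-entry_batch", extract_id_path, "-target_only",
        # Same output format as prepare.py: ">accession@taxid", newline, sequence.
        "-outfmt", ">%a@%T\n%s",
        "-logfile", setnrlog_path, "-out", setnrfa_path,
    ]


nr = "/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr"

# With brackets the value is a list, and plain string concatenation fails.
try:
    "blastdbcmd -db " + [nr]
except TypeError as err:
    print(err)  # can only concatenate str (not "list") to str

# Without brackets the command builds normally.
args = build_blastdbcmd_args({"nr_db_path": nr}, "ids.txt", "nr.log", "nr.fa")
print(args[:3])  # ['blastdbcmd', '-db', '/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr']
```

The resulting list could be passed to `subprocess.run(args, check=True)`; since no shell is involved, no extra quoting is needed, even for paths that contain spaces.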

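Concerning the databases, there is no need for the raw nr.fasta file, only the BLAST-formatted database. AvP uses blastdbcmd to extract the hit sequences, together with their accessions and taxids, from that database, which is why blastdbcmd appears in the error message.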
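One way to check that `nr_db_path` points at a BLAST-formatted database rather than a raw FASTA file is to look for the files makeblastdb writes next to the database prefix. The function below is a hypothetical heuristic, not part of AvP, and it assumes a protein database: multi-volume databases such as nr come with an alias file (`nr.pal`), while single-volume ones have `.pin`, `.phr` and `.psq` files.

```python
import os


def looks_like_blast_protein_db(prefix):
    """Heuristic: True if `prefix` names a BLAST protein database, not a FASTA file."""
    # Multi-volume protein databases (e.g. nr) have an alias file: prefix.pal
    if os.path.isfile(prefix + ".pal"):
        return True
    # Single-volume protein databases have index, header and sequence files.
    return all(os.path.isfile(prefix + ext) for ext in (".pin", ".phr", ".psq"))
```

A call such as `looks_like_blast_protein_db("/apps/unit/BioinfoUgrp/DB/diamondDB/ncbi/238/nr")` would return False if that directory only holds a DIAMOND `.dmnd` file.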

DIAMOND (since v2.0.8) can use the BLAST-formatted nr database directly, without reformatting it.

Unfortunately, DIAMOND cannot retrieve sequences directly from the .dmnd file. So to use a DIAMOND-formatted database, you need to keep both the FASTA file and the .dmnd file. Furthermore, because AvP needs the taxid of each sequence, you will also need to make some changes to the FASTA file (see https://github.com/GDKO/AvP/wiki/Setting-up#Databases).
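The wiki page linked above is the authority on the exact changes. As a rough sketch only, the code below assumes that each header has to carry the taxid in the same `accession@taxid` layout that AvP requests from blastdbcmd (`-outfmt ">%a@%T\n%s"` in the traceback). The function name is hypothetical, and the accession-to-taxid mapping is assumed to be supplied by you, for example built from NCBI's prot.accession2taxid file, with keys in the same form as the first word of each header.

```python
# Sketch: rewrite FASTA headers as ">accession@taxid".
# ASSUMPTION: this header layout mirrors blastdbcmd's -outfmt ">%a@%T\n%s"
# used by AvP; check the AvP wiki for the exact format it expects.

def add_taxids_to_fasta(lines, acc2taxid):
    """Yield FASTA lines with each header replaced by '>accession@taxid'.

    acc2taxid maps the first word of each header to a taxid. Records whose
    accession is missing from the mapping are dropped entirely.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            acc = fields[0] if fields else ""
            taxid = acc2taxid.get(acc)
            keep = taxid is not None
            if keep:
                yield ">%s@%s\n" % (acc, taxid)
        elif keep:
            # Sequence line belonging to a record we are keeping.
            yield line


# Toy input, not real nr records.
fasta = [">WP_1.1 protein A [Escherichia coli]\n", "MKV\n", ">WP_2.1 protein B\n", "MAA\n"]
print(list(add_taxids_to_fasta(fasta, {"WP_1.1": 562})))
# ['>WP_1.1@562\n', 'MKV\n']
```

Because it works line by line as a generator, the same function could stream a large FASTA file (`open(path)` yields lines) without loading the sequences into memory; the mapping itself, however, is held in memory.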

Hope that clears it up a bit,

Georgios