envmetagen / metabinkit

Set of programs to perform taxonomic binning.
GNU General Public License v3.0

provide error when taxids not found #14

Closed (bastianegeter closed this issue 4 years ago)

bastianegeter commented 4 years ago

When the taxonomy dump in use is older than the BLAST run (or whatever was used to obtain the taxids), some taxids are often not found in it, which leads to NAs.

When this happens, I think the program should STOP (or at least give an obvious warning) and report an error like:

"Some taxids were not found in the taxonomy database, consider updating the NCBI taxonomy database by running ./install -i your_metabinkit_install_directory -x taxonomy_db"

Example in R:

a <- data.table::fread("2019_August_002.UNIO.lenFilt.trimmed.ids.SC4.pol.blast.filt.txt", data.table = FALSE)
b <- add.lineage.df(a, ncbiTaxDir = "/home/tutorial/TOOLS/DBS/ncbi_taxonomy/taxdump/") # an old taxonomy folder

#some stderr output
11:39:50.515 [WARN] taxid 1823760 was deleted
11:39:50.540 [WARN] taxid 1936990 was deleted
11:39:50.591 [WARN] taxid 2563896 was deleted
11:39:50.641 [WARN] taxid 2714934 not found
11:39:50.642 [WARN] taxid 2715212 not found
11:39:50.642 [WARN] taxid 2715678 not found
11:39:50.643 [WARN] taxid 2715735 not found

Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "unknown") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "unknown") :
  invalid factor level, NA generated

In metabin:

metabin -i 2019_August_002.UNIO.lenFilt.trimmed.ids.SC4.pol.blast.filt.nopaths.csv -o 2019_UNIO.metabins.new.nopath.txt -S 98 -G 95 -F 92 -A 80 --discard_sp TRUE -D /home/tutorial/TOOLS/DBS/ncbi_taxonomy/taxdump/

#some output

11:55:47.603 [WARN] taxid 2721245 not found
11:55:47.603 [WARN] taxid 2721246 not found
11:55:47.603 [WARN] taxid 2722751 not found
11:55:47.604 [WARN] taxid 2724150 not found
11:55:47.604 [WARN] taxid 2724191 not found
11:55:47.604 [WARN] taxid 2724192 not found

#but the program still completes
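
Until something like this is built in, one possible stopgap is to check the captured log after a run and fail if any "not found" warnings appear. The following is only a sketch: `count_missing_taxids` and `check_taxids` are hypothetical helper names, not part of metabinkit, and the pattern assumes the warning lines look exactly like the `[WARN] taxid N not found` lines shown above.

```shell
#!/bin/sh
# Hypothetical helpers (NOT part of metabinkit). They assume the log contains
# lines of the form "HH:MM:SS.mmm [WARN] taxid 2714934 not found".

# Print the number of "taxid ... not found" warnings in the log file $1.
count_missing_taxids() {
    # grep -c prints the count of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps this usable under "set -e".
    grep -c -E '\[WARN\] taxid [0-9]+ not found' "$1" || true
}

# Return 1 (and print an error to stderr) if any taxids were not found,
# 2 if the log cannot be read, 0 otherwise.
check_taxids() {
    if [ ! -r "$1" ]; then
        echo "ERROR: cannot read log file: $1" >&2
        return 2
    fi
    n=$(count_missing_taxids "$1")
    if [ "${n:-0}" -gt 0 ]; then
        echo "ERROR: $n taxids were not found in the taxonomy database, consider updating the NCBI taxonomy database by running ./install -i your_metabinkit_install_directory -x taxonomy_db" >&2
        return 1
    fi
    return 0
}
```

To use it, redirect both output streams of the `metabin` command above to a file (for example by appending `> metabin.log 2>&1`; the file name is arbitrary) and then run `check_taxids metabin.log || exit 1`. Both streams are captured because it is an assumption here that metabin writes these warnings to stderr. The "was deleted" warnings are deliberately not counted, since this issue is only about taxids that are not found.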