When the taxonomy dump is older than the BLAST search (or whatever else was used to obtain the taxids), some taxids are often not found, which leads to NAs.
When this happens, I think the program should stop (or at least print an obvious warning) and report an error like:
"Some taxids were not found in the taxonomy database, consider updating the NCBI taxonomy database by running ./install -i your_metabinkit_install_directory -x taxonomy_db"
Example in R:
a <- data.table::fread("2019_August_002.UNIO.lenFilt.trimmed.ids.SC4.pol.blast.filt.txt", data.table = FALSE)
b <- add.lineage.df(a, ncbiTaxDir = "/home/tutorial/TOOLS/DBS/ncbi_taxonomy/taxdump/") # an old taxonomy folder
#some stderr output
11:39:50.515 [WARN] taxid 1823760 was deleted
11:39:50.540 [WARN] taxid 1936990 was deleted
11:39:50.591 [WARN] taxid 2563896 was deleted
11:39:50.641 [WARN] taxid 2714934 not found
11:39:50.642 [WARN] taxid 2715212 not found
11:39:50.642 [WARN] taxid 2715678 not found
11:39:50.643 [WARN] taxid 2715735 not found
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "unknown") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "unknown") :
invalid factor level, NA generated
Example in metabin:
metabin -i 2019_August_002.UNIO.lenFilt.trimmed.ids.SC4.pol.blast.filt.nopaths.csv -o 2019_UNIO.metabins.new.nopath.txt -S 98 -G 95 -F 92 -A 80 --discard_sp TRUE -D /home/tutorial/TOOLS/DBS/ncbi_taxonomy/taxdump/
#some output
11:55:47.603 [WARN] taxid 2721245 not found
11:55:47.603 [WARN] taxid 2721246 not found
11:55:47.603 [WARN] taxid 2722751 not found
11:55:47.604 [WARN] taxid 2724150 not found
11:55:47.604 [WARN] taxid 2724191 not found
11:55:47.604 [WARN] taxid 2724192 not found
#but program completes
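Until something like this is built in, a possible stopgap is to scan the saved log for the "not found" warnings and fail loudly. The sketch below is only an illustration: `check_taxid_warnings` is a hypothetical helper, not part of metabinkit, and it assumes the warnings were redirected to a file and keep the `[WARN] taxid N not found` format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of metabinkit): return non-zero if a saved
# log contains "taxid N not found" warnings in the format shown above.
check_taxid_warnings() {
    local log="$1"
    local missing
    # Count lines such as: 11:55:47.603 [WARN] taxid 2721245 not found
    # grep -c exits with status 1 when the count is 0, hence "|| true".
    missing=$(grep -cE '\[WARN\] taxid [0-9]+ not found' "$log" 2>/dev/null || true)
    missing=${missing:-0}
    if [ "$missing" -gt 0 ]; then
        echo "ERROR: $missing taxid warning(s) of type 'not found' in $log." >&2
        echo "Some taxids were not found in the taxonomy database, consider updating the NCBI taxonomy database by running ./install -i your_metabinkit_install_directory -x taxonomy_db" >&2
        return 1
    fi
    return 0
}

# Assumed usage: redirect the warnings of a run to a file first, then call
#   check_taxid_warnings run.log || exit 1
```

Warnings of the form "was deleted" are deliberately not counted here; whether those should also stop the run is a separate decision.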