cparsania / phyloR

An R package to prepare data for phylogenetic analysis
https://cparsania.github.io/phyloR/

add_taxonomy_columns() function only outputs the first 10 lines #1

Open yamkela-mg opened 7 months ago

yamkela-mg commented 7 months ago

Hi there,

I am adding NCBI taxonomy classifications to my DIAMOND output file. I ran phyloR as follows:

library(phyloR)
library(readr)
library(taxize)
setwd("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor")
data <- read_tsv("/home/ymgwatyu/lustre/000_GenomeData/01_MinION/phylor/diamond_data.txt", show_col_types = FALSE)

add_taxonomy_columns(data, ncbi_accession_colname = "ncbi_accession", ncbi_acc_key = "98845081e276ecedd2e2b92d339fb7354108", taxonomy_level = "family", map_superkindom = FALSE, batch_size = 20)

The output file looks like this:

Done. Time taken 6.39
Rank search begins...
Done. Time taken 0.95

# A tibble: 6,079 × 4

   Gene       ncbi_accession  taxid   family
 1 g2420.t1   XP_019440838.1  3871    Fabaceae
 2 g20534.t1  XP_057737287.1  217475  Fabaceae
 3 g37802.t1  XP_031279371.1  55513   Anacardiaceae
 4 g13363.t1  QHN77035.1      3818    Fabaceae
 5 g30858.t1  KAE9615640.1    3870    Fabaceae
 6 g24702.t1  OIW14831.1      3871    Fabaceae
 7 g17954.t1  KAE9590247.1    3870    Fabaceae
 8 g20072.t1  XP_019420191.1  3871    Fabaceae
 9 g12935.t1  WAX01758.1      649199  Fabaceae
10 g914.t1    XP_019444688.1  3871    Fabaceae
# 6,069 more rows

So it seems to have annotated only the first 10 accessions. How do I get it to process more than 10, or to print more than 10 lines in the output file?

cparsania commented 7 months ago

Hi, I cannot read some of your text. Can you please repost the output in a readable format? If possible, upload the query IDs as well.

Chirag.

yamkela-mg commented 7 months ago

add_tax_final_outfile.txt

I managed to get it to print more than 10 lines in the output file by including the sink() function in my R script.
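
For anyone hitting the same thing: a tibble prints only its first 10 rows by default, so a "6,069 more rows" footer suggests the remaining rows are present and simply not displayed. Below is a minimal sketch of two alternatives to sink(). The `annotated` table is a hypothetical stand-in for whatever add_taxonomy_columns() returns, and the temporary output file is only for illustration.

```r
library(tibble)
library(readr)

# Hypothetical stand-in for the tibble returned by add_taxonomy_columns()
annotated <- tibble(
  Gene = paste0("g", 1:25, ".t1"),
  family = rep("Fabaceae", 25)
)

# Default printing truncates the display, but the object holds every row
nrow(annotated)

# Print all rows to the console instead of the first 10
print(annotated, n = Inf)

# Write the full table to a tab-separated file rather than capturing console output
out_file <- tempfile(fileext = ".tsv")
write_tsv(annotated, out_file)

# Read it back to confirm nothing was dropped
written <- read_tsv(out_file, show_col_types = FALSE)
```

Writing the table with write_tsv() also avoids the progress-bar characters that end up in a file when console output is redirected.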

Another question: what do the NAs in my output file mean? I got a lot of them, and when I manually checked some of those accessions, they do exist in the NCBI protein database.
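
One way to narrow this down is to separate rows where no taxid was found from rows where a taxid was found but the requested rank is missing, since these may have different causes. The sketch below uses base R only; the `annotated` data frame and the placeholder accession are made up for illustration, with column names taken from the output shown earlier in this thread.

```r
# Hypothetical stand-in for the annotated table; real data would come from add_taxonomy_columns()
annotated <- data.frame(
  Gene = c("g1.t1", "g2.t1", "g3.t1"),
  ncbi_accession = c("XP_019440838.1", "ACC_PLACEHOLDER_1", "QHN77035.1"),
  taxid = c(3871, NA, 3818),
  family = c("Fabaceae", NA, "Fabaceae"),
  stringsAsFactors = FALSE
)

# Rows where the taxid column itself is NA
no_taxid <- annotated[is.na(annotated$taxid), ]

# Rows that have a taxid but no value at the requested rank
no_rank <- annotated[!is.na(annotated$taxid) & is.na(annotated$family), ]

# Accessions worth re-checking by hand
missing_accessions <- no_taxid$ncbi_accession
```

Counting each group with nrow() gives a quick sense of which case dominates before checking individual accessions.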

cparsania commented 7 months ago

Internally, it does the taxonomy search using the R packages taxizedb and taxize. Make sure that these packages have the latest taxonomy databases downloaded in the form of SQL files.
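
A minimal sketch of refreshing the local NCBI taxonomy database through taxizedb follows. It assumes taxizedb is installed and that `db_download_ncbi()` with `overwrite = TRUE` replaces the cached copy. The wrapper name `refresh_ncbi_taxonomy` is hypothetical, and whether phyloR reads this exact cache is an assumption based on the comment above.

```r
# Hypothetical helper: re-download the NCBI taxonomy database used by taxizedb.
# The download is large, so it only runs when the function is called.
refresh_ncbi_taxonomy <- function() {
  # overwrite = TRUE asks taxizedb to replace an existing local copy
  taxizedb::db_download_ncbi(verbose = TRUE, overwrite = TRUE)
}

# Run the refresh only in an interactive session to avoid accidental downloads
if (interactive()) {
  db_path <- refresh_ncbi_taxonomy()
  print(db_path)
}
```

After refreshing, rerunning add_taxonomy_columns() on a few of the accessions that returned NA would show whether an outdated database was the cause.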