drostlab / metablastr

Seamless Integration of BLAST Sequence Searches in R
https://drostlab.github.io/metablastr/
GNU General Public License v2.0

Error: Internal error in `dict_hash_with()`: Dictionary is full. #11

Open papelypluma opened 2 years ago

papelypluma commented 2 years ago

Thank you, metablastr developers, for sharing this tool with the community. I'd like to ask for your help with an error I encountered after running blast_best_reciprocal_hit(). Both BLASTp searches seem to have completed, but the reciprocal best hit step appears to have failed. One of the databases I'm using has around 25M records, and I'm wondering whether this could be why the reciprocal best hit step failed. For reference, here are the relevant snippets of the error output:

BLAST search finished! The BLAST output file was imported into the running R session.
The BLAST output file has been stored at: /expt/datb/data/HiC/Rp_RNA-Seq/embryo/timeseries-vs-RpedSuzhou/rna-seq_rpedszv-reannot/annot_gene_sym/metablastr_bbh/metazoa_refseq_biopython-validated_Riptortus_pedestris_SZV_blastp-fast_eval_1e-05.blast_tbl
Error: Internal error in dict_hash_with(): Dictionary is full.

rlang::last_error()
<error/rlang_error>
Internal error in dict_hash_with(): Dictionary is full.
Backtrace:

  1. metablastr::blast_best_reciprocal_hit(...)
  2. metablastr::blast_best_hit(...)
  3. dplyr:::group_by.data.frame(blast_res, query_id)
  4. dplyr::grouped_df(groups$data, groups$group_names, .drop)
  5. dplyr:::compute_groups(data, vars, drop = drop)
  6. dplyr:::vec_split_id_order(group_vars)
  7. vctrs::vec_group_loc(x)
Run rlang::last_trace() to see the full context.

Is it also possible to skip the BLAST step and proceed directly to the reciprocal best hit step when re-running this procedure?
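
To make the question concrete, here is a minimal base R sketch of computing reciprocal best hits from two BLAST tables that are already loaded, without re-running BLAST. It is not metablastr's implementation. The helper names `best_hit()` and `reciprocal_best_hits()` are hypothetical, the columns `subject_id`, `evalue` and `bit_score` are assumed to be present alongside `query_id`, and the "best hit" rule used here (highest bit score, ties broken by lowest e-value) is an assumption that may differ from the package's own criterion. Whether this would avoid the error above on a 25M-record table has not been verified.

```r
# Hypothetical helper: keep one best hit per query.
# Assumption: "best" = highest bit_score, ties broken by lowest evalue.
best_hit <- function(tbl) {
  ord <- order(tbl$query_id, -tbl$bit_score, tbl$evalue)
  tbl <- tbl[ord, , drop = FALSE]
  # After sorting, the first row for each query_id is its best hit.
  tbl[!duplicated(tbl$query_id), , drop = FALSE]
}

# Hypothetical helper: pairs (A, B) where B is the best hit of A
# in the forward search and A is the best hit of B in the reverse search.
reciprocal_best_hits <- function(a_vs_b, b_vs_a) {
  fwd <- best_hit(a_vs_b)[, c("query_id", "subject_id")]
  rev <- best_hit(b_vs_a)[, c("query_id", "subject_id")]
  # Swap the roles in the reverse table so both are keyed as (A, B).
  names(rev) <- c("subject_id", "query_id")
  merge(fwd, rev, by = c("query_id", "subject_id"))
}

# Toy tables standing in for the two imported BLAST results.
a_vs_b <- data.frame(
  query_id   = c("A1", "A1", "A2"),
  subject_id = c("B1", "B2", "B2"),
  evalue     = c(1e-50, 1e-30, 1e-10),
  bit_score  = c(200, 150, 90),
  stringsAsFactors = FALSE
)
b_vs_a <- data.frame(
  query_id   = c("B1", "B2", "B2"),
  subject_id = c("A1", "A1", "A2"),
  evalue     = c(1e-52, 1e-28, 1e-8),
  bit_score  = c(210, 140, 80),
  stringsAsFactors = FALSE
)

rbh <- reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In the toy data only the pair A1/B1 is reciprocal: A2's best hit is B2, but B2's best hit is A1. For the real run, the two data frames would come from the stored `.blast_tbl` files; how those files should be read back in depends on their format, which is not shown here.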

Thank you very much!