Maine-eDNA / mednaTaxaRef

Development of an R toolset to generate reference databases for use in Maine-eDNA sequence analyses based on merging existing functionalities

Using restez to fetch and build a refdb by domain #13

Open btupper opened 6 months ago

btupper commented 6 months ago

Catch-up notes... restez makes downloading and querying NCBI databases easy (though not faster). The author responded quickly to a pull request, and the package seems actively maintained.

Fetching

Fetching boils down to one command, restez::db_download(), which takes a long time to run. It retries on failure (up to max_tries, in case of a network blip) and couldn't be easier to use.

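# Excerpt: db, dbs, restez_root, overwrite, max_tries and logfile are defined elsewhere.
db_path = file.path(restez_root, db)
if (!dir.exists(db_path)) dir.create(db_path, recursive = TRUE)
restez::restez_path_set(db_path)
# On failure, try() returns an object of class "try-error" (hyphen, not underscore).
ok = try(restez::db_download(preselection = dbs[[db]],
                             overwrite = overwrite,
                             max_tries = max_tries))
if (inherits(ok, "try-error")){
  charlier::error("unable to fetch the data")
  # the try-error object is a character string holding the error message
  cat(ok, sep = "\n", file = logfile, append = TRUE)
} else if (!ok){
  charlier::warning("unable to fetch the data")
}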
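The try()/inherits() pattern above is easy to get subtly wrong, so here is a minimal base-R sketch of it that runs without restez or a network connection. The helper name safe_fetch() and the two stand-in functions are hypothetical, made up for illustration; in the real workflow the fetch function would be a call to restez::db_download().

```r
# Hypothetical helper (not part of restez or charlier): run a fetch function,
# append any error message to a log file, and return TRUE or FALSE.
safe_fetch <- function(fetch_fun, logfile, ...) {
  ok <- try(fetch_fun(...), silent = TRUE)
  if (inherits(ok, "try-error")) {
    # a try-error object is a character string carrying the error message
    cat(as.character(ok), sep = "\n", file = logfile, append = TRUE)
    return(FALSE)
  }
  # treat anything other than a single TRUE as a failed fetch
  isTRUE(ok)
}

# Stand-ins for the real download call, so the sketch runs offline
always_fails <- function() stop("simulated network failure")
always_works <- function() TRUE

logfile <- tempfile(fileext = ".log")
failed <- safe_fetch(always_fails, logfile)
worked <- safe_fetch(always_works, logfile)
```

Returning a plain logical keeps the caller simple: it only has to decide whether to move on to the next database or stop, and the details of any failure end up in the log file.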

Building

Once the data are downloaded, the database must be built. This takes a bit more effort, because any previous database is deliberately deleted first so that the build starts from a clean slate, which is a good defensive practice. We set the minimum sequence length to 1 (at least one base pair) and the maximum to 2000.

restez_root = file.path(cfg$rootpath, cfg$name)
if (!dir.exists(restez_root)) ok = dir.create(restez_root, recursive = TRUE)
restez::restez_path_set(restez_root)
# remove any previously built database before creating a new one
restez::db_delete()
# build from the downloaded files, keeping only sequences within the length bounds
restez::db_create(
  db_type = cfg$create$db_type,
  min_length = cfg$create$min_length,
  max_length = cfg$create$max_length)
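For reference, here is a rough sketch of what the cfg object used above might look like, written as a plain R list with a small sanity check. The field values, the name "restez_demo", and both helper functions are assumptions for illustration only; the actual configuration lives elsewhere in the package. The length filter treats both bounds as inclusive, which is also an assumption about how restez applies them.

```r
# Assumed shape of the configuration; only the field names come from the
# snippet above, the values are illustrative.
cfg <- list(
  rootpath = tempdir(),
  name = "restez_demo",
  create = list(db_type = "nucleotide", min_length = 1, max_length = 2000)
)

# Hypothetical check: fail early if the create settings are inconsistent
check_create_cfg <- function(create) {
  stopifnot(
    is.character(create$db_type), length(create$db_type) == 1,
    is.numeric(create$min_length), create$min_length >= 1,
    is.numeric(create$max_length), create$max_length >= create$min_length
  )
  invisible(create)
}

check_create_cfg(cfg$create)
restez_root <- file.path(cfg$rootpath, cfg$name)

# Hypothetical illustration of the length filter (bounds assumed inclusive)
keeps_length <- function(n, create) {
  n >= create$min_length & n <= create$max_length
}
kept <- keeps_length(c(0, 1, 650, 2000, 2001), cfg$create)
```

Checking the settings before calling restez::db_delete() means a typo in the configuration is caught before the old database is removed.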