alexpiper / taxreturn

An R package for creating taxonomic reference databases for metabarcoding studies
GNU General Public License v3.0

fetchSeqs: Invalid Multibyte string error with some BOLD entries #21

Open morien opened 3 years ago

morien commented 3 years ago

I have a list of taxonomic classes which I'm passing to the fetchSeqs function, as outlined in the vignette:

fetchSeqs(class_list, database="bold", out.dir="bold", marker="COI-5P", output = "gb-binom", compress=TRUE, force=TRUE, multithread = TRUE)

Eventually, an error like this shows up:

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : invalid multibyte string at '<a9> 2009<20>Bryn Dentinger|<a9> 2009 Bryn Dentinger'

From what I can tell, this is a character-encoding issue that arises when R parses the downloaded records. A BOLD entry contains a character (the byte 0xA9, which is the copyright sign © in Latin-1) that is not valid in R's current encoding. There are various ways to fix this, but all of them would involve changing the code of your function.
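One such fix, sketched minimally here in R, is to detect strings that are not valid UTF-8 and convert them from Latin-1 with base R's `iconv()` before they reach `type.convert()`. This assumes the offending bytes are Latin-1 (0xA9 followed by " 2009 Bryn Dentinger" looks like a Latin-1 copyright notice). `fix_latin1()` is a hypothetical helper, not part of taxreturn or bold, and where such a call would belong inside `fetchSeqs` is not shown.

```r
# Hypothetical helper: re-encode strings that are not valid UTF-8.
# Assumes the invalid bytes come from Latin-1 (ISO-8859-1) text.
fix_latin1 <- function(x) {
  bad <- !is.na(x) & !validUTF8(x)   # flag strings that are not valid UTF-8
  x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
  x
}

# 0xA9 is the copyright sign in Latin-1 but is not valid on its own in UTF-8
raw_field <- "\xa9 2009 Bryn Dentinger"
validUTF8(raw_field)   # FALSE

fixed <- fix_latin1(raw_field)
validUTF8(fixed)       # TRUE

# The repaired string can now be parsed without an encoding error
type.convert(fixed, as.is = TRUE)
```

If the records are read from a file with `read.table()` or `read.delim()`, passing `fileEncoding = "latin1"` is an alternative that re-encodes the file as it is read.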

Thanks!

alexpiper commented 3 years ago

It's hard to diagnose exactly where the error is coming from without knowing which BOLD record has the bad characters. This package depends on https://github.com/ropensci/bold to do the actual API calls, so the error may actually be occurring in their codebase. Would you be able to share the specific clade, or the list of clades, that you are querying when this problem comes up?
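To find which clade triggers the error in a long list of taxa, one option is to query each taxon separately and record which calls fail. This minimal sketch assumes `fetchSeqs` accepts a single taxon name as its first argument (the original call passes `class_list` positionally). `find_failing_taxa()` and `mock_fetch()` are hypothetical helpers; the mock lets the logic run without contacting BOLD.

```r
# Hypothetical helper: call a fetch function once per taxon so that a single
# failing query does not abort the whole batch, and collect the error messages.
find_failing_taxa <- function(taxa, fetch_fun, ...) {
  failures <- character(0)
  for (taxon in taxa) {
    msg <- tryCatch({
      fetch_fun(taxon, ...)
      NA_character_                      # no error for this taxon
    }, error = function(e) conditionMessage(e))
    if (!is.na(msg)) failures[taxon] <- msg
  }
  failures                               # named by taxon, values are error messages
}

# Offline demo: a mock fetcher that fails on one placeholder taxon name
mock_fetch <- function(taxon, ...) {
  if (taxon == "taxon_b") stop("invalid multibyte string")
  invisible(TRUE)
}
failing <- find_failing_taxa(c("taxon_a", "taxon_b", "taxon_c"), mock_fetch)
print(failing)

# With taxreturn (network access required), using the arguments from the issue:
# library(taxreturn)
# failing <- find_failing_taxa(class_list, fetchSeqs, database = "bold",
#                              out.dir = "bold", marker = "COI-5P", output = "gb-binom",
#                              compress = TRUE, force = TRUE, multithread = TRUE)
```

Querying one taxon at a time is slower than a single batched call, but it isolates the failing query. The mock names are placeholders, not real BOLD taxa.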

morien commented 3 years ago

I'm trying to recreate the error with a list derived from the "Dentinger" string in the original error (that taxa list was just a list of every taxonomic class present in BOLD, so I didn't really have a good way of knowing which subquery produced the error). Right now I'm just getting the following, which I'm guessing means that the BOLD server is offline:

> fetchSeqs(dentinger_list, database="bold", out.dir="bold", marker="COI-5P", output = "gb-binom", compress=TRUE, force=TRUE, multithread = TRUE)
Building NCBI taxonomy data frame

|==================================================================| 100% 282 MB
Done

Error in b_GET(paste0(bbase(), "API_Public/combined"), args, ...) : 
  <!DOCTYPE html>
<html>
        <head>
                <meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<title>Server Offline | BOLDSYSTEMS</title>

In addition: There were 50 or more warnings (use warnings() to see the first 50)