Open bwprice opened 2 weeks ago
Hi Ben,
The search API is currently limited to public records. Searching for private records using processIDs, sampleIDs, project codes, and dataset codes is supported. A full search API will soon be available for private records.
Thanks for the quick reply Sujeevan. I don't think it's a public / private record issue.
Take this species: Selenocephalus pallidus. There is a public record via BGE without sequence data, which I can see when not logged into BOLD: https://v4.boldsystems.org/index.php/Public_RecordView?processid=BGENL2002-24
But it doesn't show up in the v5 portal, either using taxonomy:
or the process ID:
or using the R package:
> bold.data <- bold.public.search(taxonomy = "Selenocephalus pallidus")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
I can get the record using bold.fetch and my API key:
> res <- bold.fetch(get_by = "processid",
+ identifiers = "BGENL2002-24")
Initiating download
Downloading data in a single batch
Download complete & BCDM dataframe generated
....so it is behaving as if it were a private record, but it is public...
Sorry if I've misunderstood, excited to get working with this package :)
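For anyone scripting around this in the meantime: the read.table error above appears to be what an empty search result looks like, so it can be caught rather than left to stop a loop over many taxa. A minimal sketch, assuming that reading of the error is right; safe_search is a hypothetical helper, not part of the R package, and it simply wraps whatever search function is passed in.

```r
# Hypothetical helper (not part of the package): run a search function and
# return NULL instead of stopping when the call raises an error,
# e.g. "no lines available in input" on an empty result.
safe_search <- function(search_fun, ...) {
  tryCatch(
    search_fun(...),
    error = function(e) {
      message("Search returned nothing usable: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with the call from this thread (needs the package loaded and network access):
# res <- safe_search(bold.public.search, taxonomy = "Selenocephalus pallidus")
# if (is.null(res)) message("No record found via the public search")
```

Note that this catches every error, not only the empty-result one, so the printed message is worth checking before treating NULL as "no records".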
Yes, you’re correct. That particular record is not available in the BOLD5 data portal. The BOLD5 data portal does not index records without sequences. This is a key difference between the BOLD4 and BOLD5 portals. BOLD needs to focus on barcode records. In the past, we deviated from this focus by circulating records without sequences, but I found that these records tended to create more noise than value.
While this is a shift in direction, I believe it is an important step in clarifying BOLD’s role. You can still access records without sequences in the private database using dataset codes and project codes. When the search API for private records opens up, you will be able to perform searches in that manner.
ok thanks for the clarification :)
Hi Sujeevan, this makes a lot of sense from the perspective of sample tracking within a project (i.e. with the right code(s) and private API access, you could check how your processes are doing), but it raises the issue of how gap lists could be managed. Basically, I have a species list, and I'd like to know if anyone is in the process of sampling them. How would we as a community do that?
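To make the gap-list question concrete, here is a rough sketch of the comparison I have in mind. gap_list is a hypothetical helper, and found is a stand-in for whatever species names can actually be retrieved. As things stand, the public search only sees sequenced records, so a species with registered but unsequenced specimens would still be reported as a gap.

```r
# Hypothetical helper: species on a target list that have no match among
# the species names already retrieved from BOLD.
gap_list <- function(target_species, found_species) {
  setdiff(unique(target_species), unique(found_species))
}

targets <- c("Baetis rhodani", "Selenocephalus pallidus")
found   <- c("Baetis rhodani")  # placeholder for names taken from retrieved records
gap_list(targets, found)
# returns "Selenocephalus pallidus"
```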
Hi! Currently bold.public.search only retrieves process IDs for records with sequences. Can this be updated to include all public records? It's very useful to know if specimens are registered in BOLD but not sequenced yet.
For example: bold.data <- bold.public.search(taxonomy = "Baetis rhodani") retrieves 834 process IDs
This result matches what's showing on BOLD v5: https://portal.boldsystems.org/result?query=%22Baetis%20rhodani%22[tax]
However BOLD v4 has 885 records, with 834 sequenced: https://v4.boldsystems.org/index.php/Taxbrowser_Taxonpage?taxon=Baetis+rhodani&searchTax=Search+Taxonomy
Thanks! Ben