denalitherapeutics / archs4

An R interface to query and extract data from the ARCHS4 data

Testtdrive #6

Closed tomsing1 closed 6 years ago

tomsing1 commented 6 years ago

A few small bug fixes and some additional helper functions to retrieve information from the NCBI:

  1. lookup_biosamples leverages the rentrez package to retrieve sample annotations from the NCBI Biosample database. It's fast, but in its current implementation adds a number of dependencies, e.g. rentrez and xml2, as well as a few tidyverse packages. Some of the latter could be trimmed (e.g. readr).

  2. lookup_gse and its unexported backend query_geo directly access NCBI GEO's REST API to retrieve Series-level information, e.g. the sample identifiers associated with a series. (These could then be passed on to lookup_biosamples.) GEO's REST API (and hence query_geo) can also retrieve sample-level information, but it processes only one sample at a time. That's why I chose the rentrez path for lookup_biosamples instead.
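
To give a rough idea of what the BioSample side involves, here is a minimal sketch of turning BioSample-style XML into a long-format table with xml2. It is not the code in this PR: parse_biosample_xml is a hypothetical helper, the accession and attribute values in the example record are made up, and the element layout (BioSample / Attributes / Attribute with an attribute_name) is assumed to match what the NCBI returns.

```r
library(xml2)

# Hypothetical helper (not the PR's lookup_biosamples): converts
# BioSample-style XML into one row per sample attribute.
parse_biosample_xml <- function(xml_string) {
  doc <- read_xml(xml_string)
  samples <- xml_find_all(doc, ".//BioSample")
  rows <- lapply(samples, function(s) {
    attrs <- xml_find_all(s, "./Attributes/Attribute")
    data.frame(
      accession = rep(xml_attr(s, "accession"), length(attrs)),
      attribute = xml_attr(attrs, "attribute_name"),
      value     = xml_text(attrs),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-sample tables; returns NULL if no samples were found.
  do.call(rbind, rows)
}

# Made-up record for illustration only; accession and values are not real data.
example_xml <- '<BioSampleSet>
  <BioSample accession="SAMN00000001">
    <Attributes>
      <Attribute attribute_name="tissue">liver</Attribute>
      <Attribute attribute_name="strain">C57BL/6</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>'

annotations <- parse_biosample_xml(example_xml)
```

In a live session the XML string would presumably come from something like rentrez::entrez_fetch(db = "biosample", id = ids, rettype = "xml"), where ids holds BioSample UIDs. Because that call accepts a vector of ids, many samples can be fetched in one request, which is the advantage over the one-sample-at-a-time GEO route described above.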

I haven't updated the DESCRIPTION or NAMESPACE files, nor written or run any tests. Take a look and then decide if these functions are useful and should find a place in this package. Alternatively, we could keep them in a separate package. (I have a few more tricks like this up my sleeve for querying EBI's SRA database that could find their place in a separate package.)

Best, Thomas