PoisotLab / GBIF.jl

Functions and types to access GBIF data from Julia
https://ecojulia.github.io/GBIF.jl/latest/
Other
19 stars 4 forks source link

add Base `read!` method to read all occurances to the array #45

Closed rafaqz closed 3 years ago

rafaqz commented 3 years ago

Seems like it should be easier to just get all of the records? This is a very basic implementation. But read! seems like the most appropriate method to use.

From the docs for read!:

Read binary data from an I/O stream or file, filling in array.

It would be better if it was easy to change the "limit" keyword before doing this, setting it to 300. See #47

Example:

# Read the first occurrences
occ = occurrences("scienticName" => "Burramys parvus");
# get the rest
read!(occ)
codecov[bot] commented 3 years ago

Codecov Report

Merging #45 (85e5476) into main (5385cc9) will decrease coverage by 3.23%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
- Coverage   85.95%   82.72%   -3.24%     
==========================================
  Files          10       10              
  Lines         178      191      +13     
==========================================
+ Hits          153      158       +5     
- Misses         25       33       +8     
Flag Coverage Δ
unittests 82.72% <100.00%> (-3.24%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/types/iterators.jl 100.00% <100.00%> (ø)
src/types/show.jl 0.00% <0.00%> (ø)
src/GBIF.jl 71.42% <0.00%> (+11.42%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5385cc9...85e5476. Read the comment docs.

rafaqz commented 3 years ago

Ok. I'll add a test somewhere too.

tpoisot commented 3 years ago

please update doc and use cases to see how it changes the code - also please add the code as it would currently be written in your issue

rafaqz commented 3 years ago

You mean read!(occ) ??? This is a very simple PR. The question is if you want it, before I add documenation.

tpoisot commented 3 years ago

As I said in #47, I don't think it's a good idea. It used to be in the package, and the fact that some requests are very long ended up being not user-friendly.

rafaqz commented 3 years ago

This may not be the perfect approach, but downloading more than 300 records with a single function call is a reasonable thing to want. The R GBIF package lets you download up to 100,000 records with a single call:

https://docs.ropensci.org/rgbif/articles/rgbif.html#search-for-occurrences

Having that automated is something we can have here too.

GBIF also has a batch queue mode for occ_download for handling even larger requests: https://github.com/ropensci/rgbif/blob/master/R/download_queue.R#L37

Tim, to clarify, all of these PRs came directly from using GBIF data in R and trying to replicate the workflow in Julia, finding some things more difficult. This ease will be what people moving to Julia will be comparing to. I'm just looking at how we can match what they will be expecting.

rafaqz commented 3 years ago

Also note that the implementation of read! is identical to the while loop in the docs: http://docs.ecojulia.org/GBIF.jl/latest/examples/cardinal/

Using the while loop pedagologically leads us to #44 - size and length differ from the standard semantics in Julia, and we will have to explain that. read! hides that detail.

I was hoping to make a method we can use in other package documentation where GBIF provides point data for e.g. the extract method, and in other teaching examples.

tpoisot commented 3 years ago

If you can make a concrete proposal for #44 as an issue (not a PR), and when we get an agreement on that, we'll re-visit this PR. Closing for now.