PoisotLab / GBIF.jl

Functions and types to access GBIF data from Julia
https://ecojulia.github.io/GBIF.jl/latest/

Improve query speed #36

Closed tpoisot closed 4 years ago

tpoisot commented 4 years ago

There are a few changes in this one:

  1. Everything related to filtering has been removed.
  2. Consequently, many of the fields of GBIFRecords have been removed as well.
  3. Occurrence retrieval and the offset/limit updates are now managed in dedicated functions.
  4. The GBIFRecords storage is pre-allocated at creation, so retrieving many occurrences no longer degrades performance.
  5. The documentation now gives more precise information on how to use the package with DataFrames and Query.
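The pre-allocation idea behind points 3 and 4 can be sketched roughly as follows. This is an illustrative sketch, not the actual internals of GBIFRecords: the type name, fields, and functions below are hypothetical, and only the general mechanism (size the array once from the query's total count, then fill pages in place) reflects what the PR describes.

```julia
# Hypothetical sketch of pre-allocated paging: allocate slots for all
# expected occurrences once, then write each page into place instead of
# appending (appending is what made large retrievals slow down).
mutable struct RecordsSketch
    occurrences::Vector{Any}  # pre-allocated to the query's total count
    offset::Int               # index where the next page will be written
    limit::Int                # page size requested from the API
end

# Create the store once, sized from the total count of the first response
RecordsSketch(total::Int; limit::Int=20) =
    RecordsSketch(Vector{Any}(undef, total), 0, limit)

# Write one page of results into the pre-allocated slots,
# then advance the offset for the next request
function store_page!(r::RecordsSketch, page)
    for (i, occ) in enumerate(page)
        r.occurrences[r.offset + i] = occ
    end
    r.offset += length(page)
    return r
end
```

Because the vector never grows, filling the 1000th page costs the same as filling the first, which is the performance property the PR is after.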


codecov[bot] commented 4 years ago

Codecov Report

Merging #36 into master will increase coverage by 4.27%. The diff coverage is 89.18%.


@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   79.86%   84.13%   +4.27%     
==========================================
  Files          11       10       -1     
  Lines         149      145       -4     
==========================================
+ Hits          119      122       +3     
+ Misses         30       23       -7     
| Flag | Coverage Δ |
|------|------------|
| #unittests | 84.13% <89.18%> (+4.27% ↑) |

| Impacted Files | Coverage Δ |
|----------------|------------|
| src/GBIF.jl | 66.66% <ø> (ø) |
| src/types/GBIFRecords.jl | 90.90% <ø> (+4.54% ↑) |
| src/types/show.jl | 0.00% <0.00%> (ø) |
| src/occurrence.jl | 89.18% <85.71%> (-1.14% ↓) |
| src/paging.jl | 86.36% <94.73%> (+11.36% ↑) |
| src/types/iterators.jl | 100.00% <100.00%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update db992e9...7dc3903.

tpoisot commented 4 years ago

@mkborregaard just to give you an overview: (1) there are no functions related to filtering anymore, since everything they did can be done better using Query; and (2) the records are pre-allocated on the first query to avoid appending to the array, which made retrieving more than 20k occurrences slow down dramatically.
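The Query-based filtering workflow that replaces the removed filter functions might look like the sketch below. `taxon` and `occurrences` are part of GBIF.jl's documented interface, but the record field names and the exact call signatures are assumptions here and may differ between versions; check the package documentation before relying on them.

```julia
# Hypothetical usage: filter GBIF occurrences with Query.jl instead of
# the removed built-in filter functions, then collect into a DataFrame.
using GBIF, Query, DataFrames

mammals = taxon("Mammalia"; strict=false)   # look up the taxon on GBIF
occ = occurrences(mammals)                  # retrieve a set of occurrences

# Keep only georeferenced records, project three fields, materialize
georeferenced = occ |>
    @filter(!ismissing(_.latitude) && !ismissing(_.longitude)) |>
    @map({_.species, _.latitude, _.longitude}) |>
    DataFrame
```

Delegating to Query.jl means any iterable-of-records operation (grouping, joining, sorting) comes for free, rather than each filter needing its own function in GBIF.jl.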