gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Data retrieval help - occurrences for WDPA dataset #63

Closed: rukayaj closed this issue 2 years ago.

rukayaj commented 2 years ago

Marc at NTNU asked for advice on an R script that queries GBIF occurrences intersecting the WDPA dataset (https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA). We got the occurrence query working, and the next step is to link the results to taxonomy. For that, I suggested he make a list of the taxonKeys and look them up one by one.

Just making an issue so I can find it here in case he asks about it again.
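The one-by-one taxonomy lookup could look roughly like this minimal sketch. It assumes the taxonKeys are already in a vector and that rgbif::name_usage(key = ...) returns the taxon record in its $data element. lookup_taxa() is a hypothetical helper, and the lookup function is a parameter so it can be swapped out or checked offline.

```r
# Hypothetical helper: look up each distinct taxonKey exactly once.
# Assumption: rgbif::name_usage(key = k)$data returns the taxon record.
lookup_taxa <- function(taxon_keys,
                        lookup = function(k) rgbif::name_usage(key = k)$data) {
  keys <- unique(taxon_keys[!is.na(taxon_keys)])  # drop duplicates and NAs
  results <- vector("list", length(keys))
  for (i in seq_along(keys)) {
    results[[i]] <- lookup(keys[i])  # one request per distinct taxon
  }
  names(results) <- as.character(keys)  # index the results by taxonKey
  results
}
```

Deduplicating first matters because the same taxonKey usually appears on many occurrence records, so this makes one request per taxon rather than one per record. With a hypothetical occurrences data frame, usage would be taxa <- lookup_taxa(occurrences$taxonKey).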

First, run occ_search and do NOT immediately access $data. The result of occ_search also contains the total number of matching records, in results$meta$count. That is the total number of records you can get back, and you use that number to drive a paging loop.

So your first call to the API will be something like the lines below. The limit doesn't matter, but 1 gives you a fast result, because you are ONLY using this call to get the count:
    library(rgbif)
    results <- occ_search(taxonKey = 123, limit = 1)  # plus your other filters (more stuff here)
    count <- results$meta$count
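Wrapped in a function, the count step might look like the sketch below. It assumes rgbif's occ_search(), whose result keeps the total in $meta$count. get_count() is a hypothetical name, and the search function is a parameter so the logic can be tested without calling GBIF.

```r
# Hypothetical helper: request a single record just to read the total hit count.
# Assumption: the result of rgbif::occ_search() stores the total in $meta$count.
get_count <- function(..., search = rgbif::occ_search) {
  res <- search(..., limit = 1)  # limit = 1 keeps the response small
  res$meta$count                 # total matching records, not just this page
}
```

Usage would be count <- get_count(taxonKey = 123) plus whatever other filters you pass to occ_search.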

Then see this introduction to writing for loops in R: https://www.r-bloggers.com/2015/12/how-to-write-the-first-for-loop-in-r/

Based on that, you can do something like the following. Start with an empty list that gets filled with one page of results on each pass through the loop (you could also use a data frame and append rows to it on every pass, but combining the pages once at the end is simpler and faster):
    all_results <- list()

    step_increment <- 100  # page size; or whatever number you want

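Then loop over the pages. Put the upper bound in parentheses: in R, ':' binds tighter than '/', so 0:count/step_increment would mean (0:count)/step_increment, a sequence of fractions rather than page numbers. Note also that rgbif's occ_search() calls the offset argument start:
    n_pages <- ceiling(count / step_increment)
    for (i in seq_len(n_pages) - 1) {  # e.g. count = 500 gives i = 0, 1, 2, 3, 4
      page <- occ_search(taxonKey = 123, limit = step_increment, start = i * step_increment)$data  # plus your other filters
      all_results[i + 1] <- list(page)  # add this page's results to the main list
    }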

The first pass gives you records 0 to 99, the second 100 to 199, the third 200 to 299, and so on. Afterwards all_results holds every page, and you can combine them however you want, e.g. with dplyr::bind_rows(all_results), which also copes with pages that have different columns.
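Putting the loop and the final combine step together, a self-contained sketch could look like this. fetch_all_pages() and bind_pages() are hypothetical helpers. The default fetch_page calls rgbif's occ_search() with limit and start, using taxonKey = 123 as a placeholder for your own filters. bind_pages() is a base-R stand-in for dplyr::bind_rows() that fills columns missing from some pages with NA.

```r
# Hypothetical helper: request every page of `count` records, step_increment at a time.
fetch_all_pages <- function(count, step_increment = 100,
                            fetch_page = function(limit, start)
                              rgbif::occ_search(taxonKey = 123, limit = limit,
                                                start = start)$data) {
  n_pages <- ceiling(count / step_increment)  # e.g. 250 records -> 3 pages
  pages <- vector("list", n_pages)
  for (i in seq_len(n_pages) - 1) {           # i = 0, ..., n_pages - 1; no passes if count is 0
    # list() wrapper keeps an empty (NULL) page instead of deleting the slot
    pages[i + 1] <- list(fetch_page(limit = step_increment, start = i * step_increment))
  }
  pages
}

# Hypothetical helper: stack pages whose columns may differ (base-R stand-in for bind_rows).
bind_pages <- function(pages) {
  pages <- Filter(function(p) !is.null(p) && NROW(p) > 0, pages)  # skip empty pages
  if (length(pages) == 0) return(data.frame())
  cols <- unique(unlist(lapply(pages, names)))  # union of all column names
  aligned <- lapply(pages, function(p) {
    p <- as.data.frame(p)
    for (col in setdiff(cols, names(p))) p[[col]] <- NA  # fill missing columns
    p[cols]                                              # same column order everywhere
  })
  do.call(rbind, aligned)
}
```

With count from the first call, usage would be all_results <- bind_pages(fetch_all_pages(count, step_increment)). Collecting pages in a list and binding once avoids re-copying a growing data frame on every pass.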

rukayaj commented 2 years ago

Note: the search API cannot return more than 100,000 occurrences, so for larger result sets he has to request an occurrence download through the API instead. There is also a problem with certain characters in scientific names, possibly the apostrophe ('). I've suggested he escape all of them for the moment.
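A simple guard for the 100,000-record limit could choose the route based on the count. This is a sketch: retrieve_occurrences() is a hypothetical helper, search_all stands for the count-and-paging approach described earlier in this thread, and the download default assumes rgbif's occ_download() with pred(), which needs GBIF account credentials (e.g. the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables).

```r
# Hypothetical helper: page through the search API only when the result fits
# under its 100,000-record ceiling; otherwise request an occurrence download.
retrieve_occurrences <- function(taxon_key, count, search_all,
                                 request_download = function(k)
                                   rgbif::occ_download(rgbif::pred("taxonKey", k),
                                                       format = "SIMPLE_CSV"),
                                 max_search = 100000) {
  if (count <= max_search) {
    search_all(taxon_key)        # small enough for count + paging
  } else {
    request_download(taxon_key)  # asynchronous: returns a download request; records arrive later as a file
  }
}
```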

rukayaj commented 2 years ago

I haven't heard from Marc for a while, so I will close this.