ateucher / rebird

Wrapper to the eBird API.
Other
1 stars 0 forks source link

Add location checking to get_freq #2

Closed ateucher closed 9 years ago

ateucher commented 9 years ago

API reference: https://confluence.cornell.edu/display/CLOISAPI/eBird-1.1-LocationReference. Make sure we use ebird_GET() (https://github.com/ateucher/rebird/blob/master/R/zzz.R)

ateucher commented 9 years ago

Argh, except that only returns csv or xml

sebpardo commented 9 years ago

What's the difference between GET() and ebird_GET()?

ateucher commented 9 years ago

ebird_GET wraps GET up with all of the common operations for getting stuff from the ebird API - makes it so there isn't so much code repeated in all of the functions that get data from the API.

sebpardo commented 9 years ago

So given that ebird_GET is meant for dealing with eBird's API, then there's no need to use it in get_freq?

ateucher commented 9 years ago

I thought we could use to query the locations API endpoint to check locations

On Thu, Apr 23, 2015 at 1:45 PM, Sebastian Pardo notifications@github.com wrote:

So given that ebird_GET is meant for dealing with eBird's API, then there's no need to use it in get_freq?

— Reply to this email directly or view it on GitHub https://github.com/ateucher/rebird/issues/2#issuecomment-95714178.

ateucher commented 9 years ago

We should write a new function that queries the locations endpoint of the eBird API, that can be called by get_freq to check the loc parameter

sebpardo commented 9 years ago

I made a function that checks the API for matching locations, and it seems to be working solidly: https://github.com/ateucher/rebird/blob/frequency-data/R/eloc_check.R

I didn't use ebird_GET as it relies on json, however LocationReference does not permit that format for some reason. Using GET instead.

Still need to figure out a way to check hotspots, but I'm not sure if it's possible. I just checked what happens when you enter an invalid hotspot in rebird's ebirdhotspot and it just returns an empty data frame, which is what get_freq does as well.

ateucher commented 9 years ago

That looks awesome - nice work! I made a few tweaks, hope they're ok with you

ateucher commented 9 years ago

Regarding checking hotspots, what if we make the function throw a warning telling the user to double-check the hotspot code if nrow(freq) == 0 && loctype == 'hotspots'?

sebpardo commented 9 years ago

I'm all for tweaking as I get to learn how to write better code by seeing your commits.

ateucher commented 9 years ago

:+1:

sebpardo commented 9 years ago

Done warning if no observations returned when querying hotspot (7463d45ea5597b447b1434539ca4a9f8065a0a71). nrow(freq) == 1 instead of 0 as there is it still returns a row for Sample Size, but full of NAs.

ateucher commented 9 years ago

In that case instead of nrow(freq) == 1, what about all(is.na(freq)). There could be a case where there is one row of valid data returned

sebpardo commented 9 years ago

I can change it to that. If there is one row of valid data then nrow = 2. However, I can think of a case where this would fail: if a hotspot exists but there are no checklists in it, which could be tested by specifying a historical year span. I think this would return a data frame with one row full of zeros. Going to play with it now and change it to all(is.na(freq)).

sebpardo commented 9 years ago

So even if querying a hotspot for a period with no checklists, you still get a row almost full of NAs (freq[1,1] = "Sample Size:")

For example, a hotspot for a period with a single observation you get two rows:

h1 <- get_freq("hotspots", "L196159", 1969, 1969, long = FALSE)
head(h1)[, 30:38]
>  Aug-1 Aug-2 Aug-3 Aug-4 Sep-1 Sep-2 Sep-3 Sep-4 Oct-1
>1     0     0     0     0     1     0     0     0     0
>2     0     0     0     0     1     0     0     0     0

If you query the next year, nrow(freq) = 1 but it is still just NAs, even when the hotspot does exist:

h2 <- get_freq("hotspots", "L196159", 1970, 1970, long = FALSE)
head(h2)[, 30:38]
>  Aug-1 Aug-2 Aug-3 Aug-4 Sep-1 Sep-2 Sep-3 Sep-4 Oct-1
>1    NA    NA    NA    NA    NA    NA    NA    NA    NA

We could use all(is.na(freq[,-1])), however I can't imagine there being a case where we wouldn't get the same answer this way or using nrow(freq). I do agree that the correct check is for NAs instead of counting the number of rows, so I'm going to change it now.

sebpardo commented 9 years ago

Done (e68700f5634289b1116d0ec2bdef709d8d2d9764).