steffilazerte closed this issue 5 years ago.
Those sound fine. In theory, some of these combinations could be valid, but with the goal of keeping queries simpler and more responsive, it’s a reasonable tradeoff.
At the very least, it would be good to issue warnings when parameters are being ignored, e.g. "statprov code provided, ignoring country code".
The instructions in the book should highlight that type of approach, explaining that the API is a coarse filter and that users can apply finer-grained filtering once they have the data. For instance, a common request we get is from people wanting to do a data extraction based on a shapefile. We could show them how to extract the bounding box of their shape in R, send that as min/max Lat/Lon, and then do the overlay locally in R.
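A minimal sketch of that two-step approach, written in Python rather than R and without any spatial library: derive minLat/maxLat/minLong/maxLong from the polygon's vertices to use as the coarse API filter, then keep only the downloaded records that actually fall inside the polygon. The record keys latitude and longitude are assumptions, the polygon is a single ring without holes, and the ray-casting test treats coordinates as planar, which is only reasonable for modest areas. In R, a spatial package such as sf would normally handle both steps.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def bounding_box(polygon: List[Point]) -> Dict[str, float]:
    """Coarse filter: min/max latitude and longitude of the polygon's vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return {"minLat": min(lats), "maxLat": max(lats),
            "minLong": min(lons), "maxLong": max(lons)}


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray running east from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies east of pt
                inside = not inside
    return inside


def filter_records(records: List[Dict], polygon: List[Point]) -> List[Dict]:
    """Fine filter: keep records (hypothetical latitude/longitude keys) inside the polygon."""
    return [r for r in records
            if point_in_polygon((r["longitude"], r["latitude"]), polygon)]
```

A record can pass the bounding-box filter on the server and still be dropped locally, for example a point in the empty corner of a triangular study area.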
Thinking about this a bit more, and as I hinted previously, I would also want to limit the number of filter concepts applied at the same time to no more than about 3 (counting the 4 parameters of the bounding box as 1 concept for that purpose). I also think users should generally not be able to mix more than 1 concept of the same type.
It’s like a Chinese restaurant menu: they can pick any 3 from the list below, with no more than 1 per category, or something like that:
· Species ID
· Geographic region (country, statprov, subnat2, bcr, iba, utm square, bounding box)
· Collection (collection code or project)
· Year (start/end year)
· Day of year (start/end doy). Use that terminology rather than Julian date, which starts at 0 and allows for fractions
· Site type (see below)
Am I forgetting any parameter?
To help the user, we could still do the sort of cleaning you suggested above (e.g. ignore country when statprov is given), and then validate the remaining parameters against the rule of at most 1 per category and 3 categories maximum. I know it is yet one more thing people will have to understand, but I also know that too many options will lead to timeouts in many cases.
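The "1 per category, 3 categories max" rule could be checked roughly as below. This is only a sketch: the parameter names (startYear, endYear, startDay, endDay, project and so on) and both lookup tables are hypothetical, and it assumes the redundancy clean-up (e.g. dropping country when statprov is given) has already run. The four bounding-box edges collapse to one concept, as does each start/end pair, and siteType gets its own category but may not be combined with iba.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical: map each API parameter to the filter concept it expresses.
CONCEPT = {
    "species": "species",
    "country": "country", "statprov": "statprov", "subnational2": "subnational2",
    "bcr": "bcr", "iba": "iba", "utmSquare": "utmSquare",
    "minLat": "bbox", "maxLat": "bbox", "minLong": "bbox", "maxLong": "bbox",
    "collection": "collection", "project": "project",
    "startYear": "years", "endYear": "years",
    "startDay": "days", "endDay": "days",
    "siteType": "siteType",
}
# Hypothetical: map each concept to its "menu" category.
CATEGORY = {
    "species": "species",
    "country": "region", "statprov": "region", "subnational2": "region",
    "bcr": "region", "iba": "region", "utmSquare": "region", "bbox": "region",
    "collection": "collection", "project": "collection",
    "years": "year", "days": "day_of_year",
    "siteType": "site_type",
}
MAX_CATEGORIES = 3


def check_filters(filters: Dict[str, object]) -> List[str]:
    """Return rule violations as messages; an empty list means the request passes."""
    unknown = sorted(name for name in filters if name not in CONCEPT)
    if unknown:
        return [f"unknown filter(s): {', '.join(unknown)}"]

    concepts_by_category: Dict[str, Set[str]] = defaultdict(set)
    for name in filters:
        concept = CONCEPT[name]
        concepts_by_category[CATEGORY[concept]].add(concept)

    problems = []
    # No more than one concept per category (e.g. not both bcr and a bounding box).
    for category, concepts in sorted(concepts_by_category.items()):
        if len(concepts) > 1:
            problems.append(f"more than one {category} filter: {', '.join(sorted(concepts))}")
    # siteType may combine with geographic filters, except iba.
    if "siteType" in filters and "iba" in filters:
        problems.append("siteType cannot be combined with iba")
    # No more than MAX_CATEGORIES categories in total.
    if len(concepts_by_category) > MAX_CATEGORIES:
        problems.append(f"{len(concepts_by_category)} filter categories used; "
                        f"the limit is {MAX_CATEGORIES}")
    return problems
```

Returning messages instead of raising on the first problem lets the client report every violation in a single pass.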
A couple possible parameters I would like to add:
1) Site Type (for now, only supporting the value IBA, to identify whether or not the site falls within an IBA). This could be a standalone parameter that can be combined with other geographic filters except IBA site, since we may want to allow, say, all IBA data within a BCR or province. I think that sites outside IBAs are saved as NULL, but I will confirm, as I also see “N/A” strings. The SQL filter would then be something like “iba_site IS NOT NULL”.
2) I will look into whether we can or should add a family parameter (e.g. all Anatidae). This would have to rely on an external table so I don’t know yet if that is of sufficient interest or feasible.
I think the “site type” is relatively simple and useful, so I’ll probably want to add that one.
Let me know Steffi if you have any questions.
Okay thanks for the clarification!
I have just a couple of confirmations, comments, and questions :)
1) Any time I change a user request, I'll make sure a message prints to that effect.
2) I think I'll revamp the function arguments to highlight these categories a bit more explicitly, that should make it easier for users to understand.
3) I've been using 'start_season' and 'end_season' rather than Julian day or day of year, to be explicit that this isn't bounding dates overall, but rather it bounds dates within a year. I'm happy to change it back to day of year, though, if you prefer.
4) The other parameters are bmdeVersion and fields: do they count toward the limit, or are they separate?
5) I don't think a family parameter is necessary, as long as an extensive species list is okay (something Paul and I have been discussing in #11). I have added (locally, not pushed yet) an example showing users how to grab all species IDs from a specific family and use them for the download.
6) I have added the example of filtering observations to a bounding box to the articles wishlist.
Fields and bmdeVersion shouldn’t be counted in the same list, since they are not filtering parameters affecting rows.
Yes, I think I would use day_of_year. Season is a bit more nebulous.
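For illustration, here is the day-of-year convention as an integer counted from 1 (rather than a fractional Julian date), together with a check that bounds dates within a year. It is a Python sketch; the function names are hypothetical, and the 1 to 366 range mirrors the validity check described later in the thread.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Integer day of year counted from 1 (1 January -> 1, 31 December -> 365 or 366)."""
    return d.timetuple().tm_yday


def in_day_window(d: date, start_day: int, end_day: int) -> bool:
    """True if d falls between start_day and end_day (inclusive), whatever the year."""
    if not (1 <= start_day <= 366 and 1 <= end_day <= 366):
        raise ValueError("day of year must be between 1 and 366")
    # Windows that wrap around 31 December (start_day > end_day) are not handled here.
    return start_day <= day_of_year(d) <= end_day
```

The same calendar date maps to different day numbers in leap and non-leap years: 1 March is day 61 in 2020 but day 60 in 2021.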
The bounding box example would also include the next step of filtering the local data against the shapefile, ideally.
Sounds good!
I plan to start adding server-side validity checks to filter attributes this afternoon. My intention is to stick to the obvious 'out-of-range' checks to start with, and leave the more subtle stuff to the R-client (especially since you have gotten ahead of me on this!)
So for example, I will sanity check the following:
I already have some error traps for invalid bmdeVersion.
I'm just in the process of double checking that I have all the filters in place with validity and redundancy checks.
Two filters that I'm a bit confused about are Site Type (siteType?) and Site Code (siteCode).
Site Type was discussed above, but isn't present in my index.html API cheat sheet. I assume the filter is siteType and that it takes an argument of either NULL or IBA, but could someone confirm that? Also, in the data, it is definitely filled with N/A strings, which might interfere with the function.
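If the column holds both NULL and the literal string N/A, a filter like iba_site IS NOT NULL would still return the N/A rows. Below is a minimal Python sketch of the intended siteType semantics (no value means no filtering, "IBA" means inside any IBA). It assumes NULL and "N/A" (plus, as a guess, the empty string) are the only placeholders, and that records expose a hypothetical iba_site key.

```python
from typing import Dict, List, Optional

# Placeholder strings treated as "not in an IBA"; the empty string is an assumption.
NOT_IN_IBA = {"", "N/A"}


def in_iba(iba_site: Optional[str]) -> bool:
    """True if the iba_site value identifies an IBA (None stands for SQL NULL)."""
    return iba_site is not None and iba_site.strip() not in NOT_IN_IBA


def filter_by_site_type(records: List[Dict], site_type: Optional[str]) -> List[Dict]:
    """None applies no filtering; "IBA" keeps only records inside an IBA."""
    if site_type is None:
        return list(records)
    if site_type != "IBA":
        raise ValueError(f"unsupported siteType: {site_type!r} (only 'IBA' is supported)")
    return [r for r in records if in_iba(r.get("iba_site"))]

# A server-side equivalent (assumption) would need to exclude both forms, e.g.
#   WHERE iba_site IS NOT NULL AND iba_site <> 'N/A'
```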
siteCode is in my index.html API cheat sheet but wasn't discussed above. Should it be considered a geographic region filter (i.e. grouped with country, statprov, subnat2, bcr, iba, utm square, and bounding box)? Or are we dropping it?
Incidentally, SiteCode is the one field in the downloaded data that looks like it should be a basic field (i.e. in snake_case) but is in fact in CamelCase.
SiteCode. Yeah. ¯\_(ツ)_/¯
I’m a bit hesitant about having fields with different names in the table and the API. An exception here would have to be something that Paul implements, since he is reading the table directly, not a view.
Site type: yes, the intent was to have either no value (no filtering) or “IBA”, which would allow users to get all data within any IBA. I suspect Paul hasn’t implemented that option yet.
Site Type has not been implemented, correct. I don't recall seeing it in the initial spec, but I might have missed it. In any case, I will go ahead, yes?
I'll probably add a function to the DataRequests object to handle this.
The API filter attribute will be siteType.
If you build to bsc-base and deploy to the sandbox, this can now be tested (it also includes fixes for handling start and end days).
OK deployed.
I'm adding in validity and redundancy checks (i.e. not passing both country and statprov, because statprov is sufficient). Validity checks are pretty straightforward (codes must be in the metadata lists, years must be between 1900 and the present, julian dates within 1:366). But when it comes to redundancy, I thought I should check in with you two in case there's something I've missed. So far I have these rules:
· country > statprov > subnational2 are redundant; keep only the smallest.
· minLat, maxLat, minLong, maxLong are not redundant with the above, i.e. they can mix with country/statprov/subnational2, because users might only supply one of the four options (just minLat, for example).
· utmSquare is redundant with all the above (so if utmSquare is supplied, minLat, maxLat, minLong, maxLong, country, statprov and subnational2 are all ignored).
· siteCode, IBA and BCR are all redundant with all of the above (unless users might want to select, for example, a minimum latitude in a particular IBA?).

Any changes or additions?
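The rules listed above could be applied as a clean-up pass before a request is sent, roughly as follows. This Python sketch follows the rules as written, with one interpretation: siteCode, iba and bcr are only checked against the filters listed above them, not against each other, since the thread leaves that open. The function name is hypothetical; the returned messages follow the earlier plan to tell users whenever their request is changed.

```python
from typing import Dict, List, Tuple

ADMIN = ["country", "statprov", "subnational2"]  # largest to smallest
BBOX = ["minLat", "maxLat", "minLong", "maxLong"]
SITE_LEVEL = ["siteCode", "iba", "bcr"]


def drop_redundant(filters: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Return (filters with redundant entries removed, one message per removal)."""
    kept = dict(filters)
    messages: List[str] = []

    def drop(names: List[str], reason: str) -> None:
        for name in names:
            if name in kept:
                del kept[name]
                messages.append(f"Ignoring '{name}': {reason}")

    # siteCode, iba or bcr make utmSquare, the bounding box and admin regions redundant.
    site_level = [name for name in SITE_LEVEL if name in kept]
    if site_level:
        drop(["utmSquare"] + BBOX + ADMIN, f"'{site_level[0]}' supplied")

    # utmSquare makes the bounding box and admin regions redundant.
    if "utmSquare" in kept:
        drop(BBOX + ADMIN, "'utmSquare' supplied")

    # Among country > statprov > subnational2, keep only the smallest.
    admin = [name for name in ADMIN if name in kept]
    if len(admin) > 1:
        drop(admin[:-1], f"'{admin[-1]}' is more specific")

    # Bounding-box edges are otherwise left alone: they may mix with admin regions.
    return kept, messages
```

The broadest overriding rule runs first, so a request with both bcr and utmSquare reports that utmSquare was ignored because bcr was supplied.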