BirdsCanada / NatureCountsAPI

creating a minimum set of fields as default #3

Open denislepage opened 5 years ago

denislepage commented 5 years ago

I created a new set of fields called "BMDE2.00-min" which is a very limited list (min = minimum) that should be used by default when the bmde version is not specified. As requested in issue #2, the 28 indexed fields should be included in all returned results, so the minimum set will only provide a few extra fields (start time, count of individuals, etc.). I have removed any/most of the fields that could be derived from the indexed fields (e.g. all of the taxonomy can be taken from species_id in the lookup tables). The list may still need a few tweaks.

To get the list assigned to a specific collection, the API call should expect something like version: "default". The possible values will therefore be:

NULL: BMDE2.00-min default: BMDE2.00 BMDE2.00-ext etc.

The user should also be able to specify a custom list of fields, which we should validate (including looking for valid names, case-sensitivity and duplicate entries).
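A minimal sketch of that validation, written in R for illustration. The function name validate_fields and the valid_fields argument are hypothetical; the real list of valid names would come from the BMDE field definitions, and the three names used below are only examples taken from this thread.

```r
# Hypothetical validator for a user-supplied custom field list.
# `valid_fields` stands in for the real list of BMDE field names.
validate_fields <- function(fields, valid_fields) {
  if (!is.character(fields)) {
    stop("`fields` must be a character vector")
  }

  # Reject duplicate entries
  dups <- unique(fields[duplicated(fields)])
  if (length(dups) > 0) {
    stop("Duplicate field names: ", paste(dups, collapse = ", "))
  }

  # Names with no exact (case-sensitive) match
  unknown <- setdiff(fields, valid_fields)

  # Of those, names that would match if case were ignored
  wrong_case <- unknown[tolower(unknown) %in% tolower(valid_fields)]
  if (length(wrong_case) > 0) {
    stop("Field names are case-sensitive, check: ",
         paste(wrong_case, collapse = ", "))
  }

  if (length(unknown) > 0) {
    stop("Unknown field names: ", paste(unknown, collapse = ", "))
  }

  invisible(fields)
}

# Example names only, not the full BMDE list
valid <- c("YearCollected", "MonthCollected", "DayCollected")
validate_fields(c("YearCollected", "DayCollected"), valid)
```

Reporting wrong-case names separately from unknown names is a design choice in this sketch; both could equally be folded into one message.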

pmorrill commented 5 years ago

From Denis:

I added a shorthand column in lk_bmde_versions and linked them when it made sense. It will make the version names that people are expected to use most often more stable over time. E.g. version = "minimum" would now refer to BMDE2.00, but may later be equivalent to BMDE2.01, which may have minor field additions/removal.

I think the values for the version parameter can thus be:

Most people would only ever use either minimum or default, unless they knew what they were doing.

denislepage commented 5 years ago

Responding to Paul's email here.

First, to confirm, the version should be required by the API (as opposed to defaulting it to one of the values), and it should be validated against the possible values I indicated (shorthand or full name). An error should be returned if the value is not recognized. R could use minimum as the default.
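Roughly, the check could look like the sketch below (in R, for illustration only). The bmde_versions table is a stand-in for lk_bmde_versions and its two rows are assumptions, not the actual table contents; resolve_version is a hypothetical name. Special values such as "default" and "custom" would need their own handling and are left out here.

```r
# Illustrative stand-in for lk_bmde_versions; the rows are assumptions.
bmde_versions <- data.frame(
  version   = c("BMDE2.00-min", "BMDE2.00-ext"),
  shorthand = c("minimum", "extended"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: accept a full name or a shorthand, return the full name.
resolve_version <- function(version, versions = bmde_versions) {
  # The version is required: no silent fallback to a default value
  if (missing(version) || is.null(version) ||
      length(version) != 1 || is.na(version)) {
    stop("`version` is required")
  }

  # A full version name is accepted as-is
  if (version %in% versions$version) {
    return(version)
  }

  # Otherwise look for a matching shorthand
  hit <- match(version, versions$shorthand)
  if (is.na(hit)) {
    stop("Unrecognized version: ", version)
  }
  versions$version[hit]
}

resolve_version("minimum")
resolve_version("BMDE2.00-ext")
```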

One alternative option for the vector of fields is to treat it as a supplement to the version you asked for. Say you ask for "minimum" and provide a vector with "FieldName1", you get all the fields from minimum + FieldName1.

The "custom" list is in essence an empty base list (again, indexed fields are always provided in all cases). If you pass an empty vector with the custom type, you only get the indexed fields.

If you both think it's simpler for the user to only allow a list of fields associated with the custom type, we can also go with that.

If the fields requested do not exist, the API would generate a warning, and the R code could return an error. If the list of fields includes values that are already in the base, this fails silently on the API side, with an error (or warning?) on the R side. I am not sure if we should make the field names case-sensitive. Perhaps the R code can match the provided names to the case-sensitive names and have the API be case-sensitive.
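One way the R side could do that matching, as a rough sketch. The name prepare_fields and its arguments are hypothetical. It assumes the valid names are still unique once case is ignored, and it picks "warning" for fields already in the base set, which is one of the two options raised above rather than a settled choice.

```r
# Hypothetical R-side helper: map user input onto the canonical
# (case-sensitive) names, then drop anything already in the base set.
prepare_fields <- function(fields, valid_fields, base_fields = character(0)) {
  # Case-insensitive lookup against the canonical names
  idx <- match(tolower(fields), tolower(valid_fields))
  if (anyNA(idx)) {
    stop("Unknown field names: ",
         paste(fields[is.na(idx)], collapse = ", "))
  }
  canonical <- valid_fields[idx]

  # Fields already provided by the base set are redundant
  redundant <- intersect(canonical, base_fields)
  if (length(redundant) > 0) {
    warning("Already included in the base set: ",
            paste(redundant, collapse = ", "))
  }

  # What remains is sent to the API with the correct case
  setdiff(canonical, base_fields)
}

# Example names only
valid <- c("YearCollected", "MonthCollected", "DayCollected")
prepare_fields(c("yearcollected", "MONTHCOLLECTED"), valid)
```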

pmorrill commented 5 years ago

I am not sure we need to support a custom fields list when the user is specifying a 'version' attribute other than 'custom'. It can be done, but where is the utility? The version shorthands serve their purpose as they are.

It would complicate interpretation of the 'fields' parameter. As originally defined, it was to be a SUBSET of the fields listed in the 'default' collection version. In other words, I am discarding requested fields if they are not part of the defined default for the collection, and returning only the custom list of fields that results (plus indexed fields).

But your description above has me adding custom fields to a defined list in some cases. It's getting confusing.

I'd say that we support the 'fields' list only if 'version' is set to 'custom', and use it to subset the default list of fields. At least for the time being.

steffilazerte commented 5 years ago

So, if I understand, we have the following:

This makes sense to me and will be easy to implement on my end.

One question, though: Is fields in addition to the minimum set? Or really only what the user specifies? (including record_id because I have to have that one) e.g., could the user specify CollectionYear and really just get that column?

pmorrill commented 5 years ago

Actually, as it now works, 'fields' is not in addition, but a subset of the 'default' field set. The default for your collection can be parsed out of the api metadata/collections query.

So, as long as CollectionYear is part of the 'default' for your collection, then you will get back ONLY CollectionYear plus the indexed fields (which includes record_id).
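To make the subset rule concrete, here is a small sketch in R (illustration only, not the API's actual code). The function name select_custom_fields is hypothetical, and the field vectors are short examples built from names mentioned in this thread, not the real default or indexed lists.

```r
# Hypothetical illustration of the 'custom' behaviour: requested fields
# are kept only if they belong to the collection's default set, and the
# indexed fields are always returned.
select_custom_fields <- function(requested, default_fields, indexed_fields) {
  kept <- intersect(requested, default_fields)  # discard anything not in the default
  union(indexed_fields, kept)                   # indexed fields come back in every case
}

# Example vectors only
indexed <- c("record_id", "species_id", "survey_year")
default <- c("CollectionYear", "YearCollected")

select_custom_fields("CollectionYear", default, indexed)
select_custom_fields(character(0), default, indexed)
```

With an empty request the result is just the indexed fields, which matches the "empty base list" description of the custom type.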

denislepage commented 5 years ago

The indexed fields (~20 snake_case names) will always be included with all requests. This includes record_id, lat/lon, date, species_id, etc.

The “version” only applies to BMDE fields.

The minimum set does not include any fields that are overlapping with indexed fields (e.g. coordinates), or things that can be derived from lookup tables (e.g. species taxonomy). CollectionYear would not be included, because there’s already survey_year in the indexed fields. It will be included in all other pre-defined standards (default, core, etc.)

You missed that there’s also an option for “extended”, which is all fields available in BMDE (very extensive!). I am on the fence whether we should even offer that.

The behavior of “fields” still wasn’t entirely settled. The initial proposal was to only allow fields to be used with the custom version type, and we probably should keep it that way. My proposal yesterday was rather to make that a list of supplemental fields added to the version, so users could make minor additions to the list of fields a set provides. Paul has indicated that he finds that more confusing. Let’s keep the list of fields tied to the custom version.

One way to address what I was trying to do is handle this on the R side. If there’s a function to load the list of fields from a given set in R (from the local data), the user can then make changes to create their own custom set from that.

Something like this:

    fields <- get_bmde_fields("minimum")
    fields <- c(fields, "YearCollected", "MonthCollected", "DayCollected")

steffilazerte commented 5 years ago

Right, I think I've got it now.

That's definitely something we can implement in R. It sounds like something we could add to an article, rather than actual functionality, right?

denislepage commented 5 years ago

We probably don't need a function to modify the list of fields. A function should be there to read the list from the standards, and there should be validation done on the list before passing it on to the API.
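A possible shape for that reader function, as a sketch under stated assumptions. The bmde_fields table is a hypothetical stand-in for the local copy of the standards; its layout (one row per version and field) and the placeholder names FieldName1 and FieldName2 are not the real data.

```r
# Hypothetical local copy of the standards: one row per (version, field).
bmde_fields <- data.frame(
  version = c("minimum", "minimum", "default", "default", "default"),
  field   = c("FieldName1", "FieldName2",
              "FieldName1", "FieldName2", "CollectionYear"),
  stringsAsFactors = FALSE
)

# Read the list of fields that a given set provides.
get_bmde_fields <- function(version, standards = bmde_fields) {
  out <- standards$field[standards$version == version]
  if (length(out) == 0) {
    stop("Unknown version: ", version)
  }
  out
}

# The user builds a custom list from an existing set; the combined list
# would still need to be validated before it is passed to the API.
fields <- c(get_bmde_fields("minimum"), "YearCollected")
fields
```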