NeotomaDB / api_nodetest

node.js and express implementation for the Neotoma API
MIT License

What is the occid?! #54

Closed: jpjenk closed this issue 6 years ago

jpjenk commented 6 years ago

This identifier, which should define an occurrence of a taxon in time and space, is an input to the occs endpoint but is not generated by any other endpoint (Sites, Datasets, References or Taxa).

All endpoints should be intrinsically linked: taxa have occurrences, datasets are made up of occurrences (sites are temporally indiscrete amalgams of datasets), and all are linked by published literature references.

It was my understanding that this conceptual failure was the principal reason for attempting an ambitious rewrite of the Neotoma API.

SimonGoring commented 6 years ago

The occid is intended to be the combination of the sample ID and the db name as a prefix (e.g., sample #13224 is neotoma:13224).
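
A minimal Python sketch of that composition, for illustration only. The function names `make_occid` and `split_occid` are hypothetical and are not part of the Neotoma API; the only thing taken from the description above is the `dbname:sampleid` form.

```python
from typing import Tuple


def make_occid(db_name: str, sample_id: int) -> str:
    """Join a database name and an integer sample ID as 'dbname:id'."""
    return f"{db_name}:{sample_id}"


def split_occid(occid: str) -> Tuple[str, int]:
    """Split 'dbname:id' back into its database name and integer ID."""
    db_name, _, raw_id = occid.partition(":")
    if not db_name or not raw_id.isdigit():
        raise ValueError(f"expected 'dbname:id', got {occid!r}")
    return db_name, int(raw_id)


print(make_occid("neotoma", 13224))
print(split_occid("neotoma:13224"))
```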

None of the current endpoints return occid or sample ID because I haven't written anything that returns counts yet. That is almost next on the list.

jpjenk commented 6 years ago

Excellent, we're on the same page here. I was just checking whether the intent was to have occid built into the other endpoints. It sounds like it is, so again, great!

Unless there is an internal reason not to, it would be preferable for Neotoma to return a simple integer for occid (and for other identifiers, for that matter). The reason is that ELC will already qualify the object ID number by prepending both database and datatype, in the form database:datatype:id_number. I have written a parser that will take a list of assorted IDs and separate them by both database and datatype for routing to the correct endpoint and database subquery. Along these lines, the Neotoma API should accept plain integer lists for the allowable identifiers. Example:

ELC Locale endpoint parameter: loc_id=neotoma:occ:123,pbdb:occ:456,neotoma:occ:789

This will route to a Neotoma dataset endpoint call with parameter: siteid=123,789

And to a PBDB collections endpoint call with parameter: coll_id=456

The ELC response, once these subqueries have been processed, will include something like: {'locale_id': neotoma:dst:#, neotoma:dst:#, pbdb:col:#, pbdb:col:# ... etc.
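
The splitting step described above could be sketched roughly as follows. This is not the actual ELC parser: the names `group_ids` and `to_param` are hypothetical, and the sketch assumes every token has exactly the three-part database:datatype:id_number form.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_ids(id_list: str) -> Dict[Tuple[str, str], List[int]]:
    """Group 'database:datatype:id_number' tokens by (database, datatype)."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for token in id_list.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        parts = token.split(":")
        if len(parts) != 3 or not parts[2].isdigit():
            raise ValueError(
                f"expected database:datatype:id_number, got {token!r}"
            )
        database, datatype, id_number = parts
        groups[(database.lower(), datatype.lower())].append(int(id_number))
    return dict(groups)


def to_param(ids: List[int]) -> str:
    """Render a plain integer list as a comma-separated parameter value."""
    return ",".join(str(i) for i in ids)


grouped = group_ids("neotoma:occ:123,pbdb:occ:456,neotoma:occ:789")
for (database, datatype), ids in grouped.items():
    print(database, datatype, to_param(ids))
```

Each group then carries only bare integers, so the subquery for a given database can be built without that database needing to understand the prefixed form.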

I have written an output filter for the ELC JSON response that is controlled with the 'show' parameter such that:

show=full --> returns metadata/summary block together with all record data

show=poll --> returns only the metadata and summary

show=idx --> returns only a list of the identifiers returned by that endpoint, in the colon-separated format described above. This list could programmatically be sent into another endpoint.

This response control (and the show parameter) is available across all endpoints.
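
A rough sketch of how such a filter might behave. The response layout here is an assumption made purely for illustration: a `metadata` key holding the metadata/summary block and a `records` list whose items each carry an `id` already in the colon-separated form. The real ELC JSON structure may differ, and `filter_response` is a hypothetical name.

```python
from typing import Any, Dict, List, Union


def filter_response(
    response: Dict[str, Any], show: str = "full"
) -> Union[Dict[str, Any], List[str]]:
    """Trim a response according to the 'show' parameter."""
    if show == "full":
        # Metadata/summary block together with all record data.
        return response
    if show == "poll":
        # Only the metadata/summary block.
        return {"metadata": response["metadata"]}
    if show == "idx":
        # Only the identifiers, ready to pass to another endpoint.
        return [record["id"] for record in response["records"]]
    raise ValueError(f"unknown show value: {show!r}")


example = {
    "metadata": {"record_count": 2},
    "records": [
        {"id": "neotoma:occ:123", "taxon": "Picea"},
        {"id": "pbdb:occ:456", "taxon": "Equus"},
    ],
}
print(",".join(filter_response(example, "idx")))
```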

jpjenk commented 6 years ago

It is also worth noting that returning identifiers with hardcoded dataset tags from the subqueries is not necessary, because I have rewritten ELC to be completely database agnostic. All database-specific settings and controls are in a configuration file (YAML format), and the custom handlers (basically field name and object mapping) for each database are broken out into separate files of about 10-15 lines of code for each endpoint of each database. This means that any database of a similar type that provides a REST API could be added to ELC in less than a day.
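
To illustrate the idea of field-name mapping driven by settings, here is a small sketch. It is not ELC's actual code or configuration: `SETTINGS` is a plain dict standing in for the YAML file, `exampledb` and its `record_no` field are invented for the example, and `to_common_id` is a hypothetical handler name. Only the Neotoma `sampleid` field comes from this thread.

```python
from typing import Any, Dict

# Stand-in for per-database settings that would live in the YAML file.
SETTINGS: Dict[str, Dict[str, Dict[str, str]]] = {
    "neotoma": {"occ": {"id_field": "sampleid"}},
    "exampledb": {"occ": {"id_field": "record_no"}},  # hypothetical database
}


def to_common_id(database: str, datatype: str, record: Dict[str, Any]) -> str:
    """Map a database-specific integer ID to database:datatype:id_number."""
    id_field = SETTINGS[database][datatype]["id_field"]
    return f"{database}:{datatype}:{int(record[id_field])}"


print(to_common_id("neotoma", "occ", {"sampleid": 13224}))
```

Under this assumption, adding a database means adding a settings entry and a short handler, while the subqueries themselves keep returning bare integers.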

SimonGoring commented 6 years ago

Yep, the values returned from neotoma are just integers. They are currently called the sampleid.

SimonGoring commented 6 years ago

I'm going to leave the term as sampleid, since this is the Neotoma API and as such should return Neotoma-specific terms.

jpjenk commented 6 years ago

Completely agree, terms should be returned using language specific to the providing database. It is up to the ELC API to present a common vocabulary that spans the databases. The original question was whether sampleid could be interpreted as an occurrence ID, since this is the record-level identifier returned by that endpoint. It appears so, as evidenced by the discussion above.