Open SimonGoring opened 8 years ago
I thought of better examples... The example above seems trivial :)
Simon, I have a couple of reasons for returning everything as a flat table.
1) To decouple the response from the response format. You can get any response you like as either CSV or JSON, and you get exactly the same data from each.
2) To simplify the server code. It makes things SO MUCH EASIER if the server just has to generate a list of records. Trying to format things into a complicated JSON structure makes the code more complicated and slows everything down.
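A minimal sketch of points 1 and 2, assuming the server produces a plain list of flat records. The field names and the `to_csv` / `to_json` helpers are hypothetical and are not taken from the actual server code; the point is only that both formats are rendered from the same list.

```python
import csv
import io
import json

# Hypothetical flat records, standing in for whatever the query returns.
records = [
    {"site_id": 1, "site_name": "Site A", "taxon": "Picea"},
    {"site_id": 2, "site_name": "Site B", "taxon": "Pinus"},
]


def to_csv(rows):
    """Render a list of flat dicts as CSV text, one line per record."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render the same list as a JSON document with a 'records' array."""
    return json.dumps({"records": rows})


csv_text = to_csv(records)
json_text = to_json(records)
```

Because neither formatter looks inside the records, adding a new output format means adding one more function of this shape and nothing else.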
That said, there is no reason why we couldn't use bibJSON and format the records in a more natural way. That actually wouldn't complicate things much because each record is still a separate JSON string. In fact, that is a very good idea. We could add bibJSON as a vocabulary option when returning publications.
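As a rough illustration of why this stays cheap, here is a sketch that turns one flat publication row into a bibJSON-style record and serializes it as its own JSON string. The input column names, the `;` author delimiter, and the `to_bibjson_like` helper are assumptions made for the example; the `author` list of `{"name": ...}` objects follows the usual bibJSON convention, but the output should be checked against the spec before relying on it.

```python
import json

# Hypothetical flat publication row; column names are assumed for illustration.
flat_pub = {
    "pub_id": 42,
    "title": "An example title",
    "year": "2001",
    "authors": "A. Author; B. Author; C. Author",
}


def to_bibjson_like(row):
    """Map one flat row to a bibJSON-style dict with a variable-length author list."""
    names = [n.strip() for n in row["authors"].split(";") if n.strip()]
    return {
        "id": str(row["pub_id"]),
        "title": row["title"],
        "year": row["year"],
        "author": [{"name": n} for n in names],
    }


# Each record is still serialized independently, one JSON string per record.
record_json = json.dumps(to_bibjson_like(flat_pub))
```

The server still iterates over a list of rows; only the per-record mapping changes.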
I am a lot more skeptical about, for example, listing sites and having a sub-list of occurrences under them. That is a good example of something that would complicate the server code. I would much rather implement this as two separate calls: one to list the sites, and one to list the occurrences, with the latter including a siteID field so that you can match up which occurrence goes to which site.
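For comparison, the client-side join this implies is small. A sketch, assuming the two calls return lists of flat records that share a `siteID` field (the other field names and the `nest_occurrences` helper are hypothetical):

```python
from collections import defaultdict

# Hypothetical results of the two separate calls.
sites = [
    {"siteID": 10, "name": "Site A"},
    {"siteID": 11, "name": "Site B"},
]
occurrences = [
    {"occurrenceID": 1, "siteID": 10, "taxon": "Picea"},
    {"occurrenceID": 2, "siteID": 10, "taxon": "Pinus"},
    {"occurrenceID": 3, "siteID": 11, "taxon": "Betula"},
]


def nest_occurrences(sites, occurrences):
    """Group occurrences by siteID and attach each group to its site."""
    by_site = defaultdict(list)
    for occ in occurrences:
        by_site[occ["siteID"]].append(occ)
    # Sites with no occurrences get an empty list rather than being dropped.
    return [dict(site, occurrences=by_site.get(site["siteID"], [])) for site in sites]


nested = nest_occurrences(sites, occurrences)
```

This keeps the nesting logic out of the server, at the cost of a second request and a join on the client.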
Hi @mmcclenn & @jpjenk, I just want to clarify the discussion we had about flat data structures in the API response.
Right now, regardless of data format (`json`, `xml`, `csv`), we are returning data as a flat table.
I understand the motivation for doing this for `csv` formats, but the JSON and XML formats are designed to return structured data, so I'm not clear why we wouldn't use this in that case.
For example, the bibJSON schema for publications is designed to support variable-length author lists, or sets of publications with differing reference structures.
Given the extent of repetition and the potentially large size of some of our responses it might make sense to consider structured data formats for some of the responses, particularly since we're making our users define the response type they're expecting.
For example, a `publication` response in `JSON` would use the bibJSON standard, while in CSV it would be a wide table that could be saved as `csv`.
My thinking is two-fold:
versus:
saves us an astounding 24 bytes per row :) Which isn't that much, I suppose, but then we could add a bit more structure, returning a taxon table for multi-taxon responses that would link the taxon IDs to the names, so we wouldn't need to repeat those as well. I think we'd see performance improvements in the downstream applications that use the API, particularly web-based services that use JSON natively.
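A sketch of the taxon-table idea, assuming a flat response in which every occurrence row repeats the taxon name. The field names, the `taxa` / `occurrences` keys, and the `normalize` helper are hypothetical; this is one possible shape, not a proposal for the exact response schema.

```python
import json

# Hypothetical flat response: the taxon name is repeated on every row.
flat = [
    {"occ_id": 1, "taxon_id": 100, "taxon_name": "Picea glauca"},
    {"occ_id": 2, "taxon_id": 100, "taxon_name": "Picea glauca"},
    {"occ_id": 3, "taxon_id": 101, "taxon_name": "Pinus strobus"},
    {"occ_id": 4, "taxon_id": 100, "taxon_name": "Picea glauca"},
]


def normalize(rows):
    """Split flat rows into a taxon lookup table plus slimmer occurrence rows."""
    taxa = {}
    occurrences = []
    for row in rows:
        taxa[row["taxon_id"]] = row["taxon_name"]
        occurrences.append({"occ_id": row["occ_id"], "taxon_id": row["taxon_id"]})
    return {
        "taxa": [{"taxon_id": k, "taxon_name": v} for k, v in taxa.items()],
        "occurrences": occurrences,
    }


structured = normalize(flat)

# Compare serialized sizes; the difference depends on how often names repeat.
flat_size = len(json.dumps(flat))
structured_size = len(json.dumps(structured))
```

For a handful of rows the extra wrapper can cancel out the savings; the benefit only shows up when the same taxon appears on many rows, so it is worth measuring on real responses.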
Tagging @spatialit as well.