Open lukeoftheshire opened 2 years ago
the entire database object is returned, including the full activity specs.
I suppose you mean binary data (i.e. MP3 or PNG files for Breathe/Tips activities) and not full activity specs, as a full ActivitySpec should NEVER be returned from an Activity method call.
There's currently very limited functionality for an ignore_binary flag, suggested here in #115, documented here in #150, and referenced in logs here in #431.
Currently, there are no options (or they are undocumented) that control what data is returned
However, you're correct that the ignore_binary functionality is undocumented, as it isn't meant to be part of the API right now. It's a stopgap to prevent server-side or browser-side crashes, and we do need to consider a permanent replacement, as you've correctly postulated.
An optional parameter with a name like returned_keys that allows a user to select what information is returned from the DB would help with this.
I'm not a fan of this approach (returned_keys), but it is common in many REST APIs and a first-party capability of GraphQL. The original purpose of the transform parameter was to allow API clients to do exactly this: stripping out unnecessary data or transforming it server-side, like a map-reduce facility.
However, this doesn't resolve a major issue: the API Server still requests that data from the database. Requesting 25 lamp.tips Activities, each with a 1MB image in settings, still means 25MB transferred over the wire from the database to the API Server and loaded into memory. The JSONata facility then has to strip that data manually from each of the 25 objects, which could take >500ms.
A more performant and "correct" solution would be to automatically determine, from the transform parameter, the minimum set of keys required and request only those from the database. Then there is no 25MB data transfer or memory allocation, and the JSONata facility itself does minimal work, which would likely take <10ms.
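To make the idea concrete, here is a minimal Python sketch of key-limited fetching. It is not the API Server's implementation: the helper names (required_top_level_keys, fetch_projected) and the in-memory list standing in for the database are hypothetical, and it assumes the field paths used by the transform expression have already been extracted, which is the hard part and is not shown.

```python
def required_top_level_keys(paths):
    """Return the sorted top-level keys touched by dotted field paths.

    `paths` is assumed to come from analysing the transform expression;
    that analysis is out of scope for this sketch.
    """
    return sorted({path.split(".", 1)[0] for path in paths})


def fetch_projected(rows, keys):
    """Stand-in for a database query with a projection.

    Only the requested keys are copied out of each stored row, so large
    fields such as `settings` never leave the (simulated) database.
    """
    return [{k: row[k] for k in keys if k in row} for row in rows]


# Simulated storage: 25 tips activities, each with ~1MB in `settings`.
activities = [
    {
        "id": f"act{i}",
        "name": f"Tip {i}",
        "spec": "lamp.tips",
        "settings": "x" * 1_000_000,
    }
    for i in range(25)
]

# Suppose the transform only reads `id` and `name`.
keys = required_top_level_keys(["id", "name"])
projected = fetch_projected(activities, keys)

full_size = sum(len(a["settings"]) for a in activities)
print(keys, len(projected), full_size)
```

With a projection like this, the roughly 25MB of settings data is never transferred or held in memory by the caller, which is the saving described above.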
Currently, there are no options (or they are undocumented) that control what data is returned from a (for example) LAMP.Activity.all_by_participant call - the entire database object is returned, including the full activity specs. This is inefficient and returns unneeded data.
For example, during analyses or when viewing activities in a dashboard, all that might be needed is the activity name and alphanumeric LAMP id. However, many activities contain b64-encoded images, videos, or other files, as well as scheduling information that is not normally needed, so transmitting it wastes time and can add up significantly if many activities are requested, for example when one researcher has many studies that each contain activities. An optional parameter with a name like returned_keys that allows a user to select what information is returned from the DB would help with this.
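As a rough illustration of what I have in mind, here is a small Python sketch of the filtering behaviour. The select_keys function and the sample records are made up for this example; no such option exists in the LAMP client today, and a real implementation would ideally filter on the server rather than after the data has been transferred.

```python
def select_keys(records, returned_keys=None):
    """Keep only `returned_keys` in each record.

    Passing None returns the records unchanged, which mirrors the
    current behaviour of returning the whole object.
    """
    if returned_keys is None:
        return records
    wanted = set(returned_keys)
    return [{k: v for k, v in rec.items() if k in wanted} for rec in records]


# Made-up records shaped loosely like activities.
sample = [
    {"id": "abc123", "name": "Morning Tips", "schedule": [], "settings": "<b64 image>"},
    {"id": "def456", "name": "Breathe", "schedule": [], "settings": "<b64 audio>"},
]

slim = select_keys(sample, returned_keys=["id", "name"])
print(slim)
```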