BIDMCDigitalPsychiatry / LAMP-platform

The LAMP Platform (issues and documentation).
https://docs.lamp.digital/

LAMP.Activity API - selectable keys #496

Open lukeoftheshire opened 2 years ago

lukeoftheshire commented 2 years ago

Currently, there are no options (or they are undocumented) that control what data is returned from a call such as LAMP.Activity.all_by_participant: the entire database object is returned, including the full activity specs. This is inefficient and returns unneeded data.

For example, during analyses or when viewing activities in a dashboard, all that might be needed is the activity name and alphanumeric LAMP id. However, many activities contain b64-encoded images, videos, or other files, as well as scheduling information, none of which is normally needed. Transmitting it wastes time, and the cost can add up significantly if many activities are requested, for example when one researcher has many studies that each contain activities. An optional parameter with a name like returned_keys that allows a user to select what information is returned from the DB would help with this.
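
To make the request concrete, here is a rough sketch of the filtering that a returned_keys option would perform. Everything in it is an assumption for illustration: select_keys is a hypothetical helper, not part of the LAMP SDK, and the sample records are made up rather than taken from a real server response.

```python
from typing import Any, Dict, Iterable, List


def select_keys(
    activities: Iterable[Dict[str, Any]], returned_keys: Iterable[str]
) -> List[Dict[str, Any]]:
    """Keep only the requested top-level keys of each activity record.

    Hypothetical helper: it mimics what a `returned_keys` parameter could do.
    """
    wanted = list(returned_keys)
    return [{k: a[k] for k in wanted if k in a} for a in activities]


# Made-up sample records; the "settings" values stand in for large b64 payloads.
activities = [
    {
        "id": "abc123",
        "name": "Daily Tips",
        "spec": "lamp.tips",
        "settings": {"image": "<b64-encoded PNG>"},
        "schedule": [],
    },
    {
        "id": "def456",
        "name": "Breathe",
        "spec": "lamp.breathe",
        "settings": {"audio": "<b64-encoded MP3>"},
        "schedule": [],
    },
]

slim = select_keys(activities, ["id", "name"])
```

Done on the client like this, the filtering saves nothing on transfer; the point of the request is for the same selection to happen before the data leaves the server.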

avaidyam commented 2 years ago

the entire database object is returned, including the full activity specs.

I suppose you mean binary data (e.g. MP3 or PNG files for Breathe/Tips activities) and not full activity specs, as a full ActivitySpec should NEVER be returned from an Activity method call.

There is currently very limited support for an ignore_binary flag, which was suggested in #115, documented in #150, and referenced in logs in #431.

Currently, there are no options (or they are undocumented) that control what data is returned

However, you're correct that the ignore_binary functionality is undocumented as it isn't meant to be part of the API right now. It's a stopgap to prevent server-side or browser-side crashes, and we do need to consider a permanent replacement, as you've correctly postulated.

An optional parameter with a name like returned_keys that allows a user to select what information is returned from the DB would help with this.

I'm not a fan of this approach (returned_keys), but it is common in many REST APIs and is a first-party capability of GraphQL. The original purpose of the transform parameter was to allow API clients to do exactly this: stripping out unnecessary data or transforming it server-side, like a map-reduce facility.

However, this doesn't resolve the major issue: the API Server still requests that data from the database. Requesting 25 lamp.tips Activities, each with a 1MB image in settings, still means 25MB transferred over the wire from the database to the API Server and loaded into memory. Then the JSONata facility still has to strip that data manually from each of the 25 objects, which could take >500ms.
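
As a rough illustration of the fetch-then-strip cost, the sketch below builds 25 fake lamp.tips records in memory, each carrying a 1MB image as base64, and compares the serialized size before and after stripping. The record layout and the helper names (make_tips_activity, strip_settings) are assumptions for this example only; no real database or JSONata call is involved.

```python
import base64
import json
from typing import Any, Dict

IMAGE_BYTES = 1024 * 1024  # 1MB of raw image data per activity


def make_tips_activity(i: int) -> Dict[str, Any]:
    """Build a fake lamp.tips record with a 1MB zero-filled 'image'."""
    image = base64.b64encode(bytes(IMAGE_BYTES)).decode("ascii")
    return {
        "id": f"activity{i}",
        "name": f"Tip {i}",
        "spec": "lamp.tips",
        "settings": {"image": image},
    }


def strip_settings(activity: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the bulky 'settings' field after the record is already loaded."""
    return {k: v for k, v in activity.items() if k != "settings"}


# Step 1: the full records are "fetched" and held in memory.
fetched = [make_tips_activity(i) for i in range(25)]
fetched_size = len(json.dumps(fetched))

# Step 2: only now can the unneeded data be stripped, one object at a time.
stripped = [strip_settings(a) for a in fetched]
stripped_size = len(json.dumps(stripped))

print(f"fetched: {fetched_size} characters of JSON")
print(f"stripped: {stripped_size} characters of JSON")
```

The stripped result is tiny, but the full payload had to exist in memory first, which is the part a post-fetch transform cannot avoid.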

A more performant and "correct" solution would be to automatically determine, from the transform parameter, the minimum set of keys required and to request only those from the database. Then there is no 25MB data transfer or memory allocation, and the JSONata facility itself does minimal work, which could likely take <10ms.
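
A minimal sketch of that idea follows, under heavy assumptions. required_keys is a hypothetical, deliberately naive helper: it scans a simple JSONata-style expression with regular expressions rather than parsing it, so it only handles plain field references. The fetch function stands in for a database query with a projection and runs against an in-memory list, not a real database.

```python
import re
from typing import Any, Dict, Iterable, List, Set

_STRING = re.compile(r'"[^"]*"|\'[^\']*\'')     # quoted string literals
_FIELD = re.compile(r"(?<![\w.$])[A-Za-z_]\w*")  # first step of a field path
_KEYWORDS = {"and", "or", "in", "true", "false", "null"}


def required_keys(transform: str) -> Set[str]:
    """Guess the top-level keys a simple transform expression reads.

    Naive on purpose: string literals are removed, then every bare
    identifier that is not preceded by '.', '$', or a word character
    is treated as a top-level field name.
    """
    without_strings = _STRING.sub("", transform)
    return set(_FIELD.findall(without_strings)) - _KEYWORDS


def fetch(store: List[Dict[str, Any]], keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Stand-in for a projected database query: only `keys` are read."""
    wanted = sorted(keys)
    return [{k: row[k] for k in wanted if k in row} for row in store]


# Made-up record; "settings" stands in for a large binary payload.
store = [
    {
        "id": "abc123",
        "name": "Daily Tips",
        "spec": "lamp.tips",
        "settings": {"image": "<b64-encoded PNG>"},
    }
]

transform = '$.{"id": id, "label": name}'
keys = required_keys(transform)
projected = fetch(store, keys)
```

With the projection applied at fetch time, the settings payload is never read, so the transform only sees the small id and name fields. A real implementation would need to walk the parsed expression tree instead of using regular expressions, and fall back to a full fetch whenever the required keys cannot be determined.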