FLOIP / flow-results

Open specification for the exchange of "Results" data generated by mobile platforms using the "Flow" paradigm

Consider inline embedded media data in the JSON data file #3

Open markboots opened 7 years ago

markboots commented 7 years ago

The first version of the spec stores media data (image, audio, video) simply as a URL to an external resource.

[ "2017-05-23T17:25:12-04:00", 20394823956, 923842093, "ae54da", "https://myexampleflowserver.org/resources/audio/20394823956.ogg", {"type": "audio", "language": "eng", "format": "audio/ogg"} ],

Is it valuable to allow embedding the media for any of these question types within the JSON data file? One possibility is to allow a base64-encoded string in place of the URL, with response metadata indicating "inline":1, "format":"audio/ogg", "extension":".ogg".

This reduces external dependencies, at the expense of larger data files and the need to support two code paths.
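To make the two code paths concrete, here is a minimal sketch of how a producer and a consumer might handle both forms. It is not part of the spec: the function names `media_value` and `read_media`, the `fetch` callable, and the exact handling of the "inline" and "extension" keys are assumptions based only on the proposal above.

```python
import base64


def media_value(data, url, media_type, mime, extension, inline=False):
    """Return (value, metadata) for one media response.

    Hypothetical helper: with inline=True the value is the base64 text of
    the media bytes, otherwise it is the URL of the external resource.
    """
    metadata = {"type": media_type, "format": mime}
    if inline:
        # Assumed metadata keys, taken from the proposal in this issue.
        metadata["inline"] = 1
        metadata["extension"] = extension
        return base64.b64encode(data).decode("ascii"), metadata
    return url, metadata


def read_media(value, metadata, fetch=None):
    """Return the media bytes for a response, whichever form it uses."""
    if metadata.get("inline") == 1:
        return base64.b64decode(value)
    if fetch is None:
        raise ValueError("external media requires a fetch callable")
    # fetch is supplied by the caller, e.g. an HTTP download function.
    return fetch(value)


audio = b"OggS\x00placeholder-bytes"  # stand-in for real audio data
url = "https://myexampleflowserver.org/resources/audio/20394823956.ogg"

inline_value, inline_meta = media_value(
    audio, url, "audio", "audio/ogg", ".ogg", inline=True
)
external_value, external_meta = media_value(
    audio, url, "audio", "audio/ogg", ".ogg"
)
```

A consumer that only understands URLs would need to check the "inline" key before treating the value as a link, which is the second code path mentioned above.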

pld commented 7 years ago

It might not be frequently used, but when sharing files for analysis, an optional base64 embedding of media files would be very valuable for sharing and transferring data.

One use case I can think of is a recent waterpoint dataset where we had to download the CSV data file, download the images separately, then preprocess to identify and split the media files based on the values of a binary column in the text file. Eventually, I'd want all those steps to exist in an API, and it would be a lot easier to send a single file with embedded media files to an API than a text file, a compressed image file, and preprocessing instructions. Additionally, for sharing with other machine learning researchers (in combination with a notebook file), a single file would be quite convenient.

nicpottier commented 7 years ago

No denying there is something appealing about having a single file that is guaranteed to be complete. Seems like supporting this as an optional thing would be useful for, say, one-time batch exports where the result could be zipped in the end (removing most of the downsides of base64 encoding). For actually sending stuff over APIs I would think this is an antipattern, just because these will quickly grow into payloads that are impractical to transfer in a single request.
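The size trade-off can be checked with a quick experiment. Base64 emits 4 output characters for every 3 input bytes, and compressing the encoded text wins back part of that growth. The sketch below uses seeded pseudo-random bytes as a stand-in for already-compressed media such as Ogg audio; the 30,000-byte size is an arbitrary assumption, and zlib stands in for whatever compression a zipped export would use.

```python
import base64
import random
import zlib

# Seeded pseudo-random bytes: a stand-in for media that is already compressed.
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(30_000))

encoded = base64.b64encode(raw)        # what would be embedded in the JSON
compressed = zlib.compress(encoded, 9)  # what a compressed export would carry

print("raw bytes:        ", len(raw))
print("base64 bytes:     ", len(encoded))
print("compressed base64:", len(compressed))
```

Real results depend on the media format and on how much other text surrounds the embedded data, so this is only an indication of the effect, not a benchmark.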

pld commented 7 years ago

Definitely, when something is running in production you want to push models to the data, but for the things I'm doing now (exploring different modeling approaches and sharing notebooks that hook up to APIs) this would be super useful for me. I'm using requests with limits on the number of rows, so size is bounded.

ggiraldez commented 6 years ago

I think this is a nice to have, but should definitely be optional, and consumer configurable via a parameter.

Also, since we already have two separate "files" (i.e. schema and response data) I'd assume we might want to package both in some way, e.g. a .zip file. @pld what about embedding the images in the zip file? Would that still work for your use case?
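For illustration, packaging the schema, the response data, and the media in one archive could look roughly like this. The file names `schema.json` and `responses.json`, the `media/` directory, the use of a relative path as the response value, and the `build_package` helper are all assumptions made for this sketch; nothing here is defined by the spec.

```python
import io
import json
import zipfile


def build_package(schema, rows, media):
    """Bundle schema, response rows, and media files into one zip archive.

    media maps an archive path (e.g. "media/20394823956.ogg") to its bytes.
    Returns the archive as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("schema.json", json.dumps(schema))
        archive.writestr("responses.json", json.dumps(rows))
        for path, data in media.items():
            archive.writestr(path, data)
    return buffer.getvalue()


audio = b"OggS\x00placeholder-bytes"  # stand-in for real audio data
rows = [
    [
        "2017-05-23T17:25:12-04:00", 20394823956, 923842093, "ae54da",
        "media/20394823956.ogg",  # path inside the archive instead of a URL
        {"type": "audio", "language": "eng", "format": "audio/ogg"},
    ]
]
package = build_package({"name": "example"}, rows, {"media/20394823956.ogg": audio})

# A consumer resolves the response value against the archive contents.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    loaded_rows = json.loads(archive.read("responses.json"))
    loaded_audio = archive.read(loaded_rows[0][4])
```

Compared with inline base64, this keeps the JSON small and the media in binary form, at the cost of the consumer having to unpack an archive first.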

pld commented 6 years ago

Yea, a zip file is how we're doing it now and it works, just not as convenient

(Sent by email in reply to Gustavo Giráldez's comment of Thu, Jul 13, 2017 at 10:26 AM.)