elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Improve file upload in new platform route handlers #56944

Open pgayvallet opened 4 years ago

pgayvallet commented 4 years ago

File upload is something we overlooked when designing the route handler features.

A concrete example of how file upload is handled in legacy: https://github.com/elastic/kibana/blob/3e8687c3e3b066e9512f5724dbbd3ebb2112681b/src/legacy/server/saved_objects/routes/import.ts#L50-L91

At the moment, the file is a readable stream enhanced by a Hapi wrapper that adds additional data such as the filename:

import { Readable } from 'stream';

interface HapiReadableStream extends Readable {
  hapi: {
    filename: string;
  };
}

Then, when accessing the properties:

const { filename } = request.payload.file.hapi;
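
For readers without the linked file handy, here is a rough sketch (not a verbatim copy of the legacy route; the path and limit are illustrative) of how such a legacy route is configured and how the handler consumes the upload, using the HapiReadableStream shape above:

// Legacy Hapi-style route config: the payload options tell Hapi to expose
// multipart parts as streams instead of buffering and parsing files for us.
const legacyImportRoute = {
  method: 'POST',
  path: '/api/saved_objects/_import', // illustrative path
  options: {
    payload: {
      output: 'stream',
      allow: 'multipart/form-data',
      maxBytes: 10485760, // illustrative limit
    },
  },
  handler: async (request: { payload: { file: HapiReadableStream } }) => {
    const file = request.payload.file;
    const { filename } = file.hapi; // metadata attached by Hapi's multipart parsing
    // `file` itself is consumed as a regular Readable stream of the uploaded content
    return { success: true, filename };
  },
};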

Questions

- Should we introduce schema.file() and/or a schema.stream()?
- Do we want to wrap the HAPI file wrapper to avoid explicit hapi usages when accessing the properties?

elasticmachine commented 4 years ago

Pinging @elastic/kibana-platform (Team:Platform)

mshustov commented 4 years ago

Should we introduce schema.file() and/or a schema.stream()

@kbn/config-schema already provides schema.stream.
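
For reference, a minimal sketch of declaring an uploaded file in body validation with that existing validator (the schema name is illustrative):

import { schema } from '@kbn/config-schema';

// schema.stream() validates that the value is a stream, so an uploaded file
// exposed as a Readable can be declared in a route's body validation.
const importBodySchema = schema.object({
  file: schema.stream(),
});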

Do we want to wrap the HAPI file wrapper to avoid explicit hapi usages when accessing the properties

We need to do something here, since the Hapi interface shouldn't leak outside. We can handle output: 'stream', accepts: 'multipart/form-data' internally to extend the request object with multipart data, as Hapi does.
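
From a route author's point of view, that could end up looking roughly like the sketch below; the path, the exact shape of options.body, and how the filename would be exposed are exactly the open questions in this issue, so treat all of it as an assumption:

import { schema } from '@kbn/config-schema';
import type { IRouter } from 'src/core/server'; // import path depends on where this lives

export function registerImportRoute(router: IRouter) {
  router.post(
    {
      path: '/api/my_plugin/_import', // illustrative
      validate: {
        body: schema.object({ file: schema.stream() }),
      },
      options: {
        body: {
          output: 'stream',               // expose the payload as a stream
          accepts: 'multipart/form-data',
          maxBytes: 10485760,             // illustrative limit
        },
      },
    },
    async (context, request, response) => {
      const { file } = request.body;
      // `file` is a Readable; today the filename is only reachable through the
      // Hapi-specific `file.hapi.filename` property, which is what this issue
      // would like core to hide.
      return response.ok({ body: { imported: true } });
    }
  );
}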

My only concern with the current implementation is that it stores the parsed content in memory. From the Hapi typings for payload.output:

'stream' - the incoming payload is made available via a Stream.Readable interface. If the payload is 'multipart/form-data' and parse is true, field values are presented as text while files are provided as streams. File streams from a 'multipart/form-data' upload will also have a hapi property containing the filename and headers properties. Note that payload streams for multipart payloads are a synthetic interface created on top of the entire multipart content loaded into memory. To avoid loading large multipart payloads into memory, set parse to false and handle the multipart payload in the handler using a streaming parser (e.g. pez).

If the necessity to access the filename is the only reason for that, we could change our API not to use FormData (we send a single file anyway) and pass the filename via headers instead.

Side note: Hapi uses https://github.com/hapijs/subtext to parse the payload and https://github.com/hapijs/pez to parse multipart data.
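
A hypothetical sketch of that header-based alternative (the header name, path, and validation shape are all invented for illustration):

import { schema } from '@kbn/config-schema';
import type { IRouter } from 'src/core/server'; // import path depends on where this lives

export function registerRawImportRoute(router: IRouter) {
  router.post(
    {
      path: '/api/my_plugin/_import', // illustrative
      validate: {
        body: schema.stream(), // the raw file content is the whole body
      },
      options: {
        body: { output: 'stream' }, // no multipart parsing involved
      },
    },
    async (context, request, response) => {
      // Hypothetical header carrying the original filename instead of multipart metadata
      const filename = request.headers['x-kbn-import-filename'];
      const fileStream = request.body; // the raw upload as a Readable
      return response.ok({ body: { filename } });
    }
  );
}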

pgayvallet commented 4 years ago

If the necessity to access the filename is the only reason for that, we could change our API not to use FormData (we send a single file anyway) and pass the filename via headers instead.

multipart/form-data is a pain to handle properly, however it's one of the oldest HTTP transfer 'protocols' for files (and was the only one for a very long time). I really think we should support it one way or another in core (unless we decide it doesn't make sense in a modern SPA, which could be an option).

Also, that would mean breaking changes in the SO import/export API, even if that API is 'experimental'.

We can handle output: 'stream', accepts: 'multipart/form-data' internally to extend the request object with multipart data, as Hapi does.

We can do that. It feels like reinventing the wheel a little, though, as our underlying HTTP framework already handles it for us. If the file.hapi access/naming is really a concern, we could probably provide our own file type and convert/construct our own file-stream structure from hapi's parsed content? (That doesn't answer the next point, though.)
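
For instance, such a core-owned file structure could hypothetically look like this (names invented for illustration):

import type { Readable } from 'stream';

// Hypothetical Kibana-owned representation of an uploaded file, so route
// handlers never have to reach into the Hapi-specific `file.hapi` property.
interface UploadedFile {
  stream: Readable;
  filename: string;
}

// Hypothetical conversion from Hapi's enhanced stream (the HapiReadableStream above)
function toUploadedFile(file: Readable & { hapi: { filename: string } }): UploadedFile {
  return { stream: file, filename: file.hapi.filename };
}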

My only concern with the current implementation is that it stores the parsed content in memory

I'm not sure I see any other option for using the payload content during the request's lifecycle without asking the handler to do the parsing itself, which sounds like a poor developer experience to me.

mshustov commented 4 years ago

We can do that. It feels like reinventing the wheel a little, though, as our underlying HTTP framework already handles it for us. If the file.hapi access/naming is really a concern, we could probably provide our own file type and convert/construct our own file-stream structure from hapi's parsed content? (That doesn't answer the next point, though.)

Yes, we can keep using hapi under the hood. If we switch away from multipart/form-data, it doesn't change the logic a lot, since hapi provides the file content as a stream and we parse it manually anyway: https://github.com/elastic/kibana/blob/055c61110f8bbbdaea46ba5f28687d6692ccb97e/src/core/server/saved_objects/routes/utils.ts#L34

I think adding a wrapper around the hapi-parsed file is the simplest option at the moment.
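
For context, that manual parsing essentially means consuming the stream and splitting the NDJSON content, along these lines (a simplified sketch, not the actual utils.ts implementation, which builds a stream pipeline instead):

import type { Readable } from 'stream';

// Simplified sketch: read the whole uploaded stream and parse it as NDJSON.
async function readNdJson(stream: Readable): Promise<unknown[]> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks)
    .toString('utf8')
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));
}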

pgayvallet commented 4 years ago

If we switch away from multipart/form-data, it doesn't change the logic a lot, since hapi provides the file content as a stream and we parse it manually anyway

How do we handle routes with additional payload values without multipart, though? Would we be sending a JSON body payload with the 'binary' content of the file in a field?

https://github.com/elastic/kibana/blob/055c61110f8bbbdaea46ba5f28687d6692ccb97e/src/core/server/saved_objects/routes/resolve_import_errors.ts#L44-L69