VEuPathDB / lib-jaxrs-container-core

Core library for VEuPathDB JaxRS container services
Apache License 2.0

New filter/middleware to parse multipart/form-data to pojo. #17

Closed by Foxcapades 1 year ago

Foxcapades commented 2 years ago

Problem

When generating multipart/form-data endpoints from RAML, the generated code expects a POJO rather than annotated jersey inputs.

Presently, this means disabling code gen and manually editing the generated code to make the endpoint usable.

What we want is to continue to use code gen even when accepting multipart/form-data.

Proposal

To fix this, we can add a new Jersey filter that kicks in for multipart/form-data requests and does the following:

1. Read the full request body and:
    a. parse all non-file fields and inject them into the Jackson annotated fields in the POJO
    b. read the file inputs to temp files and set the Jackson annotated `File` fields in the POJO
2. Pass the populated POJO on to the controller
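
The steps above can be sketched without Jersey or Jackson. This is only an illustration under stated assumptions: `ParsedForm`, `indexOfBytes`, and `parseMultipart` are hypothetical names, the whole body is assumed to already be in memory, and mapping the result onto the Jackson annotated POJO is left out.

```kotlin
import java.io.File
import java.nio.file.Files

// Hypothetical result holder: plain fields by name, uploaded files by name.
data class ParsedForm(val fields: Map<String, String>, val files: Map<String, File>)

// Returns the first index of `needle` in `haystack` at or after `from`, or -1.
fun indexOfBytes(haystack: ByteArray, needle: ByteArray, from: Int): Int {
    var i = from
    while (i <= haystack.size - needle.size) {
        var j = 0
        while (j < needle.size && haystack[i + j] == needle[j]) j++
        if (j == needle.size) return i
        i++
    }
    return -1
}

private val NAME = Regex("\\bname=\"([^\"]*)\"")
private val FILENAME = Regex("\\bfilename=\"([^\"]*)\"")

// Splits a multipart/form-data body on its boundary. Parts with a filename
// are written to tmpDir; all other parts are kept as UTF-8 text fields.
fun parseMultipart(body: ByteArray, boundary: String, tmpDir: File): ParsedForm {
    val latin1 = Charsets.ISO_8859_1
    val opening = "--$boundary".toByteArray(latin1)
    val separator = "\r\n--$boundary".toByteArray(latin1)
    val blankLine = "\r\n\r\n".toByteArray(latin1)
    val dash = '-'.code.toByte()
    val fields = mutableMapOf<String, String>()
    val files = mutableMapOf<String, File>()

    val first = indexOfBytes(body, opening, 0)
    var partStart = if (first < 0) body.size else first + opening.size
    while (partStart < body.size) {
        // "--" directly after a delimiter marks the closing boundary.
        if (partStart + 1 < body.size && body[partStart] == dash && body[partStart + 1] == dash) break
        val next = indexOfBytes(body, separator, partStart)
        if (next < 0) break
        val headersEnd = indexOfBytes(body, blankLine, partStart)
        if (headersEnd >= 0 && headersEnd + blankLine.size <= next) {
            val headers = String(body, partStart, headersEnd - partStart, latin1)
            val content = body.copyOfRange(headersEnd + blankLine.size, next)
            val name = NAME.find(headers)?.groupValues?.get(1)
            val filename = FILENAME.find(headers)?.groupValues?.get(1)
            if (name != null && filename != null) {
                // Keep only the last path segment; a real filter would
                // validate client-supplied names more strictly.
                val target = File(tmpDir, File(filename).name)
                target.writeBytes(content)
                files[name] = target
            } else if (name != null) {
                fields[name] = String(content, Charsets.UTF_8)
            }
        }
        partStart = next + separator.size
    }
    return ParsedForm(fields, files)
}

fun main() {
    val body = ("--X\r\n" +
        "Content-Disposition: form-data; name=\"some_text_field\"\r\n\r\n" +
        "My text field text.\r\n" +
        "--X\r\n" +
        "Content-Disposition: form-data; name=\"some_text_file\"; filename=\"my-file.txt\"\r\n" +
        "Content-Type: text/plain\r\n\r\n" +
        "Some text in a text file being uploaded.\r\n" +
        "--X--\r\n").toByteArray()
    val tmp = Files.createTempDirectory("upload-").toFile()
    val form = parseMultipart(body, "X", tmp)
    println(form.fields["some_text_field"])
    println(form.files["some_text_file"]?.readText())
}
```

A real filter would presumably stream large file parts to disk rather than buffer the full body, and would take the boundary from the request's Content-Type header.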

Additionally, to avoid bloating the container image with tmp files, the files should be cleaned up on request completion, meaning we will also need an outgoing filter to perform that cleanup.
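
One possible shape for that cleanup is a per-request scope that owns a temp directory and removes it when closed. `UploadScope` is a hypothetical name for illustration; in a service, the `close()` call would presumably be made by the outgoing filter rather than by a `use` block.

```kotlin
import java.io.Closeable
import java.io.File
import java.nio.file.Files

// Hypothetical per-request scope: owns one temp directory for uploaded files
// and deletes it, with everything inside, when the request completes.
class UploadScope : Closeable {
    val dir: File = Files.createTempDirectory("upload-").toFile()

    override fun close() {
        dir.deleteRecursively()
    }
}

fun main() {
    val scope = UploadScope()
    scope.use {
        val upload = File(it.dir, "my-file.bin")
        upload.writeBytes(byteArrayOf(1, 2, 3))
        println(upload.exists()) // the file is present while the request is handled
    }
    println(scope.dir.exists()) // the directory is gone after close()
}
```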

Example Pojo

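import com.fasterxml.jackson.annotation.JsonProperty
import java.io.File

class MyUpload {
  // lateinit lets the filter populate these after construction
  @JsonProperty("some_text_field")
  lateinit var textField: String

  @JsonProperty("some_bin_file")
  lateinit var binFile: File

  @JsonProperty("some_text_file")
  lateinit var textFile: File
}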

Example Body

-----Some Divider
Content-Disposition: form-data; name="some_text_field"

My text field text.
-----Some Divider
Content-Disposition: form-data; name="some_bin_file"; filename="my-file.bin"
Content-Type: application/octet-stream

asdfasdfl;kjasd;lkfa;slhdf;lakwje;laknsdv;lkansd;ahsdr
-----Some Divider
Content-Disposition: form-data; name="some_text_file"; filename="my-file.txt"
Content-Type: text/plain

Some text in a text file being uploaded.
-----Some Divider--

Populated Pojo

MyUpload: {
  textField = "My text field text."
  binFile   = File("tmp/{some-hash}/my-file.bin")
  textFile  = File("tmp/{some-hash}/my-file.txt")
}


ryanrdoherty commented 2 years ago

@Foxcapades Is the plan for this to be a PreMatch filter that converts the actual Content-Type header to application/json? We would have to either convert or supplement the content type in the RAML. Maybe just convert it in the generated interface and in the filter? I'm thinking about how we can support an endpoint that accepts both JSON and multipart content types.

Foxcapades commented 2 years ago

Not sure if it was generated this way or if I did this manually, so it requires investigation, but in multi-blast there are 2 controller methods: one that takes JSON input and one that takes multipart/form-data.

In the second case, the method accepts JSON input as one of the form fields.

Foxcapades commented 2 years ago

This issue is not relevant if we can switch to OpenAPI and the code generator for it.

An alternative task may be to port the example project and the existing projects that use multipart over to OpenAPI.

Foxcapades commented 2 years ago

We scrapped/shelved the OpenAPI idea for the time being due to a known issue with OpenAPI code generation for models based on JSON Schema that make use of one or more of the oneOf, anyOf, or allOf features.

Instead this functionality is being implemented in a new library: https://github.com/VEuPathDB/lib-jersey-multipart-jackson-pojo

Once that library is reviewed and approved, it will be added to this library and plugged in to the default resources for all services based on this core.