data61 / anonlink-entity-service

Privacy Preserving Record Linkage Service
Apache License 2.0

streamed processing of CLKs in the front-end #184

Open wilko77 opened 6 years ago

wilko77 commented 6 years ago

Previously, we accessed the CLKs in a streaming fashion to avoid parsing the JSON in one hit. This made it possible to run the web front-end with less memory.

However, because connexion is very strict about validating JSON input, it always consumes the whole stream first to validate it against the spec. Hence we have gone back to reading the CLKs fully into memory as JSON.

Possible approaches are:

Aha! Link: https://csiro.aha.io/features/ANONLINK-16

hardbyte commented 6 years ago

The API now supports uploading via a binary stream (added in #208), but connexion still interferes. Reported upstream: https://github.com/zalando/connexion/issues/592

hardbyte commented 6 years ago

Perhaps we can make a separate app (without connexion) just to deal with binary data uploading?
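
A minimal sketch of what such a separate app could look like, assuming a plain WSGI callable built only on the Python standard library. The names `upload_app`, `CHUNK_SIZE`, and `UPLOAD_DIR` are hypothetical and not part of the service. The point is only that the body is copied to disk in fixed-size chunks and is never held in memory as a whole:

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024              # hypothetical chunk size
UPLOAD_DIR = tempfile.gettempdir()  # hypothetical destination directory


def upload_app(environ, start_response):
    """Plain WSGI app (no connexion): stream the request body to a file."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0

    body = environ["wsgi.input"]
    fd, path = tempfile.mkstemp(dir=UPLOAD_DIR, suffix=".bin")
    received = 0
    with os.fdopen(fd, "wb") as out:
        # Read at most CHUNK_SIZE bytes at a time, never past CONTENT_LENGTH.
        while remaining > 0:
            chunk = body.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break  # client closed the connection early
            out.write(chunk)
            received += len(chunk)
            remaining -= len(chunk)

    payload = f"{received} bytes stored at {path}\n".encode()
    start_response(
        "201 Created",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(payload)))],
    )
    return [payload]
```

For a quick local trial this could be served with `wsgiref.simple_server.make_server("", 8000, upload_app).serve_forever()`. A real deployment would still need authentication and a size limit.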

hardbyte commented 6 years ago

What about this idea of using nginx to buffer the upload to disk and then passing the filename to our Flask backend?
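
A rough sketch of the backend half of that idea, under stated assumptions. On the nginx side the body would be written to disk (for example with `client_body_in_file_only`) and the path in `$request_body_file` forwarded in a request header. The header, the buffer directory `UPLOAD_ROOT`, and the helpers `resolve_upload` and `iter_chunks` are all hypothetical. The backend then only has to check the path and read the file piece by piece:

```python
import os

UPLOAD_ROOT = "/var/spool/nginx/uploads"  # hypothetical; must match the nginx config
CHUNK_SIZE = 64 * 1024                    # hypothetical chunk size


def resolve_upload(header_value, root=UPLOAD_ROOT):
    """Return the real path of the buffered upload, refusing paths outside root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(header_value)
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError("upload path is outside the buffer directory")
    return real_path


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the buffered file in pieces so it is never fully in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

The path check matters because the filename arrives in a header: without it, a request that bypasses nginx could point the backend at an arbitrary file.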