Unidata / LDM

The Unidata Local Data Manager (LDM) system includes network client and server programs designed for event-driven data distribution, and is the fundamental component of the Unidata Internet Data Distribution (IDD) system.
http://www.unidata.ucar.edu/software/ldm

Allow pqinsert to read from STDIN #124

Open · mdklatt opened 1 year ago

mdklatt commented 1 year ago

Feature Proposal

The pqinsert command should accept products via STDIN in addition to disk files.

Motivation

With distributed and containerized application architectures, sharing a disk across multiple components can be more difficult. I am using the LDM Docker container as a microservice, and I added httpexec to the image to allow remote execution of LDM commands.

However, to execute pqinsert there needs to be a shared volume that client containers can write to and the LDM container can read from. Another approach is a wrapper script in the LDM container that receives a product from httpexec via STDIN, writes it to a local temporary file, and then calls pqinsert for that file. If pqinsert itself could read from STDIN, it could communicate directly with httpexec without the need for a shared Docker volume or intermediate local file.

For some applications, this feature would allow a product to be generated in memory and streamed to the LDM without being written to disk before it is inserted into the queue. Avoiding that extra write and read could noticeably improve overall performance.

Implementation

I forked this repo and created a proof of concept of this feature.

API

A filename argument of "-" is interpreted as STDIN instead of a disk path, and is read accordingly. This product will have a key value of "STDIN" unless the -p option is provided (which is recommended in this case). There are no other changes to the current API.
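The "-" convention can be sketched in C, the language LDM is written in. The helpers `is_stdin_arg()` and `product_key()` below are hypothetical and are not taken from the fork. For disk files, the sketch assumes that the key falls back to the path when -p is absent; check that assumption against pqinsert's actual default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the proposed API: a path of "-" means
 * "read the product from STDIN" rather than from a disk file. */
static int is_stdin_arg(const char *path)
{
    assert(path != NULL);
    return strcmp(path, "-") == 0;
}

/* Hypothetical helper: choose the product key. An explicit -p value
 * always wins; otherwise a STDIN product gets "STDIN" and a disk file
 * falls back to its path (an assumption about existing behavior). */
static const char *product_key(const char *path, const char *p_opt)
{
    if (p_opt != NULL)
        return p_opt;
    return is_stdin_arg(path) ? "STDIN" : path;
}
```

With this convention a producer could pipe a product straight in, for example `generate_product | pqinsert -p MY_ID -`, where `generate_product` and `MY_ID` are placeholders.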

The current implementation allows only one product to be submitted via STDIN. If support for multiple products is essential, one option would be a flag that restricts input to STDIN only, with the filename arguments instead interpreted as content lengths that delimit the individual products. Knowing product lengths beforehand would also simplify the implementation, and the caller should already know them.
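The length-delimited variant suggested above could be handled roughly as follows. This is a minimal sketch assuming a hypothetical mode in which every positional argument is a decimal byte count; `read_product()` and `read_products()` are illustrative names, and the actual queue insertion is left as a comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read exactly `len` bytes from `in` into a new
 * buffer. Returns NULL on a short read (EOF or error before `len`
 * bytes) or on allocation failure. The caller frees the result. */
static void *read_product(FILE *in, size_t len)
{
    /* malloc(0) may return NULL, so allocate at least one byte. */
    unsigned char *buf = malloc(len ? len : 1);
    if (buf == NULL)
        return NULL;
    size_t got = 0;
    while (got < len) {
        size_t n = fread(buf + got, 1, len - got, in);
        if (n == 0) {            /* stream ended before `len` bytes */
            free(buf);
            return NULL;
        }
        got += n;
    }
    return buf;
}

/* Hypothetical driver: treat each argument as a byte count and read
 * that many bytes as one product. Returns the number of products read,
 * or -1 on a malformed length or a short read. */
static int read_products(FILE *in, int argc, char *const argv[])
{
    assert(in != NULL);
    int count = 0;
    for (int i = 0; i < argc; i++) {
        const char *arg = argv[i];
        char *end;
        /* Require a plain non-negative decimal number. */
        if (!isdigit((unsigned char)arg[0]))
            return -1;
        unsigned long long len = strtoull(arg, &end, 10);
        if (*end != '\0' || len > SIZE_MAX)
            return -1;
        void *data = read_product(in, (size_t)len);
        if (data == NULL)
            return -1;
        /* ...pqinsert would insert `data` into the product queue here... */
        free(data);
        count++;
    }
    return count;
}
```

Because each length is known up front, a truncated stream shows up as a short read instead of silently producing a partial product.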

Internals

STDIN is not guaranteed to behave like a disk file, so mmap() cannot be used. Instead, the product is read into allocated memory. One limitation is that, unlike mmap(), this approach cannot handle products larger than available memory. An alternative would be to stream STDIN to a temporary disk file, but that incurs a performance penalty.
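Reading STDIN into allocated memory, as described above, generally means growing a buffer until end-of-file, because the total size of a pipe is not known in advance. A minimal sketch follows, with `read_all()` as a hypothetical helper and an arbitrary 64 KiB initial capacity; it is not the code from the fork.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `in` into a heap buffer, doubling
 * its capacity as needed. On success, returns the buffer and stores
 * its length in *len; returns NULL on read or allocation failure.
 * The caller frees the result. */
static void *read_all(FILE *in, size_t *len)
{
    assert(in != NULL && len != NULL);
    size_t cap = 64 * 1024, used = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (used == cap) {
            /* Grow geometrically, guarding against size_t overflow. */
            if (cap > SIZE_MAX / 2) {
                free(buf);
                return NULL;
            }
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        used += fread(buf + used, 1, cap - used, in);
        if (ferror(in)) {
            free(buf);
            return NULL;
        }
        if (feof(in))
            break;
    }
    *len = used;
    return buf;
}
```

Doubling keeps the number of realloc() calls logarithmic in the product size, but the whole product must still fit in memory, which is the limitation noted above.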

I have only implemented this for the USE_MMAP code branch, as I could not get the other branch to compile.

semmerson commented 1 year ago

Got it. We'll get back to you.