Azure / azure-functions-nodejs-library

The Node.js framework for Azure Functions
https://www.npmjs.com/package/@azure/functions
MIT License
53 stars 8 forks source link

Support streams (azure resources) #99

Open adamstoffel opened 2 years ago

adamstoffel commented 2 years ago

The Blob Trigger binding for NodeJS functions always binds data as a Buffer even if 'dataType' is set to 'stream' or 'string'.

Repro steps

  1. Create a node-based function using the blob trigger
  2. Set "dataType" to "stream" in function.json:
    
    {
    "bindings": [
    {
      "name": "fileBlob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "container/{fileName}.csv",
      "connection": "AzureWebJobsStorage",
      "dataType": "stream"
    }
    ],
    "scriptFile": "../dist/myFunction/index.js"
    }


#### Expected behavior

`fileBlob` input parameter should be some kind of Stream type

#### Actual behavior

`fileBlob` input parameter is a Buffer

#### Known workarounds

None

#### Related information 

Functions runtime ~3
Node ~14
v-bbalaiagar commented 2 years ago

Hi @adamstoffel, Thank you for your feedback! We will check for the possibilities internally and update you with the findings.

v-bbalaiagar commented 2 years ago

Transferring this issue to NodeJS-worker repo for further insights.

ejizba commented 2 years ago

The Node.js worker doesn't support streaming data. Here is an issue on the host repo with some of the background: https://github.com/Azure/azure-functions-host/issues/1361

We might be able to make progress on this, though. @gohar94 implemented a 'shared memory' feature that is meant to help with memory usage in general, although I don't know if it necessarily passes a stream to the users's code. It's only implemented for the Python worker so far. @gohar94 does the Python implementation pass along a stream to the user? Could or should we do that when implementing this feature for Node.js? Edit: Never mind this isn't related

gohar94 commented 2 years ago

Hi @ejizba - the shared memory feature on Python passes the whole set of bytes to the user function as of today. However, I do think that some stream-like abstraction can be given to the user (like io.BytesIO over the mmap). It will likely still all be loaded in memory but the user could operate on those bytes as a stream, if that's what you're asking. Happy to take a look if you decide to do this for Node.js.

ejizba commented 2 years ago

Yeah I'm less interested in doing a stream if it's just an abstraction, not an actual stream with all the benefits.

How likely is the shared memory feature to help with 'out of memory' issues? Or is it realistically just for performance?

gohar94 commented 2 years ago

Yeah I'm less interested in doing a stream if it's just an abstraction, not an actual stream with all the benefits.

How likely is the shared memory feature to help with 'out of memory' issues? Or is it realistically just for performance?

I believe it does reduce one extra copy (which gRPC creates when transferring between the host and the worker). It does seem possible to do full streaming from the host to the worker using shared memory but it would involve some more work (like communicating what pieces of data become available and which ones have been consumed etc.) Would be an interesting project though.

MarioArriaga92 commented 2 years ago

Is there an ETA on this?

ejizba commented 2 years ago

@MarioArriaga92 no there is not an ETA for this. We want to do it, but it will not be simple and so far other work has been higher priority. Specifically, you can see our roadmap here with the work we currently have prioritized: https://github.com/Azure/azure-functions-nodejs-worker/wiki/Roadmap

sig9 commented 1 year ago

Having to consider other hosting options due to this limitation. My use case is to implement a server api in node that calls an OpenAI api function which returns a Server-Sent Events stream. I would like to relay those responses to the browser. I have a minimal git repo that demonstrates the issue. No deployment needed to demonstrate this, the azure-functions-core-tools package behaves the same way as a deployed app. https://github.com/sig9/nuxt-app-sse/

ejizba commented 1 year ago

I'm cleaning up the stream issues and will use this to track support for streaming Azure resources. This was finally unblocked for us in the host: https://github.com/Azure/azure-functions-dotnet-worker/issues/1081. The .NET Isolated worker was the first one to use it, but we should be able to benefit in Node.js as well.

Unfortunately I don't have any ETAs. For now we're focused on GA-ing the v4 programming model, but we will keep the roadmap updated with any stream plans: https://github.com/Azure/azure-functions-nodejs-library/wiki/Roadmap

Streaming http will be done separately, tracked by https://github.com/Azure/azure-functions-nodejs-library/issues/97