Azure / azure-functions-nodejs-library

The Node.js framework for Azure Functions
https://www.npmjs.com/package/@azure/functions
MIT License

How do I reference parameters of output bindings in v4 #84

Open prenaissance opened 1 year ago

prenaissance commented 1 year ago

I would like to build an app with scheduled Azure Functions that act as data sources, queueing up data for later processing. Each function should scrape data, upload the results to a blob named products/{name}/{date}.json, and post a queue message with the following structure:

{
  "blobPath": "products/{name}/{date}.json",
  "retries": 0
}
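For clarity, the intended message could be described with a hypothetical TypeScript type (the names here are mine, not part of any API):

```typescript
// Hypothetical type for the queue message sketched above.
interface CrawlTask {
  // Path of the uploaded blob after interpolation,
  // e.g. "products/widgets/2024-01-01.json"
  blobPath: string;
  // Number of failed processing attempts so far.
  retries: number;
}

const task: CrawlTask = {
  blobPath: "products/widgets/2024-01-01.json",
  retries: 0,
};

console.log(JSON.stringify(task));
```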

My current output bindings look like so:

const queueOutput = output.storageQueue({
  connection: "AzureWebJobsStorage",
  queueName: "pricex",
});

const blobOutput = output.storageBlob({
  connection: "AzureWebJobsStorage",
  path: "products/{name}/{date}.json",
});

How would I access the runtime path of the blob binding? The documentation for v4 programming model bindings is scarce, and the typings are `unknown` in many places.

Known workarounds

Manage the blob connection with @azure/storage-blob instead of bindings.

ejizba commented 1 year ago

So date is the easier part. You can use the DateTime binding expression and even customize the format using .NET format strings, like this:

import { app, HttpRequest, HttpResponseInit, InvocationContext, output } from "@azure/functions";

const blobOutput = output.storageBlob({
  connection: "AzureWebJobsStorage",
  path: "helloworld/{DateTime:MM-dd-yyyy H:mm:ss}.json",
});

export async function httpTrigger1(request: HttpRequest, context: InvocationContext): Promise<HttpResponseInit> {
    context.log(`Http function processed request for url "${request.url}"`);

    const name = request.query.get('name') || await request.text() || 'world';

    context.extraOutputs.set(blobOutput, 'hello');

    return { body: `Hello, ${name}!` };
}

app.http('httpTrigger1', {
    methods: ['GET', 'POST'],
    extraOutputs: [blobOutput],
    authLevel: 'anonymous',
    handler: httpTrigger1
});
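As a rough illustration (plain TypeScript, not part of the binding itself), the {DateTime:MM-dd-yyyy H:mm:ss} expression produces blob names along these lines; the runtime resolves it on the host side, so the formatting below is only a sketch of the equivalent .NET custom format semantics:

```typescript
// Sketch of what the {DateTime:MM-dd-yyyy H:mm:ss} binding expression
// resolves to, using .NET custom format semantics: MM/dd/mm/ss are
// zero-padded, H is the 24-hour clock without a leading zero.
function formatLikeDotNet(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${pad(d.getMonth() + 1)}-${pad(d.getDate())}-${d.getFullYear()} ` +
    `${d.getHours()}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`;
}

// e.g. "helloworld/01-31-2024 9:05:07.json"
const blobName = `helloworld/${formatLikeDotNet(new Date(2024, 0, 31, 9, 5, 7))}.json`;
console.log(blobName);
```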

I'm not sure there's a way to insert {name} - how is name determined? Also, were you able to get this working in the v3 programming model? If so, I could help you migrate existing code if you have it.

prenaissance commented 1 year ago

I might not have given enough context for my question. This is my attempt at creating the data-source function factory.

import { app, output } from "@azure/functions";
import { ProductCrawler } from "./product-crawler";

const queueOutput = output.storageQueue({
  connection: "AzureWebJobsStorage",
  queueName: "pricex",
});

const blobOutput = output.storageBlob({
  connection: "AzureWebJobsStorage",
  path: "products/{name}/{date}.json", // this interpolates correctly
});

export const registerTimerCrawler = (crawler: ProductCrawler) => {
  app.timer(crawler.name, {
    schedule: "0 */2 * * * *",
    extraOutputs: [queueOutput, blobOutput],

    handler: async (_, ctx) => {
      await crawler.crawl();
      const blobBody = await crawler.dataset.getData();
      ctx.extraOutputs.set(blobOutput, blobBody);
      ctx.extraOutputs.set(queueOutput, {
        name: crawler.name,
        blobPath: "products/{name}/{date}.json", // this does not
        retries: 0,
      });
    },
  });
};

The registered functions fail at runtime when setting the queue message, with the error:

System.Private.CoreLib: Exception while executing function: Functions.<store name>. Microsoft.Azure.WebJobs.Host: No value for named parameter 'name'

(The same error happens for the other interpolation parameters I tried, like {date} or {DateTime}.) I haven't found any context properties that would give me the time of function execution. I expected the {name} binding expression to be replaced with the function name, but in this case that can be accessed from the closure anyway.

This is a tweak that I made that works as intended, but does not use the blob binding and is obviously less declarative:

import { app, output } from "@azure/functions";
import { ContainerClient } from "@azure/storage-blob";
import dotenv from "dotenv";

import { ProductCrawler } from "./product-crawler";

dotenv.config();

const containerClient = new ContainerClient(
  process.env.STORAGE_ACCOUNT_CONNECTION_STRING,
  "products",
);

const queueOutput = output.storageQueue({
  connection: "AzureWebJobsStorage",
  queueName: "pricex",
});

export const registerTimerCrawler = (crawler: ProductCrawler) => {
  app.timer(crawler.name, {
    schedule: "0 */2 * * * *",
    extraOutputs: [queueOutput],

    handler: async (_, ctx) => {
      const date = new Date().toISOString();
      const blobPath = `${crawler.name}/${date}.json`;
      const blockBlobClient = containerClient.getBlockBlobClient(blobPath);

      await crawler.crawl();
      const blobBody = JSON.stringify(await crawler.dataset.getData());
      // contentLength must be the byte length, not the character count,
      // or uploads with multi-byte characters will fail
      await blockBlobClient.upload(blobBody, Buffer.byteLength(blobBody), {
        blobHTTPHeaders: {
          blobContentType: "application/json",
        },
      });
      ctx.extraOutputs.set(queueOutput, {
        name: crawler.name,
        blobPath,
        retries: 0,
      });
    },
  });
};

prenaissance commented 1 year ago

As for the queue, I need each dataset to be processed exactly once. If there is a way to ensure that with blob storage only, and to queue a task for the same blob if data processing fails, that would be another workaround.
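Not an answer to the binding question, but since the queue message above carries a retries field: Storage queues only guarantee at-least-once delivery, so true exactly-once needs idempotent processing on the consumer side. A minimal sketch (hypothetical names and policy, nothing from the Functions API) of re-queueing the same blob on failure with a retry cap:

```typescript
// Hypothetical retry policy for the consumer side: re-queue the same blob
// path with an incremented counter until MAX_RETRIES is reached, then give
// up (e.g. hand the task to a dead-letter/poison queue).
interface CrawlTask {
  name: string;
  blobPath: string;
  retries: number;
}

const MAX_RETRIES = 3;

function nextTaskOnFailure(task: CrawlTask): CrawlTask | null {
  if (task.retries + 1 > MAX_RETRIES) {
    return null; // retries exhausted: stop re-queueing
  }
  return { ...task, retries: task.retries + 1 };
}

const failed: CrawlTask = {
  name: "widgets",
  blobPath: "products/widgets/2024-01-01.json",
  retries: 2,
};
console.log(nextTaskOnFailure(failed));                  // retries becomes 3
console.log(nextTaskOnFailure({ ...failed, retries: 3 })); // null: give up
```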