Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License
1.92k stars 441 forks source link

Malformed ISO timestamps in cosmosdb input binding #9610

Open sebastianneubauer opened 11 months ago

sebastianneubauer commented 11 months ago

Investigative information

Repro steps

In a python azure function I have this comos db input binding

@app.function_name(name="UpdateChart")
@app.schedule(schedule="0 */5 * * * *", 
              arg_name="mytimer",
              run_on_startup=True)
@app.cosmos_db_input(arg_name="cosmos", 
                     database_name="particle-photon-db2",
                     container_name="weather",
                     sqlQuery="SELECT TOP 4500 * FROM c WHERE c.published_at > '2020-10-06T07:45:16.081Z' AND c.event='weather-data-v1' ORDER BY c.published_at DESC",
                     connection="MyCosmosDBConnectionString")
@app.blob_output(arg_name="outputblob",
                path="weather-app-container/chart_new.html",
                connection="MyStorageConnectionString")
def update_chart(mytimer: func.TimerRequest, cosmos: func.DocumentList, outputblob: func.Out[str]):
    dflist = []

    for i, item in enumerate(cosmos):
        # Append each item as a dictionary to list
        dict_item = dict(item['data'])
        logging.info(item['published_at'])
        dict_item['published_at'] = item['published_at']
        dflist.append(dict_item)

    df = pd.DataFrame.from_records(dflist)

    df['published_at'] = pd.to_datetime(df['published_at'])
...

Expected behavior

This worked in V1 of the azure functions just fine

Actual behavior

In V2 of azure functions with this hosts.json

{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.0.0, 5.0.0)"
  },
  "extensions": {
      "cosmosDB": {
          "connectionMode": "Gateway"
      }
  }
}

it breaks for some data with the following exception:

Result: Failure 
Exception: ValueError: time data "2023-10-10T20:40:16Z" doesn't match format "%Y-%m-%dT%H:%M:%S.%f%z", at position 1429. You might want to try: - passing `format` if your strings have a consistent format; - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format; - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this. 
Stack:
 File "/azure-functions-host/workers/python/3.10/LINUX/X64/azure_functions_worker/dispatcher.py", line 479, in _handle__invocation_request call_result = await self._loop.run_in_executor( 
 File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) 
 File "/azure-functions-host/workers/python/3.10/LINUX/X64/azure_functions_worker/dispatcher.py", line 752, in _run_sync_func return ExtensionManager.get_sync_invocation_wrapper(context, 
 File "/azure-functions-host/workers/python/3.10/LINUX/X64/azure_functions_worker/extension.py", line 215, in _raw_invocation_wrapper result = function(**args) 
 File "/home/site/wwwroot/function_app.py", line 35, in update_chart df['published_at'] = pd.to_datetime(df['published_at']) 
 File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/tools/datetimes.py", line 1112, in to_datetime values = convert_listlike(arg._values, format) 
 File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/tools/datetimes.py", line 488, in _convert_listlike_datetimes return _array_strptime_with_fallback(arg, name, utc, format, exact, errors) 
 File "/home/site/wwwroot/.python_packages/lib/site-packages/pandas/core/tools/datetimes.py", line 519, in _array_strptime_with_fallback result, timezones = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc) 
 File "strptime.pyx", line 534, in pandas._libs.tslibs.strptime.array_strptime 
 File "strptime.pyx", line 355, in pandas._libs.tslibs.strptime.array_strptime

And indeed, after some debugging, I confirmed, that the timestamps in 'published_at' are malformed. Example: "2023-10-10T20:40:16Z" is wrong, when looking at the cosmos db data, there the timestamp string is: "2023-10-10T20:40:16.000Z". Note the missing trailing ".000" before the Z. I also confirmed, that all trailing 0 before the Z are stripped.

I could not find a way my code could introduce this bug. It most likely is introduced in the binding.

Summary: "2023-10-10T20:40:16Z" in 'published_at' handed into my azure function is wrong and should be "2023-10-10T20:40:16.000Z"

Known workarounds

I could to some evil string magic (fill up missing 0 before the Z), but I would rather not do it.

bhagyshricompany commented 11 months ago

Thanks for informing will check update this.plz share the func app name also

sebastianneubauer commented 11 months ago

app name is "UpdateChart", but I don't think that is relevant, you should be reproducible rather easy, just check any ISO timestamp in the results of an cosmosdb input binding. BTW: I just saw, the same is happening in the logs in the portal as well, so putting this into any function:

logging.info("2023-10-10T20:40:16.000Z")

results in this output in the azure function monitoring log:

"2023-10-15T20:10:01Z"