Azure / Azure-Sentinel

Cloud-native SIEM for intelligent security analytics for your entire enterprise.
https://azure.microsoft.com/en-us/services/azure-sentinel/
MIT License

Cisco Umbrella Function App Fails after several weeks or months #9701

Closed rvluna closed 9 months ago

rvluna commented 9 months ago

Describe the bug We deployed the Cisco Umbrella data connector supplied through the Content Hub page of Sentinel: https://github.com/Azure/Azure-Sentinel/tree/master/Solutions/CiscoUmbrella/Data%20Connectors

After the Function App is created, it runs successfully for a number of months and then fails (the most recent deployment lasted only 2 weeks).

We are using the same AWS keys and bucket and the same workspace ID and key.

It then simply fails. Based on the information we have gathered so far, the error gives us this message. Full exception:


Exception while executing function /Functions.ciscoUmbrellaDataConn ---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException /Result /Failure Exception /Error /line contains NUL

'/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py', line 493, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
'/usr/local/lib/python3.8/concurrent/futures/thread.py', line 57, in run
    result = self.fn(*self.args, **self.kwargs)
'/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py', line 762, in _run_sync_func
    return ExtensionManager.get_sync_invocation_wrapper(context,
'/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/extension.py', line 215, in _raw_invocation_wrapper
    result = function(**args)
'/home/site/wwwroot/ciscoUmbrellaDataConn/__init__.py', line 97, in main
    cli.process_file(obj, dest=sentinel)
'/home/site/wwwroot/ciscoUmbrellaDataConn/__init__.py', line 469, in process_file
    for event in parser_func(csv_file)
'/home/site/wwwroot/ciscoUmbrellaDataConn/__init__.py', line 299, in parse_csv_proxy
    for row in csv_reader
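For context, `line contains NUL` is the message Python's `csv` module raises on some interpreter versions (including the 3.8 worker shown in the trace) when an input line contains a `\x00` character; newer interpreters may accept such input. Below is a minimal sketch of one possible workaround, assuming the log text has already been decoded to `str`. The helpers `strip_nul` and `parse_rows` are hypothetical names for illustration and are not part of the connector.

```python
import csv
import io
from typing import Iterable, Iterator, List


def strip_nul(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


def parse_rows(text: str) -> List[List[str]]:
    """Parse CSV text, dropping NUL characters before csv.reader sees them."""
    # newline="" leaves line endings untranslated, as the csv docs recommend.
    source = io.StringIO(text, newline="")
    return list(csv.reader(strip_nul(source)))


# The second row carries a stray NUL character, as a corrupted log line might.
rows = parse_rows('"a","b"\n"c\x00","d"\n')
```

Silently dropping NUL characters keeps ingestion running but hides the corruption, so counting or logging the affected lines may be preferable in practice.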


To Reproduce This is not easy to reproduce, since it requires a functioning Cisco Umbrella deployment and AWS storage. Timing is also an open question, since we could not determine how often this error occurs.
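One way to narrow this down without the full pipeline might be to check a downloaded log file for NUL bytes directly. The sketch below assumes the logs are gzip-compressed CSV files; `find_nul_lines` is a hypothetical helper, and the sample data is built in memory purely for illustration.

```python
import gzip
import io
from typing import BinaryIO, List, Tuple


def find_nul_lines(stream: BinaryIO) -> List[Tuple[int, int]]:
    """Return (line_number, nul_count) for each line containing NUL bytes."""
    hits = []
    with gzip.open(stream, "rb") as fh:
        for number, line in enumerate(fh, start=1):
            count = line.count(b"\x00")
            if count:
                hits.append((number, count))
    return hits


# Build a small in-memory gzip file with one corrupted line to exercise the check.
sample = gzip.compress(b'"ok","row"\n"bad\x00","row"\n"ok","again"\n')
hits = find_nul_lines(io.BytesIO(sample))
```

Running the same function over a file object opened on a suspect log would show whether, and on which lines, NUL bytes appear.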

Expected behavior Umbrella logs are downloaded continuously without any problem.

Screenshots See the timeline below: image

Oddly enough, of the two log types downloaded by the app, only the Cisco DNS logs are failing; the proxy logs are still being downloaded.

Other errors received when raised via MS Support: image

Additional context Not sure if we can raise this via MS Support but when raising and going to the troubleshooting part, see below: image

Let me know if we need to raise this to MS or continue here.

Not sure whether this is related to the Python upgrade in Function Apps (which is out of our control) or to something else.

github-actions[bot] commented 9 months ago

Thank you for submitting an Issue to the Azure Sentinel GitHub repo! You should expect an initial response to your Issue from the team within 5 business days. Note that this response may be delayed during holiday periods. For urgent, production-affecting issues please raise a support ticket via the Azure Portal.

v-muuppugund commented 9 months ago

Hi @rvluna, thanks for flagging this issue. We will investigate and get back to you with an update by 8 Jan 2024. Thanks!

v-muuppugund commented 9 months ago

Hi @rvluna, could you please share more details, i.e. traces and exception details from the application? We also need some insight into the volume of data from AWS that is being processed by the function app. If needed, we can have a call to discuss this in more detail, as we may need to redesign based on those insights.

rvluna commented 9 months ago

@v-muuppugund: Do we need to raise a case with MS for this? My concern is that sharing data here may violate certain regulations.

Regarding the exceptions, we got most of our data from running diagnostics while raising a case. We are in a tight spot here: apparently the logs cannot be queried through the support case method. We are not sure whether the logs are still available, as our app failed between the 15th and the 19th of December.

Let me know how we can proceed.

v-muuppugund commented 9 months ago

Hi @rvluna, you can share the details to my email (v-muuppugund@microsoft.com). We can get exceptions and traces from App Insights for the failed dates. Please let me know if you have any issues; we can have a call to discuss further. We can raise a support case and will pick it up and work on it.

rvluna commented 9 months ago

Hi @v-muuppugund, I'm raising a case with our provider, as I could not raise a case directly with MS. I will ask them to cc you when creating the case on the MS side; from there I will share the information you have requested. In the meantime, here are the screenshots from diagnostics when raising the case with MS (these are from the 19th): image

image

image

image

Prior to that date, diagnostics state that the function app was not triggering on the 12th to 13th of December: image

I will let you know once the case is raised.

v-muuppugund commented 9 months ago

Sure @rvluna, we will check the details in the case. Since you are raising the case and there is no pending action on this issue, we are closing your issue (https://github.com/Azure/Azure-Sentinel/issues/9701) as per our standard operating procedures. If you still need support for this issue, feel free to re-open it at any time. Thank you for your co-operation!