elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

[Sharepoint Online] Invalid JSON will cause unactionable "generator didn't stop after athrow" errors #2309

Open artem-shelkovnikov opened 3 months ago

artem-shelkovnikov commented 3 months ago

Bug Description

An error was reported with the following stacktrace:

generator didn't stop after athrow()
  File "/app/connectors/es/sink.py", line 365, in run
    await self.get_docs(generator)
  File "/app/connectors/es/sink.py", line 415, in get_docs
    async for count, doc in aenumerate(generator):
  File "/app/connectors/utils.py", line 800, in aenumerate
    async for elem in asequence:
  File "/app/connectors/logger.py", line 176, in __anext__
    return await self.gen.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/es/sink.py", line 389, in _decorate_with_metrics_span
    async for doc in generator:
  File "/app/connectors/sync_job_runner.py", line 309, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/app/connectors/sync_job_runner.py", line 341, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/app/connectors/sources/sharepoint_online.py", line 1660, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/app/connectors/sources/sharepoint_online.py", line 2009, in site_list_items
    async for list_item_attachment in self.client.site_list_item_attachments(
  File "/app/connectors/sources/sharepoint_online.py", line 909, in site_list_item_attachments
    list_item = await self._rest_api_client.fetch(url)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/sources/sharepoint_online.py", line 359, in fetch
    return await self._get_json(url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/sources/sharepoint_online.py", line 399, in _get_json
    async with self._get(absolute_url) as resp:
  File "/usr/local/lib/python3.11/contextlib.py", line 257, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")

Upon investigation, it became clear that this problem happens when the resource returns invalid JSON for a GET request. The following test reproduces the error (the assert is incorrect here, but the test will trigger the failure):

    @pytest.mark.asyncio
    async def test_fetch_with_retrying_multiple_times(
        self, microsoft_api_session, mock_responses, patch_sleep
    ):
        url = "http://localhost:1234/url"
        # Intentionally truncated JSON: the closing "]" is missing
        payload = '[{"test": "hello world"}, { "test": "hello another world"}'

        # Return the same invalid payload for every retry attempt
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)
        mock_responses.get(url, body=payload)

        response = await microsoft_api_session.fetch(url)

        # Incorrect assertion on purpose: fetch() above fails with
        # "generator didn't stop after athrow()" before we ever get here
        assert response == payload
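
For clarity, the payload above is not valid JSON at all (the closing "]" is missing), so decoding the body fails; a quick standalone check with nothing but the standard library:

    import json

    # The same truncated body the mocked endpoint returns above
    payload = '[{"test": "hello world"}, { "test": "hello another world"}'

    try:
        json.loads(payload)
    except json.JSONDecodeError as e:
        print(f"invalid JSON: {e}")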

To Reproduce

Steps to reproduce the behavior:

  1. Find a way to make a SharePoint endpoint return XML (or otherwise invalid JSON) instead of JSON (not sure how yet, but it can be reproduced with the test above)
  2. Run a sync
  3. The sync ends with a RuntimeError with the message "generator didn't stop after athrow()"

Expected behavior

The error is clearly communicated and printed to the logs. The sync might continue if the error is non-critical.
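
One possible direction, sketched only (the parse_json_or_raise helper and the InvalidResponsePayloadError name are hypothetical, not part of the connector): catch malformed-payload errors right where _get_json awaits resp.json(), so a clearly worded error reaches the logs instead of an opaque exception being thrown back into the retry generator. The retry wrapper would also need to let such an error propagate rather than retry it.

    import json

    import aiohttp


    class InvalidResponsePayloadError(Exception):
        """Hypothetical error type, used only to make the failure actionable."""


    async def parse_json_or_raise(resp: aiohttp.ClientResponse, url: str):
        """Decode the JSON body, or raise a clearly worded error.

        Meant as a drop-in for the bare `await resp.json()` inside
        _get_json's `async with self._get(absolute_url) as resp:` block.
        """
        try:
            return await resp.json()
        except (aiohttp.ContentTypeError, json.JSONDecodeError) as e:
            # ContentTypeError covers e.g. XML served instead of JSON;
            # JSONDecodeError covers truncated or otherwise malformed bodies.
            raise InvalidResponsePayloadError(
                f"Received a non-JSON or malformed response from {url}"
            ) from e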

bean710 commented 1 week ago

We are seeing this too and it is preventing our connector from completing a full sync. The issue I see looks very similar to yours, but it does not stem from getting XML back. It looks like there is a payload error caused by a connection reset. Due to the long-running nature of our syncs, I think these happen somewhat frequently: aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

From there, it looks like the retry wrapper is what turns that into the "generator didn't stop after athrow()" error.

It seems like a ConnectionResetError would just get handled by the retry wrapper the same as all of the other retries: https://github.com/elastic/connectors/blob/6952df21c0497fa545fc5bf69c0a63b93264fb2f/connectors/sources/sharepoint_online.py#L315-L336

Unfortunately, I'm not familiar enough with how this connector handles async retries or how Python generators in general work.
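
For what it's worth, here is a minimal standalone sketch (made up for illustration, not the connector's actual code) of how a retry wrapper built on contextlib's asynccontextmanager produces exactly this RuntimeError: the exception raised inside the async with body is thrown into the wrapper's generator at its yield, the wrapper swallows it and yields again to retry, and contextlib then refuses to continue:

    import asyncio
    from contextlib import asynccontextmanager


    @asynccontextmanager
    async def get_with_retries(retries=3):
        # Made-up retry wrapper: it "retries" by catching whatever is thrown
        # into the generator at the yield point and then yielding again.
        for attempt in range(retries):
            try:
                yield f"response for attempt {attempt + 1}"
                return
            except Exception:
                continue  # yielding again after athrow() is what contextlib rejects


    async def main():
        try:
            async with get_with_retries() as resp:
                # Stands in for `await resp.json()` blowing up on a bad payload
                raise ValueError("malformed payload")
        except RuntimeError as e:
            print(e)  # generator didn't stop after athrow()


    asyncio.run(main())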

artem-shelkovnikov commented 1 week ago

@bean710 are you running connectors on-prem?

Do you have a stack trace?

Unfortunately, we don't have a SharePoint Online instance that raises such errors, so we cannot reliably test fixes. It would help immensely if we could test the bug fix together with you.

bean710 commented 1 week ago

@artem-shelkovnikov Yes, we are running the connector client on-prem and I have access to full logs with some additional logging I've added.

I'll add the relevant stack trace below.

I'd be happy to work with you in any way possible to get this resolved :)

We've run into a few bugs, but I've "resolved" most of them; this is the one that most consistently prevents our connector from completing a full sync.

[FMWK][08:03:43][CRITICAL] [Connector id: JVur4Y4BM4YOrFqvPEzu, index name: search-sharepoint-debugging, Sync job id: Fn_mMJABM4YOrFqvddr2] Document extractor failed
Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
  File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 325, in wrapped
    yield item
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 401, in _get_json
    return await resp.json()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1171, in json
    await self.read()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1111, in read
    self._body = await self.content.read()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 487, in run
    await self.get_docs(generator)
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 539, in get_docs
    async for count, doc in aenumerate(generator):
  File "/home/ec2-user/connector-2/connector-main/connectors/utils.py", line 856, in aenumerate
    async for elem in asequence:
  File "/home/ec2-user/connector-2/connector-main/connectors/logger.py", line 247, in __anext__
    return await self.gen.__anext__()
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 521, in _decorate_with_metrics_span
    async for doc in generator:
  File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 458, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 494, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 1678, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 2027, in site_list_items
    async for list_item_attachment in self.client.site_list_item_attachments(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 924, in site_list_item_attachments
    list_item = await self._rest_api_client.fetch(url)
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 360, in fetch
    return await self._get_json(url)
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 400, in _get_json
    async with self._get(absolute_url) as resp:
  File "/usr/local/lib/python3.10/contextlib.py", line 249, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()