artem-shelkovnikov opened 3 months ago
We are seeing this too and it is preventing our connector from completing a full sync. The issue I see looks very similar to yours, but it does not stem from getting XML back. It looks like there is a payload error caused by a connection reset. Due to the long-running nature of our syncs, I think this happens somewhat frequently:
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')
Then, it looks like the retry wrapper is what's causing the `generator didn't stop after athrow()` error.
It seems like a ConnectionResetError would just get handled by the retry wrapper the same as all of the other retryable errors:
https://github.com/elastic/connectors/blob/6952df21c0497fa545fc5bf69c0a63b93264fb2f/connectors/sources/sharepoint_online.py#L315-L336
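For context, I believe this failure mode can be reproduced in isolation with nothing but `contextlib`: when an `@asynccontextmanager`-decorated generator catches the exception thrown into it via `athrow()` and then yields again (for example, to retry), `__aexit__` raises exactly this `RuntimeError`. A minimal sketch (the `retryable_resource` name is hypothetical, not the connector's actual wrapper):

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def retryable_resource():
    # Naive in-generator retry loop: if the body of the `async with` raises,
    # swallow the error and yield again. This is exactly what breaks
    # @asynccontextmanager: after athrow() the generator must finish,
    # not produce another value.
    for attempt in range(2):
        try:
            yield f"resource-{attempt}"
            return
        except ConnectionResetError:
            continue  # loops back and yields again -> RuntimeError

async def main():
    try:
        async with retryable_resource():
            raise ConnectionResetError(104, "Connection reset by peer")
    except RuntimeError as e:
        print(e)  # generator didn't stop after athrow()

asyncio.run(main())
```

If that reading is right, any exception raised inside the `async with self._get(...)` block (not just `ConnectionResetError`) that makes the wrapper yield another value would presumably surface as this `RuntimeError` instead of the original error.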
Unfortunately, I'm not familiar enough with how this connector handles async retries, or with how Python generators work in general.
@bean710 are you running connectors on-prem?
Do you have a stack trace?
Unfortunately, we don't have a Sharepoint Online instance that raises such errors, so we cannot reliably test fixes. It would help immensely if we could test the bug fix together with you.
@artem-shelkovnikov Yes, we are running the connector client on-prem and I have access to full logs with some additional logging I've added.
I'll add the relevant stack trace below.
I'd be happy to work with you in any way possible to get this resolved :)
We've run into a few bugs, but I've "resolved" most of them and this is the one which is most consistently preventing our connector from completing a full sync.
[FMWK][08:03:43][CRITICAL] [Connector id: JVur4Y4BM4YOrFqvPEzu, index name: search-sharepoint-debugging, Sync job id: Fn_mMJABM4YOrFqvddr2] Document extractor failed
Traceback (most recent call last):
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
uncompleted = self._parser.feed_eof()
File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
Not enough data for satisfy transfer length header.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 325, in wrapped
yield item
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 401, in _get_json
return await resp.json()
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1171, in json
await self.read()
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1111, in read
self._body = await self.content.read()
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 383, in read
block = await self.readany()
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 405, in readany
await self._wait("readany")
File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 312, in _wait
await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 487, in run
await self.get_docs(generator)
File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 539, in get_docs
async for count, doc in aenumerate(generator):
File "/home/ec2-user/connector-2/connector-main/connectors/utils.py", line 856, in aenumerate
async for elem in asequence:
File "/home/ec2-user/connector-2/connector-main/connectors/logger.py", line 247, in __anext__
return await self.gen.__anext__()
File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 521, in _decorate_with_metrics_span
async for doc in generator:
File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 458, in prepare_docs
async for doc, lazy_download, operation in self.generator():
File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 494, in generator
async for doc, lazy_download in self.data_provider.get_docs(
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 1678, in get_docs
async for list_item, download_func in self.site_list_items(
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 2027, in site_list_items
async for list_item_attachment in self.client.site_list_item_attachments(
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 924, in site_list_item_attachments
list_item = await self._rest_api_client.fetch(url)
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 360, in fetch
return await self._get_json(url)
File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 400, in _get_json
async with self._get(absolute_url) as resp:
File "/usr/local/lib/python3.10/contextlib.py", line 249, in __aexit__
raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()
Bug Description
An error was reported with the following stacktrace:
Upon investigation it became clear that this problem happens when the resource returns invalid JSON for a GET request, as in the following example test (the assert in it is incorrect, but the test will reproduce the error):
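A stdlib-only sketch of the underlying decode failure (the XML body below is hypothetical; aiohttp's `ClientResponse.json()` uses `json.loads` as its default loader):

```python
import json

# resp.json() decodes the body with json.loads by default, so a body
# that is actually XML (or truncated) raises JSONDecodeError.
body = "<m:error>Service unavailable</m:error>"  # hypothetical non-JSON payload
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    print(type(exc).__name__)  # JSONDecodeError
```

Inside the retryable wrapper, that exception gets thrown back into the generator via `athrow()`, which is where the `RuntimeError: generator didn't stop after athrow()` in the stack trace above comes from.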
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The error is clearly communicated and printed to the logs. The sync may continue if it is a non-critical error.
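One way to get that behavior is to move the retry loop outside the context manager, so the generator never yields again after `athrow()`. A minimal sketch under that assumption (`resource` and `flaky_fetch` are hypothetical stand-ins, not the connector's real API):

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def resource():
    # Yields exactly once; any exception thrown in simply propagates out,
    # which is what @asynccontextmanager expects.
    yield "session"

attempts = 0

async def flaky_fetch(session):
    # Hypothetical fetch that fails twice before succeeding.
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionResetError(104, "Connection reset by peer")
    return {"ok": True}

async def fetch_with_retries(retries=4):
    # Retry around the whole `async with` block, not inside the generator.
    last_exc = None
    for _ in range(retries):
        try:
            async with resource() as session:
                return await flaky_fetch(session)
        except ConnectionResetError as e:
            last_exc = e  # log and back off here in a real implementation
    raise last_exc

result = asyncio.run(fetch_with_retries())
print(result)  # {'ok': True}
```

With this shape, a non-retryable or exhausted error propagates as itself (e.g. `ConnectionResetError`) and can be logged clearly, instead of being masked by the `RuntimeError` from `contextlib`.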