danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://danswer.ai

[Confluence] - Incorrect space name causes loop of traceback errors on indexing #2642

Closed emerzon closed 1 month ago

emerzon commented 2 months ago

The user provided a non-existent space name. However, the connector's indexing status shows "Succeeded" (for 0 documents) instead of failing with the actual error message (No space with key: xxx).

I believe some validation should be required before a connector is created (i.e., if indexing a space, ensure it exists first).
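A rough sketch of what such a pre-creation check might look like. This is not Danswer's actual code: `SpaceValidationError` and `validate_space` are hypothetical names, and `client` is only assumed to expose `get_space(space_key)`, as the `atlassian.Confluence` client does. Any exception from that call is treated as "space missing or not visible to this user".

```python
class SpaceValidationError(Exception):
    """Hypothetical error raised when a space cannot be used for indexing."""


def validate_space(client, space_key):
    """Fail fast if `space_key` cannot be fetched with the given credentials.

    `client` is assumed to provide `get_space(space_key)`; with the
    atlassian-python-api client this would be an `atlassian.Confluence`
    instance. Returns the stripped key on success.
    """
    key = space_key.strip()
    if not key:
        raise SpaceValidationError("Space key is empty")
    try:
        # One cheap lookup before the connector is saved.
        client.get_space(key)
    except Exception as exc:  # the real client raises HTTP/API errors here
        raise SpaceValidationError(
            f"Cannot access Confluence space {key!r}: {exc}"
        ) from exc
    return key
```

Called at connector creation time, this would surface the "No space with key" message to the user right away instead of leaving it in the indexing logs.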

WARNING:  10/01/2024 03:11:02 PM                   connector.py  369: [CC Pair ID: 105] [Attempt ID: 109963] Batch failed with space LATAM TC at offset 0 with size 16, processing pages individually...
ERROR:    10/01/2024 03:11:02 PM                   connector.py  448: [CC Pair ID: 105] [Attempt ID: 109963] Ran into exception when fetching pages from Confluence
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 533, in get_all_pages_from_space_raw
    response = self.get(url, params=params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/rest_client.py", line 285, in get
    response = self.request(
               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/rest_client.py", line 257, in request
    self.raise_for_status(response)
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 3091, in raise_for_status
    raise HTTPError(error_msg, response=response)
requests.exceptions.HTTPError: No space with key : LATAM TC

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/danswer/connectors/confluence/connector.py", line 359, in _fetch_space
    return get_all_pages_from_space(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/rate_limit_handler.py", line 33, in wrapped_call
    return confluence_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 570, in get_all_pages_from_space
    return self.get_all_pages_from_space_raw(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 536, in get_all_pages_from_space_raw
    raise ApiPermissionError(
atlassian.errors.ApiPermissionError: The calling user does not have permission to view the content

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 533, in get_all_pages_from_space_raw
    response = self.get(url, params=params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/rest_client.py", line 285, in get
    response = self.request(
               ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/rest_client.py", line 257, in request
    self.raise_for_status(response)
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 3091, in raise_for_status
    raise HTTPError(error_msg, response=response)
requests.exceptions.HTTPError: No space with key : LATAM TC

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/danswer/connectors/confluence/connector.py", line 441, in _fetch_pages
    _fetch_space(start_ind, self.batch_size)
  File "/app/danswer/connectors/confluence/connector.py", line 380, in _fetch_space
    get_all_pages_from_space(
  File "/app/danswer/connectors/confluence/rate_limit_handler.py", line 33, in wrapped_call
    return confluence_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 570, in get_all_pages_from_space
    return self.get_all_pages_from_space_raw(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 536, in get_all_pages_from_space_raw
    raise ApiPermissionError(
atlassian.errors.ApiPermissionError: The calling user does not have permission to view the content

mboret commented 2 months ago

Same behavior when the Confluence bot doesn't have permissions on the space. In this case, the green light (Succeeded) is counter-intuitive. A yellow one to indicate a warning could be an improvement. An indexing run that completes without any documents should not be considered normal.
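To illustrate the idea, here is a minimal sketch of a three-state status. The `AttemptStatus` enum and `classify_attempt` function are hypothetical and do not reflect Danswer's actual status model; the inputs are assumed to be the number of documents indexed and the number of errors logged during the run.

```python
from enum import Enum


class AttemptStatus(Enum):
    SUCCEEDED = "succeeded"  # green
    WARNING = "warning"      # yellow
    FAILED = "failed"        # red


def classify_attempt(docs_indexed: int, error_count: int) -> AttemptStatus:
    """Map the outcome of an indexing run to a status light."""
    if error_count > 0 and docs_indexed == 0:
        # Errors and nothing indexed: the run did not do its job.
        return AttemptStatus.FAILED
    if error_count > 0 or docs_indexed == 0:
        # Partial errors, or a clean run that found no documents.
        return AttemptStatus.WARNING
    return AttemptStatus.SUCCEEDED
```

With this mapping, the run in the report above (0 documents, repeated fetch errors) would show as failed, and an empty but error-free run would show a warning rather than green.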

pablodanswer commented 1 month ago

Great idea, added this to the roadmap! We plan to add some additional checks for this at creation soon.