danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://danswer.ai

Confluence Connector fails when indexing single pages #2149

Closed emerzon closed 1 week ago

emerzon commented 2 months ago

When indexing single pages, the following exception occurs:

08/16/2024 03:39:03 PM         connector.py 269 : [Attempt ID: 11656] Batch failed with page viewpage.action at offset 0 with size 1, processing pages individually...
08/16/2024 03:39:03 PM         connector.py 286 : [Attempt ID: 11656] Page viewpage.action at offset 0 failed: 'str' object has no attribute 'get'
08/16/2024 03:39:03 PM         connector.py 434 : [Attempt ID: 11656] Ran into exception when fetching pages from Confluence
Traceback (most recent call last):
  File "/app/danswer/connectors/confluence/connector.py", line 257, in _fetch_single_depth_child_pages
    child_page = get_page_child_by_type(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/rate_limit_handler.py", line 32, in wrapped_call
    return confluence_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 165, in get_page_child_by_type
    return response.get("results")
           ^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/danswer/connectors/confluence/connector.py", line 429, in _fetch_pages
    else _fetch_page(start_ind, self.batch_size)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 398, in _fetch_page
    self.recursive_indexer = RecursiveIndexer(
                             ^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 179, in __init__
    self.pages = self.recurse_children_pages(0, self.origin_page_id)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 217, in recurse_children_pages
    while batch := self._fetch_single_depth_child_pages(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 287, in _fetch_single_depth_child_pages
    raise e
  File "/app/danswer/connectors/confluence/connector.py", line 277, in _fetch_single_depth_child_pages
    child_page = get_page_child_by_type(
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/rate_limit_handler.py", line 32, in wrapped_call
    return confluence_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/atlassian/confluence.py", line 165, in get_page_child_by_type
    return response.get("results")
           ^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'

There appear to be two issues here:
1) The Atlassian Confluence SDK always expects response to be a dict, but on some errors it can be a str. This is outside Danswer's scope.
2) The actual error that causes response to be a str seems to be invalid authentication: the content of response is a redirect to the authentication page. So it seems that, specifically for this function, the Confluence client auth token is not being passed properly.
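
To illustrate the first point, here is a minimal sketch of the kind of type check that would turn the opaque AttributeError into a readable message. It is not the SDK's or Danswer's actual code: `extract_results` and `UnexpectedConfluenceResponse` are hypothetical names, and the only assumption is that a successful response is a dict with a "results" key, as the traceback suggests.

```python
from typing import Any


class UnexpectedConfluenceResponse(Exception):
    """Hypothetical error for a response that is not a parsed JSON object."""


def extract_results(response: Any) -> list:
    """Return the "results" list, failing clearly if the response is not a dict."""
    if not isinstance(response, dict):
        # A str here may be an HTML page, e.g. a redirect to a login form.
        snippet = str(response)[:80]
        raise UnexpectedConfluenceResponse(
            f"expected a JSON object, got {type(response).__name__}: {snippet!r}"
        )
    return response.get("results", [])
```

With a check like this, a login redirect returned as a str would surface as an error that names the unexpected type and shows the start of the body, which makes an authentication problem easier to spot in the logs.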

Weves commented 2 months ago

@emerzon for this issue, are you recursively indexing (e.g. specifying a root page)? What's the URL that you're putting in via the UI?

emerzon commented 1 week ago

Closing as no longer reproducible