I'm trying out Danswer and the Confluence (Cloud) connector. It seems to connect fine, but the workspaces we want to index contain some pages with uploaded video (AVI/MP4) attachments. When the connector reaches one of these, it raises an error and stops indexing. See the traceback below. It would be a lot better if it just skipped unsupported files like these.
```
  File "/app/danswer/background/indexing/run_indexing.py", line 177, in _run_indexing
    for doc_batch in doc_batch_generator:
  File "/app/danswer/connectors/confluence/connector.py", line 478, in poll_source
    doc_batch, num_pages = self._get_doc_batch(
                           ^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 429, in _get_doc_batch
    attachment_text = self._fetch_attachments(
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/connectors/confluence/connector.py", line 375, in _fetch_attachments
    raise e
  File "/app/danswer/connectors/confluence/connector.py", line 368, in _fetch_attachments
    extract = extract_file_text(
              ^^^^^^^^^^^^^^^^^^
  File "/app/danswer/file_processing/extract_file_text.py", line 268, in extract_file_text
    raise RuntimeError(f"Unprocessable file type: {file_name}")
RuntimeError: Unprocessable file type: 2018-01-10 TRAM and CJM.mp4
```
The same thing happens with some other file types, such as `.tar` archives.
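A minimal sketch of the requested behavior, assuming the change would go around the `raise e` in `_fetch_attachments`: catch the `RuntimeError` that `extract_file_text` raises for unprocessable types, log it, and continue with the next attachment instead of aborting the whole indexing run. The helper name `extract_attachments_text` and the `(file_name, content)` input shape are hypothetical, not Danswer's actual API; `extract` stands in for `extract_file_text`, whose real signature may differ.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def extract_attachments_text(
    attachments: Iterable[tuple[str, bytes]],
    extract: Callable[[str, bytes], str],
) -> str:
    """Join the text of all attachments, skipping ones the extractor rejects.

    Hypothetical helper: `extract` is a stand-in for Danswer's
    extract_file_text and is assumed to raise RuntimeError
    ("Unprocessable file type: ...") for unsupported files.
    """
    texts: list[str] = []
    for file_name, content in attachments:
        try:
            texts.append(extract(file_name, content))
        except RuntimeError as e:
            # Log and skip instead of re-raising, so one .mp4 or .tar
            # attachment no longer stops the whole indexing run.
            logger.warning("Skipping attachment %r: %s", file_name, e)
    return "\n".join(texts)
```

Catching only `RuntimeError` leaves most other failures (for example, network errors while downloading an attachment) propagating as before. A more targeted variant could also skip known binary extensions such as `.mp4`, `.avi` and `.tar` before downloading them at all.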