Open dbro opened 4 months ago
There might be an issue with an incorrect variable name, introduced with commit https://github.com/danswer-ai/danswer/commit/a4d5ac816e37973fd7d6ec143d5ea4cb6c68a1d5
This line works and refers to the variable called file_metadata : https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L72
This line might not work, and it refers to the variable called metadata : https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L101
So perhaps changing line 101 to use file_metadata.get() instead of metadata.get() would fix it?
Note there are other references to metadata.get() that might need to be updated: https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L78 https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L106 https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L107
this line is probably ok (?) https://github.com/danswer-ai/danswer/blob/3f1cd1ad129683090d85a15f6208bfd5a9428100/backend/danswer/connectors/file/connector.py#L135
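To illustrate the suggestion above, here is a minimal sketch (not the actual connector code) in which every field is read from the single dict returned by the metadata parser, so a misnamed variable cannot silently produce an empty link. The helper name `extract_doc_fields` and the default values are assumptions for illustration only; the keys are the ones used in the sample metadata in this thread.

```python
from typing import Any, Dict


def extract_doc_fields(file_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pull all document fields from one metadata dict."""
    return {
        # All lookups go through file_metadata, never a second variable.
        "link": file_metadata.get("link"),
        "file_display_name": file_metadata.get("file_display_name"),
        "primary_owners": file_metadata.get("primary_owners", []),
        "secondary_owners": file_metadata.get("secondary_owners", []),
        "doc_updated_at": file_metadata.get("doc_updated_at"),
    }


fields = extract_doc_fields({"link": "https://docs.danswer.dev/connectors/file"})
print(fields["link"])
```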
Just built and seems to work for links now, but I don't see the primary or secondary owner info anywhere. Should it be in the Filters panel?
#DANSWER_METADATA={"link": "https://github.com/danswer-ai/danswer/blob/main/CONTRIBUTING.md", "primary_owners": ["yuhong@danswer.ai", "chris@danswer.ai"], "secondary_owners": ["founders@danswer.ai"], "doc_updated_at": "2024-03-09T13:06:08.589616-08:00", "file_display_name": "Sup Dog!", "type": "banana", "source": "other"}
How to set up captcha
Follow the example below to set up a captcha
like you saw when you visited this page!
By including a captcha, this page is able to
prevent web scrapers from reading it.
It looks like this was broken again by the recent refactoring of the file utility functions in https://github.com/danswer-ai/danswer/pull/1449. That PR introduced the read_text_file function, and in backend/connectors/file/connector.py it needs to be called with ignore_danswer_metadata=False, like this:
file_content_raw, file_metadata = read_text_file(file, encoding=encoding, ignore_danswer_metadata=False)
Otherwise it skips reading the #DANSWER_METADATA content from the file entirely.
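To make the effect of the flag concrete, here is a rough sketch of how a reader with such a switch could behave. This is not Danswer's read_text_file; the function name `split_metadata`, its signature, and the choice to leave the text untouched when the flag is set are all assumptions. Only the `#DANSWER_METADATA=` prefix comes from the sample file in this thread.

```python
import json
from typing import Any, Dict, Tuple

# Prefix as it appears on the first line of the sample file in this thread.
METADATA_PREFIX = "#DANSWER_METADATA="


def split_metadata(text: str, ignore_metadata: bool = True) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (body, metadata) for a text document."""
    if ignore_metadata:
        # Assumed behavior: the header is never inspected, so metadata stays empty.
        return text, {}

    first_line, _, rest = text.partition("\n")
    if not first_line.startswith(METADATA_PREFIX):
        return text, {}

    try:
        metadata = json.loads(first_line[len(METADATA_PREFIX):])
    except json.JSONDecodeError:
        # Malformed header: treat the whole text as content.
        return text, {}

    if not isinstance(metadata, dict):
        return text, {}
    return rest, metadata


sample = '#DANSWER_METADATA={"link": "https://example.com/doc"}\nHow to set up captcha\n'
body, metadata = split_metadata(sample, ignore_metadata=False)
print(metadata.get("link"))
```

With the flag left at its default in this sketch, the same input yields an empty metadata dict, which would match the empty href described in this issue.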
Hey @eojthebrave, is it possible to reference the file internally, or does the link have to be external? I want to open the file that I recently uploaded, but I keep getting a 404. Do I need to set a specific path?
@mcandio I'm not sure. I'm very new to this project. What do you mean by reference it internally?
I mean, for example, if I create the .danswer_metadata.json like this:
[
{
"file_display_name": "filename",
"filename": "filename.pdf",
"link": "./filename.pdf"
}
]
What should the link be for the internal path where the file is stored? Does the background deployment or the Postgres database create internal links to host these files?
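For reference, the structure above is a list of per-file entries matched by filename. A minimal sketch of looking up one file's entry follows; the helper name `find_entry` is hypothetical and this is not how Danswer itself reads the file. It only shows the lookup and says nothing about whether a relative link such as "./filename.pdf" can be resolved by the deployment.

```python
import json
from typing import Any, Dict, List, Optional


def find_entry(metadata_json: str, filename: str) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the entry whose "filename" matches, if any."""
    entries: List[Dict[str, Any]] = json.loads(metadata_json)
    for entry in entries:
        if entry.get("filename") == filename:
            return entry
    return None


# Same shape as the .danswer_metadata.json example above.
example = """
[
  {
    "file_display_name": "filename",
    "filename": "filename.pdf",
    "link": "./filename.pdf"
  }
]
"""

entry = find_entry(example, "filename.pdf")
print(entry["link"] if entry else "no entry found")
```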
I am having the same issue: results are not clickable when using the file connector. Has this been resolved, or is there a temporary fix?
Using the file connector and the metadata as described here https://docs.danswer.dev/connectors/file , the links are not working.
In the html of the page, the value of the href property is an empty string, and not the value of the link specified. For comparison, the value specified for the file_display_name is working as expected.
Here is the test file I uploaded to Danswer:
These screenshots show what appears in the search results, and in the page html. Notice the blue highlighted line has href="" instead of the expected href="https://docs.danswer.dev/connectors/file"