danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Indexing with ollama - not working - confused with HF #1133

Open shuther opened 4 months ago

shuther commented 4 months ago

While I set DOCUMENT_ENCODER_MODEL to mistral (should it be ollama/mistral?), danswer still thinks it should load the model from HF. Is there a way to force it to connect to an external endpoint?

Error in the indexing screen:

mistral is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

Full trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 286, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/mistral/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1368, in hf_hub_download
    raise head_call_error
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
        ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
               ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 409, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 323, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-65dcab1d-7a27637e4c3b704411cf4777;9189a836-21c4-4d09-86bd-56f2cd3b5a7b)

Repository Not Found for url: https://huggingface.co/mistral/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/danswer/background/indexing/run_indexing.py", line 207, in _run_indexing
    new_docs, total_batch_chunks = indexing_pipeline(
                                   ^^^^^^^^^^^^^^^^^^
  File "/app/danswer/utils/timing.py", line 28, in wrapped_func
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/indexing/indexing_pipeline.py", line 148, in index_doc_batch
    chain(*[chunker.chunk(document=document) for document in updatable_docs])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/indexing/indexing_pipeline.py", line 148, in <listcomp>
    chain(*[chunker.chunk(document=document) for document in updatable_docs])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/indexing/chunker.py", line 181, in chunk
    return chunk_document(document)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/indexing/chunker.py", line 76, in chunk_document
    tokenizer = get_default_tokenizer()
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/search/search_nlp_models.py", line 77, in get_default_tokenizer
    _TOKENIZER = (AutoTokenizer.from_pretrained(model_name), model_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 410, in cached_file
    raise EnvironmentError(
OSError: mistral is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
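
For what it's worth, the failure reproduces outside danswer: transformers treats whatever value DOCUMENT_ENCODER_MODEL holds as a Hugging Face repo id, so a bare mistral can never resolve. A minimal sketch, assuming only the transformers package (the second model id is just an example of a valid Hub repo):

from transformers import AutoTokenizer

# "mistral" names no repository on the Hub, so this raises the OSError above
AutoTokenizer.from_pretrained("mistral")

# a fully qualified repo id resolves, because it points at a real Hub repo
AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
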
yuhongsun96 commented 4 months ago

Hello, the encoder model is not the generative AI model. The encoder model takes text and produces a vector of numbers (an embedding) as its output, and we only support HuggingFace encoders at the moment.

For using Ollama Mistral you'll want to follow this guide: https://docs.danswer.dev/gen_ai_configs/ollama
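
Roughly, the guide comes down to pointing danswer's generative-model settings at the local Ollama server through environment variables. The variable names below are an assumption and may differ by version, so defer to the linked guide:

GEN_AI_MODEL_PROVIDER=ollama
GEN_AI_MODEL_VERSION=mistral
GEN_AI_API_ENDPOINT=http://host.docker.internal:11434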

Will also just mention that your experience will be significantly better with GPT-4 or GPT-4-Turbo, so do try those out if you get a chance!

shuther commented 4 months ago

This is what I did, so I'm not sure which step I missed. I also understood that this refers to the encoder, not the generative AI model (and that the experience could be worse), but I expected Ollama to be able to run the embedding (see: https://python.langchain.com/docs/integrations/text_embedding/ollama). With curl I was at least able to generate the vector; see my ticket on Ollama. Are you saying that embedding today only works through OpenAI?
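
For reference, the curl test corresponds to Ollama's embeddings endpoint. A minimal Python equivalent, assuming Ollama is running on its default port with the mistral model pulled:

import requests

# Ollama's embeddings API: returns {"embedding": [...]} for a single prompt
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "mistral", "prompt": "hello world"},
)
print(len(resp.json()["embedding"]))  # dimensionality of the returned vector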

yuhongsun96 commented 4 months ago

Hello! I see; we don't currently support Ollama embeddings. We also don't use OpenAI embeddings. We run the encoder models locally via the sentence-transformers library: https://huggingface.co/sentence-transformers.
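
To illustrate: with sentence-transformers the encoder runs fully on the local machine, with no external API involved. A minimal sketch; the model id is an illustrative e5-family encoder, not necessarily danswer's configured default:

from sentence_transformers import SentenceTransformer

# downloads the model once, then encodes entirely locally
model = SentenceTransformer("intfloat/e5-base-v2")
vectors = model.encode(["passage: some document text"])
print(vectors.shape)  # one embedding vector per input passage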

Would love to know more about your use case and if Ollama embedding is critical for your deployment, please DM me/Chris in our Slack: https://join.slack.com/t/danswer/shared_invite/zt-2afut44lv-Rw3kSWu6_OmdAXRpCv80DQ

shuther commented 2 months ago

For scalability reasons, running the embeddings within the main platform is a problem: by itself, danswer doesn't need a GPU, and running embeddings is a lower priority than responsiveness. Maybe using Infinity as an API would solve the problem, since we could run it on the same machine or on a different one?
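
To make that concrete: Infinity serves an OpenAI-compatible embeddings route, so a decoupled setup could look roughly like the sketch below; the host, port, and model id are assumptions:

import requests

# query an Infinity server running on a separate (possibly GPU) machine
resp = requests.post(
    "http://embedding-host:7997/embeddings",
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["some chunk of text"]},
)
vector = resp.json()["data"][0]["embedding"]  # OpenAI-style response schema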