BBC-Esq / VectorDB-Plugin-for-LM-Studio

Plugin that lets you ask questions about your documents including audio and video files.
https://www.youtube.com/@AI_For_Lawyers

MacOS setup issues and `workarounds` #88

Closed gramss closed 10 months ago

gramss commented 10 months ago

Here are some issues I ran into while testing version 3.04 on macOS (M2) alongside LM Studio 0.2.10:

server_connector.py:

  1. Missing `import sys`.
  2. An error with the default config.yaml setting `test_embeddings = true`. This part of the test:

    elif sys.platform == 'darwin':
        subprocess.Popen(['open', str(contexts_output_file_path)])
    elif sys.platform.startswith('linux'):
        subprocess.Popen(['xdg-open', str(contexts_output_file_path)])

    leads to:

server_connector.py: Querying database.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
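For reference, here is a minimal sketch of how I worked around both points locally: adding the missing `import sys` and setting `TOKENIZERS_PARALLELISM` before the fork. The function name and the env-var default are my own choices, not the project's; `contexts_output_file_path` stands in for the path used in server_connector.py.

```python
import os
import subprocess
import sys
from pathlib import Path

# Set before any tokenizer is used, to silence the fork/deadlock warning.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

def open_in_default_app(contexts_output_file_path: Path) -> None:
    """Open a file with the platform's default application."""
    if sys.platform == "win32":
        os.startfile(str(contexts_output_file_path))  # Windows only
    elif sys.platform == "darwin":
        subprocess.Popen(["open", str(contexts_output_file_path)])
    elif sys.platform.startswith("linux"):
        subprocess.Popen(["xdg-open", str(contexts_output_file_path)])
```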

Slight note: bark throws an unhandled exception when there is no chat_history.txt available:

bark_module.py: Better Transformer selected.
Exception in thread Thread-14 (run_bark_module):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/user/gits/ChromaDB-Plugin-for-LM-Studio-3.04/src/gui.py", line 282, in run_bark_module
    bark_audio.run()
  File "/Users/user/gits/ChromaDB-Plugin-for-LM-Studio-3.04/src/bark_module.py", line 150, in run
    with open('chat_history.txt', 'r', encoding='utf-8') as file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'chat_history.txt'
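A minimal sketch of how the read in bark_module.py could guard against the missing file (the function name is hypothetical, not the project's actual code):

```python
from pathlib import Path
from typing import Optional

def read_chat_history(path: str = "chat_history.txt") -> Optional[str]:
    """Return the chat history text, or None with a readable hint if absent."""
    history_file = Path(path)
    if not history_file.exists():
        # Avoids the unhandled FileNotFoundError in the bark thread.
        print(f"{path} not found - run a query first so there is text to speak.")
        return None
    return history_file.read_text(encoding="utf-8")
```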

~~The (more or less) recommended embedding model hkunlp--instructor-xl does not work out of the box. The pytorch_model.bin is located in the sub-directory 2_Dense alongside a dedicated config.json, and I had to copy the model into the top-level directory of the embedding model. Maybe it makes sense to do a deep search for a pytorch_model.bin or other supported formats?~~

While checking the Hugging Face page for this model, I noticed that the files in question are git-LFS files. I probably didn't wait long enough for the >4GB model to arrive via LFS. -> A status indicator showing that the download is still in progress would be superb!
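One way to make this failure mode visible: an interrupted git-LFS fetch leaves a tiny text "pointer" stub in place of the real weights, and pointer files start with a fixed signature. This is a hedged sketch of my own, not code from the plugin:

```python
from pathlib import Path

# Git-LFS pointer files begin with this line (per the LFS spec).
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """True if `path` is a git-LFS pointer stub rather than real model weights."""
    if not path.is_file() or path.stat().st_size > 1024:
        return False  # real weights are far larger than a pointer file
    return path.read_bytes().startswith(LFS_SIGNATURE)
```

Checking `pytorch_model.bin` this way before loading would turn "model looks present but is not actually downloaded" into a clear error message.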

Also: it would be cool to have persistent settings, but I think that is already mentioned somewhere.

I hope this information is helpful. If my system works nicely, I would be happy to contribute back, but I am a little unsure how, as there is no tutorial for installing the package from source via git. Any help/advice on how to contribute my current and future findings?

BBC-Esq commented 10 months ago

Thanks for the detailed message. I added the missing import statement in server_connector.py. Can you clarify the second item you mentioned, i.e. the error related to test_embeddings = true? I didn't quite understand it and have never seen it before; it works fine on my system...

Regarding the model download, yes, you likely didn't wait long enough. Sometimes it's deceptive as to when the download is done AND "unpacked", but something should be printed to the command prompt when it finishes.