BaranziniLab / KG_RAG

Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge intensive tasks
Apache License 2.0

Why only the official Llama model from Meta? #10

Closed namin closed 9 months ago

namin commented 9 months ago

I am trying to use PMC-LLaMA (https://github.com/chaoyi-wu/PMC-LLaMA, https://huggingface.co/axiong/PMC_LLaMA_13B) instead of the official Llama, and the repo doesn't let me.

Is there a reason why only the official Llama model from Meta is allowed?

Thanks.

karthiksoman commented 9 months ago

I see! Can you please share the exception raised while trying this Llama model?

The use of the official Llama model was motivated by the aim to conduct a comparative analysis with GPT models. I thought the performance comparison would only be fair if we used the official Llama model rather than a quantized version of it.

namin commented 9 months ago
$ python -m kg_rag.run_setup

Starting to set up KG-RAG ...

Did you update the config.yaml file with all necessary configurations (such as GPT .env path, vectorDB file paths, other file paths)? Enter Y or N: Y

Checking disease vectorDB ...
vectorDB already exists!

Do you want to install Llama model? Enter Y or N: Y
Did you update the config.yaml file with proper configuration for downloading Llama model? Enter Y or N: Y
Are you using official Llama model from Meta? Enter Y or N: Y
Did you get access to use the model? Enter Y or N: Y
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Model is not downloaded! Make sure the above mentioned conditions are satisfied
Congratulations! Setup is completed.

and

$ python -m kg_rag.rag_based_generation.Llama.text_generation interactive
...
Press enter for Step 5 - LLM prompting
Prompting  llama
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/scratch/namin/KG_RAG/kg_rag/rag_based_generation/Llama/text_generation.py", line 52, in <module>
    main()
  File "/scratch/namin/KG_RAG/kg_rag/rag_based_generation/Llama/text_generation.py", line 43, in main
    interactive(question, vectorstore, node_context_df, embedding_function_for_context_retrieval, "llama")
  File "/scratch/namin/KG_RAG/kg_rag/utility.py", line 344, in interactive
    llm = llama_model(config_data["LLAMA_MODEL_NAME"], config_data["LLAMA_MODEL_BRANCH"], config_data["LLM_CACHE_DIR"], stream=True) 
  File "/scratch/namin/KG_RAG/kg_rag/utility.py", line 133, in llama_model
    tokenizer = AutoTokenizer.from_pretrained(model_name,
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 736, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 128, in __init__
    self.update_post_processor()
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 141, in update_post_processor
    bos_token_id = self.bos_token_id
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1141, in bos_token_id
    return self.convert_tokens_to_ids(self.bos_token)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 277, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 284, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1160, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 277, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 284, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1160, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 277, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 284, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
...
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1160, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 277, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 284, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1160, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/namin/mambaforge/envs/kg_rag2/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1040, in unk_token
    return str(self._unk_token)
RecursionError: maximum recursion depth exceeded while getting the str of an object
karthiksoman commented 9 months ago

The issue was with the tokenizer used to download the model. PMC_Llama requires 'LlamaTokenizer', whereas 'AutoTokenizer' was used for the Meta Llama model. In addition, PMC_Llama requires an extra 'legacy' flag to be configured. I have made those changes and the model downloaded successfully on my server. Please note the following:

  1. run_setup.py is now updated. You can run it and it should download PMC_Llama (just follow the instructions that appear interactively).

  2. Even though it downloads PMC_Llama, for me it worked only in the prompt-based mode. In the RAG-based mode, no exceptions are raised, but it just printed back the question and the context without generating any response. I presume that, unlike vanilla Llama, this model may require a specific style of prompting (this is a guess). I haven't explored PMC_Llama much using KG-RAG. Please let me know how it goes for you.

  3. I have now changed the command-line arguments to make running KG-RAG more flexible:

     - Interactive mode: `-i True` (if you don't give this, it defaults to non-interactive)
     - Choosing a GPT model: `-g gpt-4` (if you don't give this, it defaults to `gpt-35-turbo`)
     - Running PMC_Llama: `-m method-2` (if you don't give this, it defaults to `method-1`, which uses 'AutoTokenizer')

     So, for example, to run gpt-4 in interactive mode:

    python -m kg_rag.rag_based_generation.GPT.text_generation -i True -g gpt-4
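The flag scheme in point 3 could be reconstructed roughly as follows. This is a minimal sketch, not the actual KG-RAG parser: the flag names and defaults are taken from this comment, and anything else (the description string, the long option names) is an assumption.

```python
import argparse

# Hypothetical reconstruction of the CLI flags described above; the
# defaults (non-interactive, gpt-35-turbo, method-1) are assumptions
# based on this comment, not the exact KG-RAG source.
parser = argparse.ArgumentParser(description="KG-RAG text generation")
parser.add_argument("-i", "--interactive", default="False",
                    help="'True' enables interactive mode (default: non-interactive)")
parser.add_argument("-g", "--gpt_model", default="gpt-35-turbo",
                    help="GPT model to use, e.g. gpt-4")
parser.add_argument("-m", "--method", default="method-1",
                    help="'method-2' selects the PMC_Llama (LlamaTokenizer) path")

# Example: gpt-4 in interactive mode, as in the command above
args = parser.parse_args(["-i", "True", "-g", "gpt-4"])
```

Note that `-m` is left at its default here, so this invocation would use the 'AutoTokenizer' path.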

All video demos in the README are updated based on these changes, and I have also cut a new release of KG-RAG to reflect them.
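The tokenizer switch described at the top of this comment could be sketched like this. This is a hypothetical helper, not the actual `kg_rag/utility.py` code: the function name is illustrative, and the exact value of the 'legacy' flag is an assumption (the thread only says it must be configured).

```python
# Hypothetical sketch of the tokenizer selection described above.
# "method-1"/"method-2" mirror the -m flag; the mapping is an assumption
# based on this thread, not the exact KG-RAG implementation.
def tokenizer_settings(method):
    """Return (tokenizer class name, extra from_pretrained kwargs)."""
    if method == "method-2":
        # PMC_Llama: the fast/auto tokenizer recurses on unk_token_id
        # (see the traceback above); the slow LlamaTokenizer with an
        # explicitly configured legacy flag loads it cleanly.
        # (True here is a placeholder value, not confirmed by this thread.)
        return "LlamaTokenizer", {"legacy": True}
    # method-1 (default): official Meta Llama works with AutoTokenizer
    return "AutoTokenizer", {}
```

The caller would then resolve the class from `transformers` and pass the extra kwargs into `from_pretrained` along with the model name and cache directory.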

I am closing this issue, since this should address the PMC_Llama download. Feel free to re-open it if you hit any wall.