HowieHwong / TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
https://trustllmbenchmark.github.io/TrustLLM-Website/
MIT License

Empty generation results #11

Closed pearlmary closed 7 months ago

pearlmary commented 8 months ago

Hi, I've completed the installation via GitHub successfully and downloaded the dataset for the 6 tasks as given in the instructions. Generation appears to run fine, but nothing gets saved; only empty folders are created. A screenshot is attached for reference.

[screenshot]

Evaluation fails with a KeyError: 'res'.

[screenshot]

Any help to move forward and see the results of generation and evaluation would be appreciated. I've tried to resolve it myself but have not been able to.
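For reference, the evaluation step reads a `res` field from each saved generation record, so if generation never wrote anything the lookup fails. A minimal sketch of how that KeyError arises, using a hypothetical record rather than the actual TrustLLM data:

```python
# Hypothetical record as generation might save it when no response is written.
record = {"prompt": "example question"}

# An evaluation-style lookup of the response then fails.
try:
    response = record["res"]
except KeyError as err:
    print(f"KeyError: {err}")  # -> KeyError: 'res'
```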

Thanks in Advance!

HowieHwong commented 8 months ago

Hi, thanks for your feedback. Can you provide an image of your full file path, including the dataset file path? We're working on fixing this error.

HowieHwong commented 8 months ago

Hi, we have fixed the bug. Try again and see whether there are any errors in generation.

pearlmary commented 7 months ago

Hi, thank you for fixing the issue of empty generation results. Now the output is not empty, but all the results are null every time in the generation results for llama2-7b. It shows an error yet also reports the run as completed, so I am unable to perform the evaluation either. A screenshot is also attached.

[screenshot]

Can you help me out?

Thanks in Advance

HowieHwong commented 7 months ago

Hi, we have fixed the bug. Try again and see if it works well now.

pearlmary commented 7 months ago

Hi, Thanks for fixing it.

But llama2-7b still generates only null responses ("res": null) for every prompt in the generation results.

[screenshot]
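As a quick check, one can count how many records actually have a non-null `res` in a generated file. This assumes the generation output is a JSON list of records; the file path below is illustrative, not the exact one produced by the run:

```python
import json

# Illustrative path; point this at one of the generated result files.
with open("generation_results/llama2-7b/privacy/privacy_awareness_query.json") as f:
    records = json.load(f)

null_res = sum(1 for r in records if r.get("res") is None)
print(f"{null_res} of {len(records)} records have a null 'res'")
```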

For the privacy evaluation, these are the results.

[screenshot]

I hope you will be able to figure out the issue. If you need any specific information, please let me know. Thanks in advance.

HowieHwong commented 7 months ago

Hi, we have tried to fix the bug. Please try it again. By the way, can you provide the detailed code that produces your error?

pearlmary commented 7 months ago

Once again, thank you for the continued support.

I need to use the llama2 model "meta-llama/Llama-2-7b-chat-hf" for TrustLLM result generation and evaluation.

So far, here is what I've done:

1. Installation via GitHub:

```bash
git clone git@github.com:HowieHwong/TrustLLM.git
```

Created a new Python 3.9 environment using venv, then installed the required packages with:

```bash
cd trustllm_pkg
pip install .
```

2. I've also successfully downloaded the dataset into the Data folder under trustllm_pkg using the commands given:

```python
from trustllm.dataset_download import download_dataset

download_dataset(save_path='Data')
```

3. Generation always results in "null" for all the prompts in all 6 subsections. Commands used:

```python
from trustllm.generation.generation import LLMGeneration

llm_gen = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf",
    test_type="privacy",
    data_path="Data",
    device="cuda"
)
llm_gen.generation_results()
```

Now, after you fixed the bug, it asks for an OpenAI key, but that should not be necessary for llama2.

[screenshot]

It also completes the generation, but with null results, as shown below.

```
Traceback (most recent call last):
  File "/content/TrustLLM/trustllm_pkg/trustllm/generation/generation.py", line 100, in generation
    ans = gen_online(model_name, prompt, temperature, replicate=self.use_replicate, deepinfra=self.use_deepinfra)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ee9a9328c40 state=finished raised APIRemovedInV1>]
```

```
test_res not in file_config
Test function successful on attempt 1
OK
```

4. Because of the null results, the evaluation also produces many errors.
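For what it's worth, the traceback above shows the run falling into the online/API branch (`gen_online`), which would explain both the OpenAI-key prompt and the null `res` values. As a sanity check that the model itself can load and generate locally, independent of TrustLLM, here is a generic transformers sketch (it assumes Hugging Face access to the gated Llama-2 weights):

```python
# Generic transformers sanity check, not TrustLLM code: confirm the local model
# loads and generates (requires Hugging Face access to the gated Llama-2 repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```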

Kindly check it and please let me know where I am going wrong.

Thanks for the help

HowieHwong commented 7 months ago

Thanks for your feedback. We have tried to fix the bug; please try it again now.

pearlmary commented 7 months ago

I tried it, but I don't think it went well this time either.

[screenshot]

Can you check this issue?

HowieHwong commented 7 months ago

I have made some modifications. See whether it works now.

pearlmary commented 7 months ago

Thank you so much @HowieHwong for your quick help in sorting out the issues. It now works after adding `model, tokenizer = load_model(...)` at line 274 in the generation.py file.
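For anyone landing here with the same symptom: the fix described above amounts to making sure the local model and tokenizer are actually loaded before the generation loop runs, so the code does not fall back to the online API path. A rough sketch of the added call follows; the argument names are illustrative and should be checked against the `load_model` signature in generation.py:

```python
# Illustrative sketch of the fix: load the local model/tokenizer before generating,
# so generation uses the local HF weights instead of an online API. Exact arguments
# to the repo's load_model helper may differ.
model, tokenizer = load_model(
    self.model_path,   # e.g. "meta-llama/Llama-2-7b-chat-hf"
    device=self.device,
)
```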