Closed: paveles closed this issue 1 month ago.
@ASCRX Could you please take a look at this issue?
Hello paveles:
Yes, we use a specific version of vLLM: vllm 0.2.7. vLLM supports most current models. Try the following steps in the Colab environment:

```sh
!pip install bert_score
!pip install vllm==0.2.7
```
Please make sure you have downloaded the BART checkpoint, and check that all required arguments are correctly specified.
@jiminHuang can help with this problem.
Please check our latest notebook https://colab.research.google.com/drive/1ogcCmhMc5lPhUamCk6512H3PJwPEaBZN?usp=sharing. All issues should be addressed.
Dear PIXIU team,
thank you so much for your contribution to the open-source community, and congratulations on being accepted to the renowned NeurIPS conference. I am trying to follow the proposed steps to run the FLARE benchmark on a model, using a Google Colab T4 instance. Here are the steps:
where run_evaluation.sh is:

```sh
python ./PIXIU/src/eval.py \
    --model hf-causal \
    --tasks flare_australian \
    --model_args pretrained=PY007/TinyLlama-1.1B-Chat-v0.1,dtype="float32" \
    --no_cache
```

The output is:

```
/content/PIXIU/src:/content/PIXIU/src/financial-evaluation:/content/PIXIU/src/metrics/BARTScore
2024-01-21 10:10:46.733456: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-21 10:10:46.733512: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-21 10:10:46.735055: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-21 10:10:48.082461: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[dynet] random seed: 1234
[dynet] allocating memory: 32MB
[dynet] memory allocation done.
Selected Tasks: ['flare_australian']
Using device 'cuda'
config.json: 100% 652/652 [00:00<00:00, 2.81MB/s]
model.safetensors: 100% 4.40G/4.40G [00:34<00:00, 126MB/s]
generation_config.json: 100% 63.0/63.0 [00:00<00:00, 316kB/s]
tokenizer_config.json: 100% 762/762 [00:00<00:00, 3.84MB/s]
tokenizer.model: 100% 500k/500k [00:00<00:00, 402MB/s]
tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 3.73MB/s]
added_tokens.json: 100% 21.0/21.0 [00:00<00:00, 87.8kB/s]
special_tokens_map.json: 100% 438/438 [00:00<00:00, 1.78MB/s]
Downloading readme: 100% 641/641 [00:00<00:00, 4.14MB/s]
Downloading data: 100% 65.3k/65.3k [00:02<00:00, 31.2kB/s]
Downloading data: 100% 25.2k/25.2k [00:01<00:00, 14.0kB/s]
Downloading data: 100% 16.8k/16.8k [00:01<00:00, 10.3kB/s]
Generating train split: 100% 482/482 [00:00<00:00, 3767.86 examples/s]
Generating test split: 100% 139/139 [00:00<00:00, 57843.86 examples/s]
Generating valid split: 100% 69/69 [00:00<00:00, 32771.71 examples/s]
Task: flare_australian; number of docs: 139
Task: flare_australian; document 0; context
prompt (starting on next line):
Assess the creditworthiness of a customer using the following table attributes for financial status. Respond with either 'good' or 'bad'. And all the table attribute names including 8 categorical attributes and 6 numerical attributes and values have been changed to meaningless symbols to protect confidentiality of the data. For instance, 'The client has attributes: A1: 0, A2: 21.67, A3: 11.5, A4: 1, A5: 5, A6: 3, A7: 0, A8: 1, A9: 1, A10: 11, A11: 1, A12: 2, A13: 0, A14: 1.', should be classified as 'good'. Text: The client has attributes: A1: 1.0, A2: 18.67, A3: 5.0, A4: 2.0, A5: 11.0, A6: 4.0, A7: 0.375, A8: 1.0, A9: 1.0, A10: 2.0, A11: 0.0, A12: 2.0, A13: 0.0, A14: 39.0.
(end of prompt on previous line)
Requests: Req_greedy_until("Assess the creditworthiness of a customer using the following table attributes for financial status. Respond with either 'good' or 'bad'. And all the table attribute names including 8 categorical attributes and 6 numerical attributes and values have been changed to meaningless symbols to protect confidentiality of the data. For instance, 'The client has attributes: A1: 0, A2: 21.67, A3: 11.5, A4: 1, A5: 5, A6: 3, A7: 0, A8: 1, A9: 1, A10: 11, A11: 1, A12: 2, A13: 0, A14: 1.', should be classified as 'good'. \n Text: The client has attributes: A1: 1.0, A2: 18.67, A3: 5.0, A4: 2.0, A5: 11.0, A6: 4.0, A7: 0.375, A8: 1.0, A9: 1.0, A10: 2.0, A11: 0.0, A12: 2.0, A13: 0.0, A14: 39.0. \n", {'until': None})[None]
Running greedy_until requests
Maximum 0 turns
Running 0th turn
  0% 0/139 [00:00<?, ?it/s]Both `max_new_tokens` (=32) and `max_length` (=575) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  0% 0/139 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/content/./PIXIU/src/eval.py", line 97, in <module>
    main()
  File "/content/./PIXIU/src/eval.py", line 62, in main
    results = evaluator.simple_evaluate(
  File "/content/PIXIU/src/financial-evaluation/lm_eval/utils.py", line 243, in _wrapper
    return fn(*args, **kwargs)
  File "/content/PIXIU/src/evaluator.py", line 102, in simple_evaluate
    results = evaluate(
  File "/content/PIXIU/src/financial-evaluation/lm_eval/utils.py", line 243, in _wrapper
    return fn(*args, **kwargs)
  File "/content/PIXIU/src/evaluator.py", line 327, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/content/PIXIU/src/financial-evaluation/lm_eval/base.py", line 459, in greedy_until
    for term in until:
TypeError: 'NoneType' object is not iterable
```

There are several associated questions with run_evaluation.sh: