PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
Apache License 2.0

Truncation not explicitly mentioned #813

Open udbhav-44 opened 4 months ago

udbhav-44 commented 4 months ago

I get this error when I try to run a query:

Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.

Setting pad_token_id to eos_token_id:128001 for open-end generation.

C:\Users\Tarun Sridhar\.conda\envs\mummy\lib\site-packages\transformers\models\llama\modeling_llama.py:648: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.) attn_output = torch.nn.functional.scaled_dot_product_attention(

What are the possible fixes?
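For context, this warning normally appears when a Hugging Face tokenizer is called with a max_length but without truncation=True. The sketch below is only an illustration of what triggers and silences it; the "gpt2" tokenizer is a small, ungated stand-in, not what localGPT actually loads:

```python
# Illustrative only -- not localGPT's actual code. Any Hugging Face tokenizer
# shows the same behaviour; "gpt2" is just a stand-in model id.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "a long prompt that may exceed the context window ..."

# Triggers the warning: max_length is given but truncation is left implicit.
with_warning = tokenizer(text, max_length=1024)

# Silences it: truncation is requested explicitly.
without_warning = tokenizer(text, max_length=1024, truncation=True)
```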

GregChiang0201 commented 3 months ago

I also tried to run a query and faced the same problem, but the system only shows "Setting pad_token_id to eos_token_id:128001 for open-end generation." Have you solved the problem yet? Please help.
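For what it's worth, that pad_token_id line is an informational notice rather than an error. A minimal sketch of silencing it, assuming the standard transformers text-generation pipeline (localGPT's own setup may differ), would be:

```python
# Sketch only -- the model and prompt are placeholders, not localGPT's configuration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Passing pad_token_id explicitly stops the
# "Setting pad_token_id to eos_token_id ..." notice; the generated output is unchanged.
result = generator(
    "Hello, world",
    max_new_tokens=32,
    pad_token_id=generator.tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```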

KansaiTraining commented 3 months ago

I got the same message and the query takes forever... Is there any explanation of the error, and does it affect the query results?

GregChiang0201 commented 3 months ago

I found the problem: the author built the program to run serially instead of in parallel. While run_localGPT is running, you can monitor your CPU usage (with top or htop). In my case only 1–2 CPU cores are utilized, which is why it runs so slowly. A quick diagnostic sketch is below.
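If it helps, here is a rough way to confirm how many CPU threads PyTorch is actually using (purely a diagnostic sketch; run_localGPT may be limited by other factors as well):

```python
# Diagnostic sketch -- run before loading the model to inspect (and, if needed,
# raise) the intra-op thread count PyTorch uses on CPU.
import os
import torch

print("logical cores visible to Python:", os.cpu_count())
print("torch intra-op threads:", torch.get_num_threads())

# If the reported thread count is 1-2, it can be raised explicitly.
torch.set_num_threads(os.cpu_count() or 1)
print("torch intra-op threads now:", torch.get_num_threads())
```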


maxrmp commented 2 months ago

Same issue here... I also see a lot of SSD reads from the Python 3.10 process, even after getting:

Truncation was not explicitly activated but max_length is provided a specific value, please use truncation=True to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.

Setting pad_token_id to eos_token_id:128001 for open-end generation.

Has anyone found a solution?