gururise closed this issue 1 year ago.
Hi @gururise!
Thanks for reporting. One hypothesis is that the default client-side timeout may be too low. Can you please add request_timeout=300 when you create the model (here) and try again? This will set the timeout to 5 min instead of the 30 sec default.
If it helps, we'll consider increasing the default timeout. Otherwise, we'll go on with investigating.
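The suggested fix can be sketched as below. This is a hypothetical snippet, not code from the thread: the class name DistributedBloomForCausalLM and the model name "bigscience/bloom-petals" are assumptions based on Petals releases from that period, and the actual call is left commented out because it needs the petals package plus network access to the swarm.

```python
# Assumption: Petals exposes DistributedBloomForCausalLM.from_pretrained(),
# which forwards request_timeout (in seconds) to the client config.

REQUEST_TIMEOUT = 300  # 5 minutes instead of the 30-second default

# In a real script:
#   from petals import DistributedBloomForCausalLM
#   model = DistributedBloomForCausalLM.from_pretrained(
#       "bigscience/bloom-petals",
#       request_timeout=REQUEST_TIMEOUT,  # client-side timeout per remote request
#   )

print(REQUEST_TIMEOUT)
```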
Hello, I was digging deep into this problem on my side and I can confirm it is happening to me, too. Thanks to @gururise for describing it in detail and making the issue reproducible!
Originally I thought it was some networking issue on my side (I discussed this on Discord a few days ago), but later I realized it is something inside Hivemind/Petals itself.
I'm actually running into these timeout errors from my own node, when I'm running the inference script on the same machine. I can confirm this happens even on an under-utilized machine (no CPU load, plenty of RAM) and without any networking issues at the moment of the timeouts.
Increasing request_timeout to 300 did resolve the issue for the 2nd prompt I gave in my first post.
However, by increasing the prompt to 345 words, I can almost always cause inference to time out. With the prompt below, a request_timeout of 300 is not large enough:
Africa is a vast continent, with 54 countries. Although some confuse the entire continent with being a single country. Africa is home to the largest land animal in the world – the African elephant. Africa is the most centrally-located continent on the planet. Both the equator and the Greenwich Meridian line cross it. The largest African country is Algeria. Africa holds the name of being the biggest oil producer in the world. As well as the fastest animal in the world – the cheetah. The world’s largest desert (Sahara) is also situated in Africa. Africa’s largest island is Madagascar. Africa is home to 25% of the world’s bird species. There are over 2500 kinds of birds found throughout its countries. Africa is the world’s hottest continent, with a town in Ethiopia seeing average temperatures of 33.9 °C throughout the year. Four of the five fastest land animals can be found in East Africa. These are the lion, the gazelle, the wildebeest, and the cheetah. The Sahara desert is currently larger than the entire United States. And it continues to grow each year! Inside the country of South Africa is a smaller, landlocked country called Lesotho. The only African countries that weren’t colonized by Europeans were Ethiopia and Liberia. The smallest country in Africa is the Seychelles, which is also an island. It’s also home to the tallest animal in the world, the giraffe. South Africa, officially the Republic of South Africa (RSA) is the southernmost country in Africa. It has an area of 1,219,090 square km. Its capital is Pretoria and largest city is Johannesburg. Zulu, Xhosa, Afrikaans, English, Tsonga, Swazi, and Venda are some of its official languages. Its official currency is South African rand (ZAR). Six countries that share land borders with South Africa are Botswana, Mozambique, Namibia, Swaziland, Lesotho and Zimbabwe. South Africa is a multiethnic society encompassing a wide variety of cultures, languages, and religions.
Question: Where is the world's largest desert? Answer: The Sahara desert is the world's largest desert. Question: Where was the temperatures of 33.9 °C measured? Answer:
Feb 06 13:00:45.865 [INFO] Peer 12D3KooWRftAHGeKyYmq35tn5Daqiu4D9767xUfZJX7E6LH2yKs9 did not respond, banning it temporarily
Feb 06 13:00:45.865 [WARN] [/home/gene/dockerx/bloom/llmvenv/lib/python3.10/site-packages/petals/client/inference_session.py.step:311] Caught exception when running inference from block 16 (retry in 0 sec): ConnectionResetError(104, 'Connection reset by peer')
Feb 06 13:00:45.910 [WARN] [/home/gene/dockerx/bloom/llmvenv/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py.make_sequence:109] Remote SequenceManager is still searching for routes, waiting for it to become ready
Feb 06 13:00:45.910 [INFO] Peer 12D3KooWPfbFqvns4caiPKEPoChchujBTJqLN6KC8CJBSkPjUDL5 did not respond, banning it temporarily
Feb 06 13:00:45.910 [WARN] [/home/gene/dockerx/bloom/llmvenv/lib/python3.10/site-packages/petals/client/inference_session.py.step:311] Caught exception when running inference from block 54 (retry in 0 sec): ConnectionResetError(104, 'Connection reset by peer')
Feb 06 13:00:45.910 [WARN] [/home/gene/dockerx/bloom/llmvenv/lib/python3.10/site-packages/petals/client/routing/sequence_manager.py.make_sequence:109] Remote SequenceManager is still searching for routes, waiting for it to become ready
I don't know if there is a better solution; however, thank you for pointing out the request_timeout parameter. At least I can probably dynamically compute a value high enough based on the number of tokens in the input prompt.
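The dynamic computation mentioned above could look like the helper below. This is a hypothetical sketch, not part of Petals: the per-token cost is a guess to tune for your swarm, and the 1800 s cap reflects the maximum supported value mentioned later in this thread.

```python
# Hypothetical helper: scale the client-side timeout with the prompt length,
# so long prefixes do not hit the fixed default limit.

def timeout_for_prompt(num_tokens: int,
                       base: float = 30.0,
                       per_token: float = 0.5,
                       cap: float = 1800.0) -> float:
    """Return a request_timeout in seconds, capped at the stated max (1800 s).

    per_token is a rough worst-case estimate of seconds needed per prefix
    token; tune it for your swarm's speed.
    """
    return min(cap, base + per_token * num_tokens)

# e.g. a 345-word prompt tokenizing to roughly 500 tokens:
print(timeout_for_prompt(500))  # 280.0
```

The result would then be passed as request_timeout when creating the model.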
@gururise,
Thanks for trying it out. I guess you can just use a very large request_timeout, e.g., request_timeout=1800 (that's the max supported value). We'll think about using a dynamically computed value by default in future releases.
An alternative is to process the prompt chunk-by-chunk with a smaller timeout. In that case, you may need to fix .generate() so that it works with the max_new_tokens=0 argument, or just implement the inference loop yourself, as we do in Step 5 of our tutorial notebook.
Thanks for the tip! Appreciate it!
@gururise @slush0 Follow-up: we did increase the default request_timeout from 30 sec to 3 min in #276. This should address TimeoutErrors that happened while running inference with a large prefix or fine-tuning with a large batch.
I am currently using the chatbot example, where the session id is saved and inference occurs one token at a time.
When I pass in a small prompt, inference works:
PROMPT
A slightly larger prompt always results in timeout errors:
PROMPT