In streamlit_app.py, the typewriter effect is simulated by the following code:

```python
response = generate_llama2_response(prompt)
placeholder = st.empty()
full_response = ''
for item in response:
    full_response += item
    placeholder.markdown(full_response)
placeholder.markdown(full_response)
```
That's not a neat implementation, because no partial response can be displayed until the full response has been received, which is especially noticeable for long responses. A better example is available in the Replicate docs: https://replicate.com/docs/get-started/python.
Some models stream output as the model is running. They return an iterator, and you can iterate over that output:

```python
iterator = replicate.run(
    "mistralai/mixtral-8x7b-instruct-v0.1",
    input={"prompt": "Who was Dolly the sheep?"},
)
for text in iterator:
    print(text)
```
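If `generate_llama2_response` returned such an iterator, the accumulate-and-rerender loop above would already stream correctly. The pattern can be sketched without Streamlit by swapping the placeholder for any display callback; `fake_stream` and `render_stream` below are hypothetical names, and the fixed-size chunking is just a stand-in for whatever tokens the model actually yields:

```python
def fake_stream(text, chunk_size=4):
    # Hypothetical stand-in for the iterator returned by replicate.run;
    # it yields the response a few characters at a time.
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def render_stream(iterator, display=print):
    # Re-render the accumulated text after every chunk, the same way
    # placeholder.markdown(full_response) runs inside the Streamlit loop.
    full_response = ''
    for chunk in iterator:
        full_response += chunk
        display(full_response + '▌')  # trailing cursor while streaming
    display(full_response)  # final render without the cursor
    return full_response

render_stream(fake_stream("Dolly was a cloned sheep."))
```

In the real app, `display` would be `placeholder.markdown`, so the user sees the partial response grow chunk by chunk instead of waiting for the whole completion.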
Hopefully Llama 2 can return an iterator as well.