In streamlit_app.py, the typewriter effect is simulated by the following code:

```python
response = generate_llama2_response(prompt)
placeholder = st.empty()
full_response = ''
for item in response:
    full_response += item
    placeholder.markdown(full_response)
placeholder.markdown(full_response)
```
That's not a neat implementation, because no partial response can be displayed until the full response has been received, which is especially noticeable for long responses. A better example is available in the Replicate docs: https://replicate.com/docs/get-started/python.
Some models stream output as the model is running. They return an iterator, and you can iterate over that output:

```python
iterator = replicate.run(
    "mistralai/mixtral-8x7b-instruct-v0.1",
    input={"prompt": "Who was Dolly the sheep?"},
)
for text in iterator:
    print(text)
```
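If `generate_llama2_response` returned such an iterator, the accumulate-and-rerender loop above would already stream correctly. The pattern can be sketched without Streamlit by swapping the placeholder for any display callback; `fake_stream` and `render_stream` below are hypothetical names, and the fixed-size chunking is just a stand-in for whatever tokens the model actually yields:

```python
def fake_stream(text, chunk_size=4):
    # Hypothetical stand-in for the iterator returned by replicate.run;
    # it yields the response a few characters at a time.
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

def render_stream(iterator, display=print):
    # Re-render the accumulated text after every chunk, the same way
    # placeholder.markdown(full_response) runs inside the Streamlit loop.
    full_response = ''
    for chunk in iterator:
        full_response += chunk
        display(full_response + '▌')  # trailing cursor while streaming
    display(full_response)  # final render without the cursor
    return full_response

render_stream(fake_stream("Dolly was a cloned sheep."))
```

In the real app, `display` would be `placeholder.markdown`, so the user sees the partial response grow chunk by chunk instead of waiting for the whole completion.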
Hopefully Llama 2 can return an iterator as well.