Leon-Sander / Local-Multimodal-AI-Chat

GNU General Public License v3.0

Image Processing - User message is blank #6

Closed: ra9hur closed this issue 9 months ago

ra9hur commented 9 months ago

Excellent tutorial and thanks a lot for sharing the knowledge.

  if send_button or st.session_state.send_input:
      if uploaded_image:
          with st.spinner("Processing image..."):
              user_message = "Describe this image in detail please."
              if st.session_state.user_question != "":
                  user_message = st.session_state.user_question
                  st.session_state.user_question = ""
              llm_answer = handle_image(uploaded_image.getvalue(), st.session_state.user_question)  # <-- this line
              chat_history.add_user_message(user_message)
              chat_history.add_ai_message(llm_answer)

Referring to the line indicated in the above code snippet, "user_message" should be passed as input to the handle_image function.

Currently, "st.session_state.user_question" is passed, and it holds an empty string in both cases (whether or not the user typed a question, because it is reset just before the call). The model accepts the empty string without raising an error; it simply ignores the message entered in the prompt.
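
A minimal sketch of the problem, using a plain dict in place of st.session_state and a hypothetical stub in place of handle_image (neither is the project's real code). It only illustrates why the session value is already empty by the time it is passed:

```python
def handle_image_stub(image_bytes: bytes, prompt: str) -> str:
    """Hypothetical stand-in for handle_image: echoes the prompt it received."""
    return f"prompt={prompt!r}"


def process(session_state: dict, image_bytes: bytes, fixed: bool) -> str:
    user_message = "Describe this image in detail please."
    if session_state["user_question"] != "":
        user_message = session_state["user_question"]
        session_state["user_question"] = ""  # cleared before the call below
    # The buggy variant reads the (now empty) session value;
    # the fixed variant passes the local copy instead.
    prompt = user_message if fixed else session_state["user_question"]
    return handle_image_stub(image_bytes, prompt)


buggy = process({"user_question": "What colour is the car?"}, b"", fixed=False)
fixed = process({"user_question": "What colour is the car?"}, b"", fixed=True)
print(buggy)  # prompt=''
print(fixed)  # prompt='What colour is the car?'
```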

Leon-Sander commented 9 months ago

I just checked it: the error occurs when pressing the send button. It triggers a strange mechanism where the code runs twice. The first time, it reads the user input and then sets it to an empty string. But before the LLM finishes processing, the script is rerun and calls the LLM again, this time with an empty string. I am going to remove the button; it seems to complicate things, and I don't fully understand the mechanism in this context.

ra9hur commented 9 months ago

Hmm, strange. It has been working perfectly and predictably for me. I am coding along with your video and have yet to incorporate audio. I don't have much clarity on how this works yet and need to do some background study. I am currently working on the PDF chat and retriever.

Including the code for your reference.

app.txt

Leon-Sander commented 9 months ago

You can verify for yourself if pressing the button also runs the script twice by adding some print statements, for example. You can, of course, try to gain a better understanding of the button mechanism and fix the issue or create a workaround for a deeper learning experience, which is the ultimate goal.
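
One way to do that check is a run counter kept in session state. This is only a sketch: record_run is a hypothetical helper, and a plain dict stands in for st.session_state so the example runs without Streamlit.

```python
def record_run(session_state, label: str) -> int:
    """Increment and print a per-session run counter; return the new count."""
    if "run_count" not in session_state:
        session_state["run_count"] = 0
    session_state["run_count"] += 1
    print(f"[run {session_state['run_count']}] {label}")
    return session_state["run_count"]


# Plain dict standing in for st.session_state.
state = {}
first = record_run(state, "script start")
second = record_run(state, "script start")
```

In the app, the call would be record_run(st.session_state, "script start") at the top of the script. If a single click on the button prints two lines in the terminal, the script ran twice for that click.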

To be honest, I didn't really use the button myself; pressing 'Enter' was more convenient for me. I have just updated the code by removing the button. There are also a few additional changes, which are detailed in the readme's changelog.

Thank you for pointing it out, and it's great that you've made it this far.

Leon-Sander commented 9 months ago

I kind of overlooked the obvious line you mentioned here. I am going to update the code right now to pass user_message. Thank you for noticing and mentioning that.

  llm_answer = handle_image(uploaded_image.getvalue(), st.session_state.user_question)

to

  llm_answer = handle_image(uploaded_image.getvalue(), user_message)

SJ-1407 commented 5 months ago

import streamlit as st
from chains import load_normal_chain
from langchain.memory import StreamlitChatMessageHistory
from langchain_community.chat_message_histories import (
    StreamlitChatMessageHistory,
)
from utils import save_chat_history_json, get_timestamp, load_chat_history_json, load_config
from streamlit_mic_recorder import mic_recorder
import os
import yaml
from audio_handler import transcribe_audio, convert_to_wav, check_ffmpeg
from image_handler import handle_image
import chromadb
from langchain_chroma import Chroma
from pdf_handler import add_documents_to_db
from chains import load_pdf_chat_chain
from url_handler import get_url_data
from database_operations import (
    load_last_k_text_messages,
    save_text_message,
    save_image_message,
    save_audio_message,
    load_messages,
    get_all_chat_history_ids,
    delete_chat_history,
)
import sqlite3
from html_templates import css, get_avatar

with open("config.yaml", "r") as ymlfile:
    #config = yaml.safe_load(ymlfile)
    config = load_config()

@st.cache_resource
def chain_loading():  #earlier using chat_history as parameter
    if st.session_state.pdf_chat:
        print("loading pdf chat chain")
        return load_pdf_chat_chain()  #earlier using chat_history as parameter
    #return load_normal_chain(chat_history)
    return load_normal_chain()  #earlier using chat_history as parameter

def clear_previous_input():
    #st.session_state.user_question=st.session_state.user_input
    #st.session_state.user_input=""
    pass

def set_input():
    #st.session_state.send_input=True
    #clear_previous_input()
    pass

def save_chat_hsitory():
    pass  # no body appears in the pasted code

def get_session_key():
    if st.session_state.session_key == "new_session":  # New function added
        st.session_state.new_session_key = get_timestamp()
        return st.session_state.new_session_key
    return st.session_state.session_key

def delete_chat_session_history():  # New function added
    delete_chat_history(st.session_state.session_key)
    st.session_state.session_index_tracker = "new_session"

def set_index_tracker():
    st.session_state.session_index_tracker=st.session_state.session_key

def toggle_pdf_chat():
    st.session_state.pdf_chat=True

def toggle_url_chat():
    st.session_state.url_chat=True

def clear_cache():  # New function added
    st.cache_resource.clear()

def main():
    st.title("Luminary AI")
    st.write(css, unsafe_allow_html=True)

    #st.sidebar.title("Chat Sessions")
    #chat_sessions=["new_session"]+ os.listdir(config["chat_sessions_path"])

    #if "send_input" not in st.session_state:
    if "db_conn" not in st.session_state:
        st.session_state.session_key="new_session"
        #st.session_state.send_input=False
        #st.session_state.user_question=""
        st.session_state.new_session_key=None
        st.session_state.session_index_tracker="new_session"
        st.session_state.pdf_uploader_key = 1
        st.session_state.db_conn = sqlite3.connect(config["chat_sessions_database_path"], check_same_thread=False)
        st.session_state.audio_uploader_key = 0  # New line added
    if st.session_state.session_key=="new_session" and st.session_state.new_session_key!=None:
        st.session_state.session_index_tracker=st.session_state.new_session_key
        st.session_state.new_session_key=None

    st.sidebar.title("Chat Sessions")  # New line added
    chat_sessions = ["new_session"] + get_all_chat_history_ids() # New line  added

    index=chat_sessions.index(st.session_state.session_index_tracker)
    st.sidebar.selectbox("Select Chat Session",chat_sessions,key="session_key",index=index) #,on_change=set_index_tracker
    #st.sidebar.toggle("Chat PDF",key="pdf_chat",value=False)
    #st.sidebar.toggle("Chat URL",key="url_chat",value=False)

    #if(st.session_state.session_key!="new_session"):

    #st.sidebar.button("Delete Chat Session", on_click=delete_chat_session_history) #new line added
    #st.sidebar.button("Clear Cache", on_click=clear_cache) #new line added

    #print(chat_history.messages)
    #llm_chain=chain_loading(chat_history)

    #user_input=st.text_input("Message Luminary AI",key="user_input",on_change=set_input)
    #voice_recording_column,send_button_column=st.columns([1,1])
    pdf_toggle_col, voice_recording_column = st.sidebar.columns(2)
    pdf_toggle_col.toggle("PDF Chat", key="pdf_chat", value=False)
    with voice_recording_column:
        audio_data = mic_recorder(
            start_prompt="Start recording",
            stop_prompt="Stop recording",
            just_once=True,
        )
    delete_chat_col, clear_cache_col = st.sidebar.columns(2)
    delete_chat_col.button("Delete Chat Session", on_click=delete_chat_session_history)
    clear_cache_col.button("Clear Cache", on_click=clear_cache)
    chat=st.container()
    user_input = st.chat_input("Type your message here", key="user_input")
    # Your Streamlit file uploader and processing logic
    #uploaded_audio = st.sidebar.file_uploader("Upload Audio File", type=["wav", "mp3", "m4a", "flac", "ogg", "opus"])
    uploaded_audio = st.sidebar.file_uploader("Upload an audio file", type=["wav", "mp3", "ogg"], key=st.session_state.audio_uploader_key)
    uploaded_image = st.sidebar.file_uploader("Upload Image File", type=["jpg", "jpeg", "png"])
    #uploaded_pdf = st.sidebar.file_uploader("Upload PDF File", type=["pdf"],accept_multiple_files=True,on_change=toggle_pdf_chat)
    uploaded_pdf = st.sidebar.file_uploader("Upload a pdf file", accept_multiple_files=True, key=st.session_state.pdf_uploader_key, type=["pdf"], on_change=toggle_pdf_chat)
    uploaded_url=st.sidebar.text_input("Enter URL",on_change=toggle_url_chat)

    if uploaded_pdf:
        print("PDF uploaded")
        with st.spinner("Processing PDF ..."):
            add_documents_to_db([uploaded_pdf])
            st.session_state.pdf_uploader_key += 2

    if uploaded_audio:
        print("Audio uploaded")
        transcribed_audio = transcribe_audio(uploaded_audio.getvalue())
        print(transcribed_audio)
        #llm_chain =chain_loading(chat_history)
        #llm_chain.run("Summarize this text: " + transcribed_audio,chat_history)

        llm_chain = chain_loading()  #new line added
        llm_answer = llm_chain.run(user_input = "Summarize this text: " + transcribed_audio, chat_history=[]) #new line added
        save_audio_message(get_session_key(), "human", uploaded_audio.getvalue()) #new line added
        save_text_message(get_session_key(), "ai", llm_answer) #new line added
        st.session_state.audio_uploader_key += 2 #new line added

        #audio_transcription=transcribe_audio(uploaded_audio.getvalue())
        #llm_chain.run("Summarize the text:" + transcribe_audio,chat_history)

    #new lines added
    if audio_data:
        print("Audio recorded")
        transcribed_audio = transcribe_audio(audio_data["bytes"])
        print(transcribed_audio)
        llm_chain = chain_loading()
        llm_answer = llm_chain.run(user_input = transcribed_audio, chat_history=load_last_k_text_messages(get_session_key(), 4))
        save_audio_message(get_session_key(), "human", audio_data["bytes"])
        save_text_message(get_session_key(), "ai", llm_answer)
    #new lines done above

    if (uploaded_url):
        print("URL uploaded")
        with st.spinner("Processing url ..."):
            get_url_data(uploaded_url)

    #with send_button_column:
        #send_button =st.button("Send",key="send_button",on_click=clear_previous_input)

    #if send_button or st.session_state.send_input==True:
    if user_input:
        print("User input:")
        if uploaded_image:

            print("Image uploaded")
            with st.spinner("Processing image..."):
                #user_message="Describe this image in detail please."
                #new lines added below
                #llm_answer = handle_image(uploaded_image.getvalue(), st.session_state.user_question)
                #save_text_message(get_session_key(), "human", st.session_state.user_question)
                llm_answer = handle_image(uploaded_image.getvalue(), user_input)
                save_text_message(get_session_key(), "human", user_input)
                save_image_message(get_session_key(), "human", uploaded_image.getvalue())
                save_text_message(get_session_key(), "ai", llm_answer)
                #st.session_state.user_question = ""
                user_input=None
                #new lines added above

        #if st.session_state.user_question!="":
        if user_input:
            print("User input:")

            #llm_response= llm_chain.run(st.session_state.user_question,chat_history)
            #new lines added below
            llm_chain = chain_loading()
            llm_answer = llm_chain.run(user_input = user_input, chat_history=load_last_k_text_messages(get_session_key(), 4))
            save_text_message(get_session_key(), "human", user_input)
            save_text_message(get_session_key(), "ai", llm_answer)
            #new lines added above

            #st.session_state.user_question=""
            user_input=None

    #if chat_history.messages!=[]:
    if (st.session_state.session_key != "new_session") != (st.session_state.new_session_key != None):  #new line added
        with chat:
            #st.write("Chat History:")
            chat_history_messages = load_messages(get_session_key())

            #for message in reversed(chat_history_messages):
                #st.write(get_media_template(message["content"], message["message_type"], message["sender_type"]), unsafe_allow_html=True)
                #st.chat_message(message["message_type"]).write(message["content"])
            for message in chat_history_messages:
                with st.chat_message(name=message["sender_type"], avatar=get_avatar(message["sender_type"])):
                    if message["message_type"] == "text":
                        st.write(message["content"])
                    if message["message_type"] == "image":
                        st.image(message["content"])
                    if message["message_type"] == "audio":
                        st.audio(message["content"], format="audio/wav")

        if (st.session_state.session_key == "new_session") and (st.session_state.new_session_key != None):
            st.rerun()

if __name__ == "__main__":
    main()

I am facing a bug in image upload: the image is processed only when I type a message. I tried to fix it by removing the "if user_input" condition before "if uploaded_image", but then the image is not processed at all. Could you please help me fix this bug?

Leon-Sander commented 5 months ago

@SJ-1407 this is not a bug. You need to enter a question so the model knows what it should do with the image, i.e. what the purpose of processing it is. What is your intention with the image processing?
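
If the intention is to get a generic description when nothing is typed, one option is to fall back to a default prompt, as the first snippet in this thread did. A minimal sketch under that assumption: resolve_image_prompt is a hypothetical helper, not part of the repository, and it treats None (which st.chat_input returns until a message is submitted) the same as blank text.

```python
from typing import Optional

# Default taken from the first snippet in this thread.
DEFAULT_IMAGE_PROMPT = "Describe this image in detail please."


def resolve_image_prompt(user_input: Optional[str]) -> str:
    """Return the typed question, or the default prompt when nothing was typed."""
    if user_input and user_input.strip():
        return user_input
    return DEFAULT_IMAGE_PROMPT


print(resolve_image_prompt(None))
print(resolve_image_prompt("What breed is this dog?"))
```

Note that Streamlit reruns the script on every interaction and the uploader keeps returning the file, so calling the model without the "if user_input" gate would likely need some other guard (for example, a flag in session state) to avoid processing the same image on each rerun.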