Closed: Drakkadakka closed this 1 week ago
Sorry, I didn't upload my screenshot.
Also, just typing this has done 50 sensitivity sets.
Oh, also: while using the Discord bot version, it will not output audio for the message it sends, and if I send a blank message in the CMD it will output the TTS. These last two are examples of the Discord bot part and the TTS part responding.
Ah yes, I see what the issues are. So, let me break them down:
First, the issue with the sensitivity and the blank messages is actually the hotkeys. I should have explained that further (and I will in future tutorials!), but that is what is causing it. To disable them, press the " ` " and " / " keys in quick order. It works this way so that you can use hotkeys, but also easily disable them to type by doing that combo. It should come up saying "Input System Lock Set To True!" once done. You can also open the .env file (edit it with Notepad), and where it says "HOTKEYS_BOOT", change that from "ON" to "OFF", and the hotkeys will be turned off by default.
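For example, after that edit the relevant line in the .env should end up looking something like this (exact spacing and quoting may differ slightly in your copy of the file):

# before
HOTKEYS_BOOT=ON

# after
HOTKEYS_BOOT=OFF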
So then! The second issue, with the Discord messages not being read aloud, can be fixed fairly easily in the UI. There is a setting called "Speak Typed Chats / Shadow Chats"; this will read all messages aloud (including ones that would normally be silent). However...
Your issues with the UI mean that you don't really see that option at the moment. It's the second "Check/Uncheck" in the Settings tab. As for why your UI is erroring, I have no clue! Can you send the log.txt file from your main folder?
P.S. How did you get images in Discord working? I haven't even done that yet!
log.txt

Hello, thank you for your response, here is the log file.
TBH, I have just been going back to your video about installing and looking at your UI while I was setting it up, so I must've missed that option. Thanks for pointing it out, and I look forward to more videos! I am not sure what is happening with the web UI errors; hopefully you find the problem in the log file. Again, thanks for the help.
When I was getting it installed originally, I had to create a dummy log file for the Minecraft section for it to launch, until I turned the Minecraft section off.
All I did was connect the bot to the server I own; it has higher permissions than a bot in Discord usually has. If you want to test it, I can invite you to the app in the dev portal. (I have found a way to connect to voice: if you make it a normal account, it will be able to connect to voice, allowing you to chat on mobile. However, this is not allowed under the TOS, so I can't legally say you should try it.)
What I suspect is happening is that it understands the GIF link as a link with descriptions, as that is what is printed on the CMD side. However, it may already have the capability to recognize the GIF image itself; I will do more tests and let you know.
However, this image shows it doesn't understand links, as a user posted their Twitch.
I got bored, so I decided to try and implement the image recognition. Not sure if it'll work, but here's my shot at it:
import time
import colorama
import humanize, os, threading
import emoji
import logging
from logging.handlers import RotatingFileHandler

import utils.audio
import utils.hotkeys
import utils.transcriber_translate
import win32com.client
import utils.vtube_studio
import utils.alarm
import utils.volume_listener
import utils.minecraft
import utils.log_conversion

import API.Oogabooga_Api_Support

import utils.lorebook
import utils.camera

import utils.z_waif_discord
import utils.web_ui

import utils.settings

from dotenv import load_dotenv
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import tensorflow as tf

load_dotenv()
TT_CHOICE = os.environ.get("WHISPER_CHOICE")
char_name = os.environ.get("CHAR_NAME")

stored_transcript = "Issue with message cycling!"
undo_allowed = False

log_dir = "utils/logs"
os.makedirs(log_dir, exist_ok=True)
log_file = os.path.join(log_dir, "image_recognition.log")
log_handler = RotatingFileHandler(log_file, maxBytes=5 * 1024 * 1024, backupCount=3)  # 5MB per log file, 3 backups
logging.basicConfig(
    handlers=[log_handler],
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
def load_model():
    return ResNet50(weights='imagenet')


def recognize_image(model, img_path):
    img = image.load_img(img_path, target_size=(224, 224))  # Adjust image size for the model
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)

    predictions = model.predict(img_array)
    labels = decode_predictions(predictions, top=3)[0]

    # Log and print the recognition results
    log_message = f"Recognized objects for {img_path}:\n"
    for label in labels:
        result = f"{label[1]}: {label[2] * 100:.2f}%"
        log_message += result + "\n"
        print(result)

    logging.info(log_message)
    return labels
def main_view_image():
    print("\n\nViewing the camera! Please wait...\n")

    # Clear camera inputs
    utils.hotkeys.clear_camera_inputs()

    # Capture image
    if not utils.settings.cam_use_image_feed:
        if not utils.settings.cam_image_preview:
            utils.camera.capture_pic()
        else:
            break_cam_loop = False
            while not break_cam_loop:
                utils.hotkeys.clear_camera_inputs()
                utils.camera.capture_pic()

                while not (utils.hotkeys.VIEW_IMAGE_PRESSED or utils.hotkeys.CANCEL_IMAGE_PRESSED):
                    time.sleep(0.05)

                if utils.hotkeys.VIEW_IMAGE_PRESSED:
                    break_cam_loop = True

                utils.hotkeys.clear_camera_inputs()
    else:
        utils.camera.use_image_feed()

    # Assume image is saved at utils/camera_output/image.jpg
    img_path = "utils/camera_output/image.jpg"

    if os.path.exists(img_path):
        print(f"Image captured at: {img_path}")

        # Load model and recognize the image
        model = load_model()
        recognized_labels = recognize_image(model, img_path)
    else:
        print("Error: Image capture failed!")
        logging.error("Image capture failed - no image found at the expected path.")

    # Check if direct talk mode is enabled
    direct_talk_transcript = ""
    if utils.settings.cam_direct_talk:
        direct_talk_transcript = view_image_prompt_get()

    # Process and display the transcript
    transcript = API.Oogabooga_Api_Support.view_image(direct_talk_transcript)
    print("\n" + transcript + "\n")

    # Allow undo after the image is processed
    global undo_allowed
    undo_allowed = True

    # Optional follow-up after viewing the image
    if utils.settings.cam_reply_after:
        view_image_after_chat(f"So, what did you think of the image, {char_name}?")
def run_program():
    print("Welcome back! Loading chat interface...\n\n", end="", flush=True)

    # Load hotkey ON/OFF on boot
    utils.hotkeys.load_hotkey_bootstate()

    # Load settings for various modules
    minecraft_enabled_string = os.environ.get("MODULE_MINECRAFT")
    utils.settings.minecraft_enabled = minecraft_enabled_string == "ON"

    alarm_enabled_string = os.environ.get("MODULE_ALARM")
    utils.settings.alarm_enabled = alarm_enabled_string == "ON"

    vtube_enabled_string = os.environ.get("MODULE_VTUBE")
    utils.settings.vtube_enabled = vtube_enabled_string == "ON"

    discord_enabled_string = os.environ.get("MODULE_DISCORD")
    utils.settings.discord_enabled = discord_enabled_string == "ON"

    rag_enabled_string = os.environ.get("MODULE_RAG")
    utils.settings.rag_enabled = rag_enabled_string == "ON"

    vision_enabled_string = os.environ.get("MODULE_VISUAL")
    utils.settings.vision_enabled = vision_enabled_string == "ON"

    # Run any needed log conversions
    utils.log_conversion.run_conversion()

    # Load previous chat history
    API.Oogabooga_Api_Support.check_load_past_chat()

    # Start threads for different modules
    if utils.settings.vtube_enabled:
        vtube_studio_thread = threading.Thread(target=utils.vtube_studio.run_vtube_studio_connection)
        vtube_studio_thread.daemon = True
        vtube_studio_thread.start()

    if utils.settings.alarm_enabled:
        alarm_thread = threading.Thread(target=utils.alarm.alarm_loop)
        alarm_thread.daemon = True
        alarm_thread.start()

    volume_listener = threading.Thread(target=utils.volume_listener.run_volume_listener)
    volume_listener.daemon = True
    volume_listener.start()

    volume_listener_toggle = threading.Thread(target=utils.hotkeys.listener_timer)
    volume_listener_toggle.daemon = True
    volume_listener_toggle.start()

    if utils.settings.minecraft_enabled:
        minecraft_thread = threading.Thread(target=utils.minecraft.chat_check_loop)
        minecraft_thread.daemon = True
        minecraft_thread.start()

    if utils.settings.discord_enabled:
        discord_thread = threading.Thread(target=utils.z_waif_discord.run_z_waif_discord)
        discord_thread.daemon = True
        discord_thread.start()

    gradio_thread = threading.Thread(target=utils.web_ui.launch_demo)
    gradio_thread.daemon = True
    gradio_thread.start()

    # Start main loop
    main()
if __name__ == "__main__":
    current_directory = os.path.dirname(os.path.abspath(__file__))

    # Create resource directories
    resource_directory = os.path.join(current_directory, "utils", "resource")
    os.makedirs(resource_directory, exist_ok=True)

    voice_in_directory = os.path.join(resource_directory, "voice_in")
    voice_out_directory = os.path.join(resource_directory, "voice_out")
    os.makedirs(voice_in_directory, exist_ok=True)
    os.makedirs(voice_out_directory, exist_ok=True)

    run_program()
Key Changes:

Log Rotation: A RotatingFileHandler has been added to the logging system. It stores up to 5 MB of logs per file and retains 3 backups (image_recognition.log, image_recognition.log.1, image_recognition.log.2), so the logs don't grow indefinitely and disk space stays under control.

Image Recognition: The main_view_image() function captures an image and passes it to recognize_image(), which uses the pre-trained ResNet50 model to recognize objects and logs the results.

Logging Configuration: Logs both successful recognitions and errors (like a failure to capture an image), which aids in debugging and tracking the history of image processing.
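If you want to sanity-check just the recognition path on its own, here is a small untested sketch; it assumes TensorFlow is installed, that load_model() and recognize_image() from the snippet above are in scope, and that some .jpg exists at the path main_view_image() uses (swap in any local image):

# Untested sketch: exercise the recognition helpers standalone.
test_model = load_model()  # downloads the ImageNet weights on first use
test_labels = recognize_image(test_model, "utils/camera_output/image.jpg")
print("Top guesses:", [label[1] for label in test_labels])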
Here's an edit of the startup/install batch file that implements the above:
@echo off
setlocal

REM Get the current directory of the batch file
set "SCRIPT_DIR=%~dp0"

REM Set the log file path
set "LOG_FILE=%SCRIPT_DIR%\log.txt"

REM Change to the script directory
cd /d "%SCRIPT_DIR%"

REM Remove old log file if it exists
if exist "%LOG_FILE%" del "%LOG_FILE%"

REM Create and activate the main virtual environment
python -m venv venv
call venv\Scripts\activate

REM Upgrade pip to the latest version
python -m pip install --upgrade pip 2>> "%LOG_FILE%"

REM Install PyTorch, torchvision, and torchaudio
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 2>> "%LOG_FILE%"

REM Install openai-whisper from the GitHub repository
python -m pip install git+https://github.com/openai/whisper.git 2>> "%LOG_FILE%"

REM Install TensorFlow
python -m pip install tensorflow 2>> "%LOG_FILE%"

REM Install additional libraries
python -m pip install colorama humanize emoji python-dotenv opencv-python pywin32 2>> "%LOG_FILE%"

REM Install the remaining dependencies from requirements.txt
if exist requirements.txt (
    python -m pip install -r requirements.txt 2>> "%LOG_FILE%"
) else (
    echo requirements.txt not found. Please ensure it is in the script directory. >> "%LOG_FILE%"
)

REM Check if the main.py file exists
if not exist main.py (
    echo main.py not found. Please ensure it is in the script directory. >> "%LOG_FILE%"
    echo main.py not found. Please ensure it is in the script directory.
    pause >nul
    goto end
)

REM Execute the Python script (replace "main.py" with the actual file name)
python main.py 2>> "%LOG_FILE%"

REM Deactivate the virtual environment
deactivate

:end
REM Display message and prompt user to exit
echo.
echo Batch file execution completed. Press any key to exit.
pause >nul

endlocal
Path Management: Ensures the script runs relative to its own directory.
Virtual Environment: Properly sets up and activates a Python virtual environment.
Dependency Management: Installs PyTorch, Whisper, TensorFlow, and the remaining packages from requirements.txt.
Error Handling: Logs errors and checks for the presence of requirements.txt and main.py.
Environment Consistency: Upgrades pip to avoid compatibility issues with outdated package management.
And here's an edit of the requirements:
numpy~=1.24.4
requests~=2.31.0
python-dotenv~=1.0.0
colorama~=0.4.6
humanize~=4.7.0
emoji~=2.9.0

opencv-python
sounddevice~=0.4.6
PyAudio~=0.2.14
pydub~=0.25.1

keyboard~=0.13.5
mouse~=0.7.1
PyGetWindow~=0.0.9

tensorflow

discord
discord.py[voice]
pyvts~=0.3.2

gradio~=3.24.1
fastapi==0.95.0
pydantic<2.0,>=1.10.2
starlette>=0.26.1,<0.27.0
uvicorn==0.22.0

PythMC~=1.2.2

pywin32
Sorry if that's not allowed; I figured I'd try and help after you helped me.
Haha, no, you're good! If anything, the Discord API is hard to work with, so a workaround would be awesome!
If you download the new version, it should fix the UI issues. There was a conflict with Gradio and one of its own dependencies. There are also a few other fixes. I would download and install into a new folder; any edited scripts you have, you can transfer over.
And yeah, it's just going off of the name of the link. You would need to load a multimodal model for them to actually see images. With the vision module (which you can turn ON, if you have a multimodal model loaded in Oobabooga), it is possible for them to see it, but it does still need linkage with Discord... I have no clue what ResNet50 is, but it seems curious. I will take a closer look for sure.
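(Purely as a rough, untested sketch of what that Discord linkage could look like, assuming discord.py 2.x with the message-content intent enabled and the load_model()/recognize_image() helpers from the snippet above being importable; none of this is in the current code:)

# Hypothetical sketch only: hand Discord image attachments to the ResNet50 helpers above.
import discord

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)
model = load_model()  # assumes the helper from the earlier snippet is importable here

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    for attachment in message.attachments:
        if attachment.filename.lower().endswith((".png", ".jpg", ".jpeg")):
            path = f"utils/camera_output/{attachment.filename}"
            await attachment.save(path)  # download the image locally
            labels = recognize_image(model, path)  # top-3 ImageNet guesses
            await message.channel.send("I think I see: " + ", ".join(label[1] for label in labels))

# client.run(DISCORD_TOKEN)  # the token would come from the .env, not be hard-coded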
Let me know if this fixes your issues!
Can confirm that it is working; I'll tinker more later and see if I come up with any issues.
Feel free to take anything from the code I posted and tinker with it. I have an Ollama instance running by default on my PC that runs it. The only linking I did to Discord with the one I made was the Discord token, and ResNet50 is hard-coded into the main .py instead of using the API token, very crudely I might add. I have not tested it, just mocked it up, so I hope you have fun with it.
Hello, sorry about this big one.
I get these problems with the web UI:
Every time I use my keyboard to type, it changes the settings, sends messages, toggles the full auto setting and the sensitivity of the message, and sometimes sends a blank message.