0ssamaak0 / CLIPPyX

AI-powered image search tool offering content-based, text, and visual-similarity search across the whole system.
MIT License
109 stars, 9 forks

Support Linux & macOS and start the frontend with server.py #1

Open · t0saki opened this issue 3 weeks ago

t0saki commented 3 weeks ago

Thank you very much for your work! I have always wanted to use a more powerful model to replace the built-in AI search in Synology Photos to help me index the pictures on my NAS. This project is very close to my needs.

To get this project running on my NAS, I modified the indexing part of the code to use a less efficient but more general scanning method and stored the file list in an SQLite DB.
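A minimal sketch of that kind of portable scan, assuming a single root directory walked with `os.walk` and a hypothetical `images` table in SQLite. The extension list, table name, and function names are illustrative and not taken from the branch.

```python
import os
import sqlite3

# Hypothetical set of extensions treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff"}


def scan_images(root):
    """Yield absolute paths of image files under root (slow but portable)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                yield os.path.abspath(os.path.join(dirpath, name))


def store_file_list(db_path, paths):
    """Replace the stored file list with `paths` (hypothetical `images` table)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY)")
            conn.execute("DELETE FROM images")
            conn.executemany(
                "INSERT OR IGNORE INTO images (path) VALUES (?)",
                ((p,) for p in paths),
            )
    finally:
        conn.close()
```

A call such as `store_file_list("file_list.db", scan_images("/path/to/photos"))` (both paths hypothetical) would rebuild the list in one transaction.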

Additionally, I added a route in server.py that serves the frontend index.html, so after starting server.py you can open the search page at /index.

Since I won't be running this project on Windows, I completely removed support for Everything. If you think my modifications can be merged into the mainline, I can restore the indexing logic for the Windows platform and submit a PR so as not to degrade performance on Windows.

If I have the energy later, I might create a Dockerfile (although I am not very proficient at it) and support monitoring file system changes to scan new files in real-time and improve indexing performance. If you already have related plans, please let me know.

I don't have much development experience in this area, so if any of my implementations seem ridiculous, please point them out.

Thank you :)

My development branch: https://github.com/0ssamaak0/CLIPPyX/compare/main...t0saki:CLIPPyX:support-linux?expand=1

0ssamaak0 commented 3 weeks ago

Excellent work, and so many ideas! 😁😁 I'm really happy you're interested.

1. The CLIPPyX command is the entry point; it runs main.py:

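# main.py
import subprocess
import sys

import yaml

# Load the configuration file
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Refresh the image list via Everything (Windows only); use the current
# interpreter so the same virtual environment is used
subprocess.run([sys.executable, "Index/everything_images.py"], check=True)

if config["server_os"] == "wsl":
    print("Running in WSL")
    subprocess.check_call(["wsl", "-e", "bash", "server_wsl.sh"])
elif config["server_os"] == "windows":
    print("Running in Windows")
    subprocess.run([sys.executable, "server.py"], check=True)
else:
    raise ValueError(f"Unsupported server_os: {config['server_os']!r}")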

Instead of removing Everything indexing, we can add your changes as an option for Unix-based systems, and you set up your server from config.yaml (I may create a GUI for it too).
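One way that option could look, as a sketch only: a hypothetical `indexer` key in config.yaml (for example `indexer: scan`) picks which script refreshes the image list. Only Index/everything_images.py exists today; `Index/scan_images.py` and the key name are assumptions.

```python
import sys

# Hypothetical mapping from a config.yaml `indexer` value to the script that
# refreshes the image list; Index/scan_images.py does not exist yet.
INDEXER_SCRIPTS = {
    "everything": "Index/everything_images.py",  # Windows, via Everything
    "scan": "Index/scan_images.py",              # hypothetical Unix scanner
}


def indexer_command(config):
    """Build the subprocess argv for the configured indexer."""
    # Default to Everything so existing Windows configs keep working.
    backend = config.get("indexer", "everything")
    try:
        script = INDEXER_SCRIPTS[backend]
    except KeyError:
        raise ValueError(f"Unknown indexer: {backend!r}") from None
    return [sys.executable, script]
```

main.py could then call `subprocess.run(indexer_command(config), check=True)` in place of the hard-coded Everything call.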

2. Why use SQLite to store file names?

I applied Occam's razor here: I thought a simple text file rewritten on each run was a good option, probably because Everything indexing takes almost no time. But if there's a reason for SQLite, please tell me.
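For comparison, a sketch of the plain-text approach, assuming one path per line in a hypothetical images.txt (the real format written by everything_images.py may differ). The whole file is rewritten on every run, which stays cheap as long as producing the listing is fast.

```python
import os
import tempfile


def write_file_list(list_path, paths):
    """Rewrite the whole list, one path per line (hypothetical format)."""
    directory = os.path.dirname(os.path.abspath(list_path))
    # Write to a temporary file first, then swap it in, so a reader never
    # sees a half-written list.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for p in paths:
                f.write(p + "\n")
        os.replace(tmp_path, list_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def read_file_list(list_path):
    """Return the stored paths, skipping blank lines."""
    with open(list_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]
```

This format assumes no path contains a newline character.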

3. You're creating an index for a single directory only

From what I've seen, you're indexing a single directory rather than all images on the system. That's useful if you only care about one directory (of course, I plan to add this option later), but it means you haven't found an alternative to Everything for indexing all images on the disk.
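One possible, much slower stand-in for whole-disk indexing on Unix is to walk one or more configured roots (for example `/` or `/home`) while pruning virtual filesystems. This is only a sketch; `SKIP_DIRS` and `scan_roots` are hypothetical names. Existing system indexes such as the `locate` database on Linux or Spotlight's `mdfind` on macOS might be closer to Everything but are not shown here.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}
# Hypothetical list of Linux pseudo/virtual filesystems to skip.
SKIP_DIRS = {"/proc", "/sys", "/dev", "/run"}


def scan_roots(roots, skip_dirs=SKIP_DIRS):
    """Yield image paths under every root, pruning skipped directories."""
    seen = set()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune in place so os.walk never descends into skipped dirs.
            dirnames[:] = [
                name for name in dirnames
                if os.path.join(dirpath, name) not in skip_dirs
            ]
            for name in filenames:
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    path = os.path.realpath(os.path.join(dirpath, name))
                    if path not in seen:  # roots may overlap
                        seen.add(path)
                        yield path
```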

4. Adding the frontend to the Flask app is a good idea

Can you make a separate PR for it? I will merge it immediately.

5. Docker

I'm not an expert either, but I plan to add this as soon as there is decent support for Unix machines.

6. Monitoring

For monitoring, I leave everything to Everything (lol): it indexes my files in the background, and whenever I run everything_images.py I get the updated list. We need to find a similar alternative on Unix, and I'm sure we will find something similar or at least close.
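Until an Everything-like service is found, a simple polling approach could approximate it: take periodic snapshots of file modification times and diff them. This is a stdlib-only sketch with hypothetical function names; event-based mechanisms such as Linux inotify or macOS FSEvents (wrapped by libraries like watchdog) would avoid full rescans but are not shown.

```python
import os


def snapshot(root):
    """Map each file path under root to its modification time (ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except FileNotFoundError:
                pass  # file vanished between listing and stat
    return state


def diff(old, new):
    """Return (added, removed, modified) path sets between two snapshots."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {p for p in new.keys() & old.keys() if new[p] != old[p]}
    return added, removed, modified
```

Called on a timer, `added | modified` would be the files to (re)embed and `removed` the entries to drop from the index.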

Thank you for your amazing ideas

t0saki commented 3 weeks ago

I have submitted a PR to provide a WebUI in Flask https://github.com/0ssamaak0/CLIPPyX/pull/9.

Using SQLite to store the file list is meant to keep regenerating the list fast, especially when the file index is very large. Rewriting a huge txt file each time could be very time-consuming (although this is not noticeable at the current scale). However, this would probably need to be implemented together with file change monitoring, so, as you mentioned, the current implementation may not be necessary.
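A sketch of where SQLite could pay off, assuming a hypothetical `images` table: instead of rewriting the whole list, only the differences between the stored paths and the current scan are written. Reading the stored paths still touches every row, so the saving is on the write side.

```python
import sqlite3


def sync_file_list(conn, current_paths):
    """Bring the hypothetical `images` table in line with current_paths.

    Returns (added, removed) so callers can update the embedding index.
    """
    current = set(current_paths)
    with conn:  # one transaction for all changes
        conn.execute("CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY)")
        stored = {row[0] for row in conn.execute("SELECT path FROM images")}
        added = current - stored
        removed = stored - current
        conn.executemany(
            "INSERT INTO images (path) VALUES (?)", ((p,) for p in added)
        )
        conn.executemany(
            "DELETE FROM images WHERE path = ?", ((p,) for p in removed)
        )
    return added, removed
```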

Using a single directory is intended to eventually allow distributing this project as a Docker image that runs on servers and other devices. In that case, a directory on the host machine could be bound into the container, and the service would scan that directory. However, this differs from the current goal of using Everything to index all files on Windows, so it mainly serves my own use case.
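In a container, the directory to scan could come from an environment variable, with the host folder bind-mounted at that path (for example `docker run -v /host/photos:/data/images ...`). A sketch only: the variable name `CLIPPYX_IMAGE_DIR` and the `/data/images` default are hypothetical.

```python
import os

# Hypothetical default mount point inside the container.
DEFAULT_IMAGE_DIR = "/data/images"


def image_dir_from_env(environ=os.environ):
    """Return the directory to scan, failing early if it is not mounted."""
    path = environ.get("CLIPPYX_IMAGE_DIR", DEFAULT_IMAGE_DIR)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Image directory not mounted: {path}")
    return path
```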

I really like this project, and I hope it keeps getting better!