CopperEagle / SmartFileLibrary

SmartFileLibrary is an AI-supported digital library, backed by a local database on PostgreSQL and an experimental web interface.
MIT License

[Feature request] Custom models #1

Open GHuserE opened 4 months ago

GHuserE commented 4 months ago

Hello there!

Would it be possible to add support for custom feature-extraction models? The Donut model is not that accurate, and I do not have the hardware for Idefics2: I have 8 GB of RAM and no GPU.

CopperEagle commented 4 months ago

Hey @GHuserE

Yes, this should be no problem.

A custom method for document metadata extraction (excluding keywords) can be implemented by extending BaseDocumentClass. The subclass loads the model and provides get_title, get_publisher and get_publishing_year. In those three methods, you can assume self.image contains the image of the document's first page. A good example is here.
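As a rough sketch of what such a subclass could look like: the BaseDocumentClass below is only a stand-in so the snippet runs on its own, and its constructor, the return types, the MyModelDocument name and the model callable are all assumptions for illustration, not the library's actual interface.

```python
from abc import ABC, abstractmethod


class BaseDocumentClass(ABC):
    """Stand-in for the library's base class (assumed interface, not the real one)."""

    def __init__(self, image):
        # Per the description above, self.image holds the document's first page.
        self.image = image

    @abstractmethod
    def get_title(self): ...

    @abstractmethod
    def get_publisher(self): ...

    @abstractmethod
    def get_publishing_year(self): ...


class MyModelDocument(BaseDocumentClass):
    """Hypothetical custom extractor backed by any callable model(image, question)."""

    def __init__(self, image, model=None):
        super().__init__(image)
        # A real subclass would load its model here; this placeholder
        # just answers "unknown" so the sketch is runnable.
        if model is None:
            model = lambda image, question: "unknown"
        self.model = model

    def _ask(self, question):
        # Send one question about the first page to the model and tidy the answer.
        return str(self.model(self.image, question)).strip()

    def get_title(self):
        return self._ask("What is the title of this document?")

    def get_publisher(self):
        return self._ask("Who published this document?")

    def get_publishing_year(self):
        answer = self._ask("In which year was this document published?")
        # Keep the first four digits found in the answer, if there are any.
        digits = "".join(ch for ch in answer if ch.isdigit())
        return int(digits[:4]) if len(digits) >= 4 else None
```

Passing the model in as a callable keeps the question-answering logic separate from model loading, so the same class can be exercised with a dummy function before a real model is plugged in.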

However, I still need to add a method to DatabaseInterface that allows you to register this new class from outside the library.

If you have any specific model in mind, you can also send me a message. If the model provides good performance for the resources it needs, I can add it to the library.

GHuserE commented 4 months ago

I want to use Moondream2. It's friggin' good and can run locally and is fast. Thanks!

CopperEagle commented 4 months ago

Hey there

Thank you for the tip. I already have the model on my radar. It is indeed quite remarkable and also steerable enough. There are some other small models that write long elaborations on how they derive the title rather than just returning the answer - no matter how much you ask them not to (they like their flow, don't they) :laughing:. It may take some time before I add it, though... Please stand by :smiley:

CopperEagle commented 4 months ago

Okay @GHuserE, the Moondream2 model has been available since commit a87135b. You can select it with

db.set_metadata_method(MOONDREAM2)

About adding custom models: I am going to work on reorganizing the code, e.g. by storing all prompts in a single place. This should make it easier to support more models, including remote APIs (ChatGPT, etc.). Please stand by.
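To illustrate the "all prompts in a single place" idea, here is one possible shape, purely as a sketch: PromptSet, PROMPTS, get_prompts and the prompt wording are hypothetical names and text, not the library's actual layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptSet:
    """The questions sent to a model for each metadata field (hypothetical)."""
    title: str
    publisher: str
    publishing_year: str


DEFAULT_PROMPTS = PromptSet(
    title="What is the title of this document?",
    publisher="Who published this document?",
    publishing_year="In which year was this document published?",
)

# Single registry: every model's prompts live here, keyed by a lowercase name.
PROMPTS = {
    "default": DEFAULT_PROMPTS,
    # Example override for a model that needs a stricter instruction
    # to return the bare answer instead of an explanation.
    "moondream2": replace(
        DEFAULT_PROMPTS,
        title="What is the title of this document? Answer with the title only.",
    ),
}


def get_prompts(model_name):
    """Return the prompts for a model, falling back to the defaults."""
    return PROMPTS.get(model_name.lower(), PROMPTS["default"])
```

With a registry like this, adding a model or a remote API would mostly mean adding one entry rather than touching each extractor class.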