Ekhorn / spaced

Make space for your thoughts :)
MIT License

Image to text recognition #30

Open Ekhorn opened 7 months ago

Ekhorn commented 7 months ago

Description

Spaced would benefit from an image-to-text recognition feature: select any part of the screen and collect the text present in it.

The model can be loaded on the backend and invoked for inference through a Tauri command. Taking a screenshot of any part of the screen could be part of Spaced, but existing screenshot tools often suffice, and simply pasting from the clipboard is good enough.
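A rough sketch of what that backend boundary might look like, using only the standard library so it runs without a model. All names here (`TextRecognizer`, `StubRecognizer`, `OcrState`, `recognize_text`) are hypothetical and not taken from Spaced. In a real Tauri app the function would presumably carry `#[tauri::command]` and take the state through `tauri::State`; that wiring is left out here.

```rust
use std::sync::Mutex;

/// Anything that can turn encoded image bytes into text.
/// Hypothetical trait: a real implementation would wrap an OCR model.
trait TextRecognizer: Send {
    fn recognize(&mut self, image_bytes: &[u8]) -> Result<String, String>;
}

/// Stand-in backend that only reports the input size,
/// so the sketch runs without any model file.
struct StubRecognizer;

impl TextRecognizer for StubRecognizer {
    fn recognize(&mut self, image_bytes: &[u8]) -> Result<String, String> {
        if image_bytes.is_empty() {
            return Err("empty image".to_string());
        }
        Ok(format!("<{} bytes of image data>", image_bytes.len()))
    }
}

/// Shared state holding the recognizer, created once at startup.
struct OcrState {
    recognizer: Mutex<Box<dyn TextRecognizer>>,
}

impl OcrState {
    fn new(recognizer: Box<dyn TextRecognizer>) -> Self {
        OcrState { recognizer: Mutex::new(recognizer) }
    }
}

/// Body of the would-be command (assumption: the frontend sends the
/// pasted image as raw bytes and gets the recognized text back).
fn recognize_text(state: &OcrState, image_bytes: Vec<u8>) -> Result<String, String> {
    let mut recognizer = state
        .recognizer
        .lock()
        .map_err(|_| "recognizer lock poisoned".to_string())?;
    recognizer.recognize(&image_bytes)
}
```

Keeping the model behind a trait means the crate that actually runs inference can be swapped later without touching the command itself.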

Requirements

Ekhorn commented 7 months ago

tch crate can be used, see examples: https://github.com/LaurentMazare/tch-rs/tree/main/examples

The mnist example seems like what is needed here.

Ekhorn commented 7 months ago

The image crate may also be useful here.
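For context, the image crate would mostly handle decoding (for example `image::load_from_memory`) and grayscale conversion (`DynamicImage::to_luma8`). The sketch below shows the same kind of preprocessing in plain Rust so it stays self-contained. The function name, the Rec. 601 luma weights, and the [0, 1] scaling are assumptions; the input range a given model expects should be checked against its documentation.

```rust
/// Convert a tightly packed RGBA8 buffer into grayscale values in [0.0, 1.0].
/// Returns None when the buffer length is not a multiple of 4.
/// Hypothetical helper; alpha is ignored.
fn rgba_to_normalized_gray(rgba: &[u8]) -> Option<Vec<f32>> {
    if rgba.len() % 4 != 0 {
        return None;
    }
    let gray = rgba
        .chunks_exact(4)
        .map(|px| {
            // Rec. 601 luma weights (assumption; other weightings exist).
            let luma = 0.299 * f32::from(px[0])
                + 0.587 * f32::from(px[1])
                + 0.114 * f32::from(px[2]);
            luma / 255.0
        })
        .collect();
    Some(gray)
}
```

The resulting `Vec<f32>` holds one value per pixel in row-major order, which is the layout a single-channel model input would typically be built from.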

Ekhorn commented 7 months ago

tch crate can be used, see examples: https://github.com/LaurentMazare/tch-rs/tree/main/examples

The mnist example seems like what is needed here.

Well, the mnist example contains training code, not basic inference code; see https://github.com/LaurentMazare/tch-rs/blob/main/examples/pretrained-models/main.rs instead.

Ekhorn commented 7 months ago

Pre-trained models can be found here: https://github.com/onnx/models/tree/main/validated/vision/classification/mnist

Otherwise https://huggingface.co/models?sort=trending&search=mnist might also have something interesting.
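Whichever model is picked, an MNIST classifier produces one score per digit class, so the last step after inference is choosing the highest score. A minimal sketch, assuming the scores arrive as a flat `f32` slice (the `argmax` helper is hypothetical, not part of any of the libraries mentioned):

```rust
/// Index of the largest score, or None for an empty slice.
/// Uses a total ordering on f32 so the comparison is defined for every value.
fn argmax(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

For an MNIST model the returned index would be the predicted digit.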

Ekhorn commented 7 months ago

The candle ONNX example at https://github.com/huggingface/candle/blob/main/candle-examples/examples/onnx/main.rs may be simpler to use for now, just to run inference.

Ekhorn commented 7 months ago

After trying to work with candle, it seems that the library is not flexible enough.

There are some more libraries listed at https://www.arewelearningyet.com/neural-networks/.

There is also Burn, which has a simple example of how to run inference with an mnist.onnx model: https://github.com/tracel-ai/burn/blob/main/examples/onnx-inference/build.rs

Another interesting one could be: https://github.com/sonos/tract

To run inference through WGPU, https://github.com/webonnx/wonnx might be interesting.

Ekhorn commented 7 months ago

The term to use is OCR (optical character recognition).