earth-metabolome-initiative / emi-monorepo

Monorepo for the Earth Metabolome Initiative
GNU General Public License v3.0

Image checks #8

Open LucaCappelletti94 opened 1 month ago

LucaCappelletti94 commented 1 month ago

We need to add the following checks to the images being uploaded:

What other things should we take into account?

oolonek commented 1 month ago

Some starters

oolonek commented 1 month ago

In case we would like to retrieve text from images (e.g. for nameplates) we could

LucaCappelletti94 commented 1 month ago

Only consider software that can be compiled to WebAssembly (wasm). In most cases this rules out bindings to native libraries.
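As a rough illustration of the wasm constraint, here is a minimal sketch of an upload check written in plain Rust with no dependencies and no bindings, so it should build for a wasm target as well as natively. The specific checks (sniffing the file type from its magic bytes, reading PNG dimensions from the IHDR chunk) and the names `ImageKind`, `sniff_image_kind` and `png_dimensions` are assumptions made for the example; they are not taken from the list of checks in this issue.

```rust
/// Image formats recognised by this sketch (a hypothetical selection).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageKind {
    Png,
    Jpeg,
}

/// Guess the image format from the leading "magic" bytes of the upload.
/// Returns `None` when the bytes match neither signature.
pub fn sniff_image_kind(bytes: &[u8]) -> Option<ImageKind> {
    // Standard 8-byte PNG signature.
    const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // JPEG files start with the SOI marker followed by another marker byte.
    const JPEG_MAGIC: [u8; 3] = [0xFF, 0xD8, 0xFF];

    if bytes.starts_with(&PNG_MAGIC) {
        Some(ImageKind::Png)
    } else if bytes.starts_with(&JPEG_MAGIC) {
        Some(ImageKind::Jpeg)
    } else {
        None
    }
}

/// Read (width, height) from the IHDR chunk of a PNG.
/// Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
/// then big-endian width and height (4 bytes each).
pub fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < 24 || sniff_image_kind(bytes) != Some(ImageKind::Png) {
        return None;
    }
    if &bytes[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}
```

A caller could reject an upload when `sniff_image_kind` returns `None`, or when `png_dimensions` reports a size below some minimum we agree on. Because the code only touches a byte slice, the same function can run in the browser before the upload and again on the server.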

oolonek commented 1 month ago

https://github.com/robertknight/ocrs

ocrs is a Rust library and CLI tool for extracting text from images, also known as OCR (Optical Character Recognition).

The goal is to create a modern OCR engine that:

- Works well on a wide variety of images (scanned documents, photos containing text, screenshots etc.) with zero or much less preprocessing effort compared to earlier engines like Tesseract. This is achieved by using machine learning more extensively in the pipeline.
- Is easy to compile and run across a variety of platforms, including WebAssembly
- Is trained on open and liberally licensed datasets
- Has a codebase that is easy to understand and modify

Under the hood, the library uses neural network models trained in PyTorch, which are then exported to ONNX and executed using the RTen engine. See the models section for more details.