MichielvanBeers / notion-auto-ocr


Is this project still active? Or do alternatives outshine it? #3

Open JackCasfren opened 1 month ago

JackCasfren commented 1 month ago

Hi Michiel,

I came across your project, and it really caught my interest! I’m considering using it for a personal project to help digitize multiple recipe books. The idea is for my aunt to log into Notion, take a picture, and have it automatically uploaded to the Notion database.

Before committing to the setup, I was curious if the project is still actively maintained or used. How reliable has it been in your experience? Were there any major issues or limitations I should be aware of?

I’ve also considered using Evernote for its integrated OCR, but it seems less fun compared to working with your project, where I could apply and improve my Python skills.

I’d really appreciate any insights you can share, and I’m open to contributing if the project is still active.

Thanks for your time, and I look forward to hearing from you!

Best regards,

-Jack Casfren

MichielvanBeers commented 1 month ago

Hi @JackCasfren,

Thank you for your interest in the project! Let me answer your questions:

  1. Before committing to the setup, I was curious if the project is still actively maintained or used.

    • I still have it running in a Docker container, but to be honest, I don't often use the actual functionality. It should still just work, though.
  2. How reliable has it been in your experience?

    • Once I had it up and running, I didn't experience any issues with it. The formatting of the text can sometimes differ from what you would like, of course, but that is easy to fix.
  3. Were there any major issues or limitations I should be aware of?

    • The main limitation in my opinion is that it works through polling and therefore it will take a couple of minutes before the OCR text is returned. But that is mainly me being impatient ;). Other than that, the Microsoft OCR API is pretty good!

I’d really appreciate any insights you can share, and I’m open to contributing if the project is still active.

  • That would be great! Happy to accept any Pull Requests. Do note that the code is definitely not structured according to best practices yet ;)

JackCasfren commented 1 month ago

Great hearing from you, Michiel! Sounds like it will work a treat.

I had a couple of follow-up questions:

Where do you host your Docker container? I'm considering Google's free VPS, but maybe 1 GB of RAM and 1 GB of outbound data transfer is not enough?

The other option I was thinking of is hosting it on an old laptop with Portainer or something similar.

Have you considered using self-hosted OCR solutions, like Pytesseract? Maybe this could improve performance/response time. "Minutes" is a bit too much, I think. I'm also impatient :P.

Thanks for your response :D

MichielvanBeers commented 1 month ago

I currently host it myself on an Ubuntu Server VM at home. 1 GB of RAM and 1 GB of data should be more than enough. The images never pass through the Docker container itself, so hardly any data is used.

The OCR response time itself is very fast (e.g. max 1-2 seconds), but the delay is caused by the fact that the container only checks every X minutes whether a new page with an image has been added. Notion didn't have (at the time of building) a way to send an event/request when a new page has been added, so the only solution is to poll it. You could change the code to check more often, but you might run into rate limits.
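For anyone who wants to experiment with the polling interval, here is a minimal sketch of the kind of loop described above. It is not the project's actual code: `poll_for_new_pages`, `fetch_page_ids`, `handle_page`, and the `RateLimited` exception are hypothetical names used only for illustration, and the real Notion and OCR calls are assumed to be passed in as plain functions.

```python
import time
from typing import Callable, Iterable, Optional, Set


class RateLimited(Exception):
    """Hypothetical signal that the API asked us to slow down (e.g. an HTTP 429)."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def poll_for_new_pages(
    fetch_page_ids: Callable[[], Iterable[str]],  # assumed: returns IDs of pages with an image
    handle_page: Callable[[str], None],           # assumed: runs OCR and writes the text back
    interval_seconds: float = 60.0,
    max_cycles: Optional[int] = None,             # None means poll forever
    sleep: Callable[[float], None] = time.sleep,  # injectable so the loop can be tested
) -> Set[str]:
    """Check for unseen pages every `interval_seconds` and process each page once."""
    seen: Set[str] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        delay = interval_seconds
        try:
            for page_id in fetch_page_ids():
                if page_id not in seen:
                    handle_page(page_id)
                    seen.add(page_id)
        except RateLimited as exc:
            # Back off for at least as long as the API asked before the next check.
            delay = max(interval_seconds, exc.retry_after)
        sleep(delay)
    return seen
```

Lowering `interval_seconds` shortens the wait before a new page is picked up, at the cost of more requests. The back-off branch is there so that a shorter interval degrades gracefully if the API starts rejecting requests.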

I haven't checked Pytesseract yet, but will do so. Good luck!