TandoorRecipes / recipes

Application for managing recipes, planning meals, building shopping lists and much much more!
https://docs.tandoor.dev

Fulltext search of an OCR'ed PDF #2543

Open 0x6d686b opened 1 year ago

0x6d686b commented 1 year ago

Is your feature request related to a problem? Please describe.

  1. Take a picture of a recipe in a magazine with e.g. Microsoft Office Lens and create an OCR'ed PDF.
  2. Put the PDF in the externalfiles folder and get Tandoor to ingest it.
  3. The PDF shows up, I can open it and also "grep" through the site via Ctrl+F in the browser.
  4. Full-text search in the frontend doesn't return the desired recipe/PDF when searching for terms that only appear in the PDF's text.

Describe the solution you'd like

It would be considerably more user-friendly to have full-text search in the already OCR'ed PDFs that are imported via externalfiles. I do understand that this approach can't automatically fill out the ingredient fields etc. However, I consider it to be a significant improvement.

E.g.

  1. Searching for "Ananas"
  2. The results also return "Pina Colada.pdf" because "Ananas" is contained in its text.
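
The requested behaviour can be sketched in plain Python. This is only an illustration of the matching rule, not how Tandoor's search works: `search_recipes` and the `index` dictionary are hypothetical names, and the sample texts are made up. It assumes the text of each PDF has already been extracted.

```python
from typing import Dict, List


def search_recipes(query: str, extracted_text: Dict[str, str]) -> List[str]:
    """Return the file names whose name or extracted text contains the query."""
    needle = query.casefold()  # case-insensitive comparison
    return [
        name
        for name, text in extracted_text.items()
        if needle in name.casefold() or needle in text.casefold()
    ]


# Hypothetical index: file name -> text extracted from the OCR'ed PDF.
index = {
    "Pina Colada.pdf": "Zutaten: Rum, Kokoscreme, Ananassaft, Ananas",
    "Apfelkuchen.pdf": "Zutaten: Mehl, Butter, Äpfel",
}

matches = search_recipes("Ananas", index)
print(matches)
```

Here "Pina Colada.pdf" matches even though "Ananas" is not part of its file name, which is the behaviour missing from step 4 of the report above.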

Though I am not an experienced Python coder, possibly you could use the pdftotext package to extract the text. This step would be done in the import part. I would then suggest that the content is added to the "instruction" field in the database. I think this approach requires the least modification to the code base and still achieves the goal.

I think it would just require adding a few lines to cookbook/provider/local.py#L21, like:

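import pdftotext
[...]

                name = os.path.splitext(file)[0]
                # Open the full file path (not the extension-less name) and read the OCR text.
                with open(path, "rb") as f:
                    pdf = pdftotext.PDF(f)
                # pdftotext.PDF yields one string per page; join them into a single text.
                instruction = "\n\n".join(pdf)
                new_recipe = RecipeImport(
                    name=name,
                    file_path=path,
                    storage=monitor.storage,
                    space=monitor.space,
                )
                # Save first: Django needs a primary key before a many-to-many add.
                new_recipe.save()
                step = Step.objects.create(
                    instruction=instruction, space=monitor.space,
                )
                # Unverified assumption: the import object exposes a `steps` relation.
                new_recipe.steps.add(step)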

(I don't know if this would actually work, it's just from looking at the code and searching the repo!)
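
The extraction step on its own could look roughly like the sketch below. It is untested against Tandoor itself. `pages_to_instruction` and `extract_instruction` are hypothetical helper names, and returning an empty string when a file cannot be read is an assumption about the desired behaviour. It relies on the pdftotext package, where `pdftotext.PDF` yields one string per page.

```python
from typing import Iterable


def pages_to_instruction(pages: Iterable[str]) -> str:
    """Join per-page text into one block, skipping pages without text."""
    cleaned = (page.strip() for page in pages)
    return "\n\n".join(page for page in cleaned if page)


def extract_instruction(path: str) -> str:
    """Return the text of an OCR'ed PDF, or "" if it cannot be read."""
    # Third-party package; imported lazily so the helper above works without it.
    import pdftotext

    try:
        with open(path, "rb") as f:
            return pages_to_instruction(pdftotext.PDF(f))
    except (OSError, pdftotext.Error):
        # Assumption: an unreadable PDF should not abort the whole import run.
        return ""
```

Keeping the page-joining logic separate from the file access makes it possible to check the text handling without a PDF at hand, e.g. `pages_to_instruction(["  Rum \n", "", "Ananas"])` gives `"Rum\n\nAnanas"`.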

I think this should be doable with a very small amount of time and work but would massively improve the application.

Describe alternatives you've considered

No response

Additional context

No response

vabene1111 commented 1 year ago

Thanks for the feedback. While this is definitely not "a small amount of time" to implement, I do understand what you need and why it would be useful.

I will leave this open for future planning, but don't expect this to be a priority, as external recipes are used by only a small fraction of users and thus don't get that much attention.

0x6d686b commented 1 year ago

Oh ok, I thought this would need just a few lines of code. Sorry, I didn't want to make your work sound bad; I was apparently very much mistaken about the amount of time needed. My bad!

vabene1111 commented 1 year ago

No worries, I just wanted to give a reason why this will likely not be implemented quickly. I will keep it on the agenda. What you are definitely right about is that this would be a decent middle ground for the current implementation.