emesterhazy / glossika-to-anki

Convert Glossika PDFs and audio files into Anki decks
MIT License
32 stars, 8 forks

If I scan the text with an OCR tool on one of the old Glossika courses, will it work? #12

Closed. jllking closed this issue 4 years ago

emesterhazy commented 4 years ago

It won't work out of the box, but the existing code should provide a good template for making the changes needed to support the v1 PDFs. If you'd like to add that support, please feel free to open a pull request :)

jllking commented 4 years ago

Oh, I see. I'm pretty new to programming, so I'm not sure if I'm gonna be able to pull it off. Wish me luck! :)

emesterhazy commented 4 years ago

Take a look at how the current implementation works. You should start by OCR'ing the PDFs and converting them to text files, which the existing code does. After that, look for patterns that mark the beginning of each sentence in each language. The existing source code shows how that's done for the v2 PDFs (you'll need regex), and the process should be pretty similar for the v1 PDFs. Feel free to comment here if you get stuck.
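
To make the pattern-matching step a bit more concrete, here is a minimal sketch of the idea. It is not the repository's actual code. It assumes, purely for illustration, that each sentence in the OCR'd text starts with a language code such as `EN` or `ES` followed by a space; the real v1 PDFs may use different markers, so the `LANG_CODES` tuple, the `MARKER` regex, and the `group_sentences` helper are all hypothetical and would need adjusting after inspecting the text output.

```python
import re
from collections import defaultdict

# Hypothetical language markers; replace with whatever the OCR'd v1 text uses.
LANG_CODES = ("EN", "ES")

# Matches a line that begins with one of the markers, then whitespace, then text.
MARKER = re.compile(
    r"^(?P<lang>" + "|".join(LANG_CODES) + r")\s+(?P<text>.*\S)$"
)


def group_sentences(lines):
    """Group OCR'd lines into lists of sentences keyed by language marker."""
    sentences = defaultdict(list)
    current = None  # language of the sentence currently being built
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from the conversion
        match = MARKER.match(line)
        if match:
            # A marker signals the beginning of a new sentence.
            current = match.group("lang")
            sentences[current].append(match.group("text"))
        elif current is not None:
            # No marker: treat the line as a wrapped continuation.
            sentences[current][-1] += " " + line
    return dict(sentences)


if __name__ == "__main__":
    sample = ["EN I am", "hungry.", "", "ES Tengo hambre.", "EN Where is it?"]
    for lang, items in group_sentences(sample).items():
        print(lang, items)
```

Lines without a marker are treated as continuations of the previous sentence, since OCR output often wraps long sentences across lines. If the v1 markers turn out to be something other than a line prefix (numbering, font changes, page position), only the regex should need to change.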

emesterhazy commented 4 years ago

Closing this issue for now. Feel free to reopen it if you still need help.