chickendude / Natibo

Breathe new life into your Glossika PDF/MP3 courses!
The Unlicense

link to tool #7

Closed: jubalh closed this issue 6 years ago

jubalh commented 6 years ago

The issue states: "GLS packs (another tool I've been working on to split Glossika GMS/PDF files into individual sentences)". Which tool is that? Is there a link to it?

jubalh commented 6 years ago

Is it perhaps the web app at https://ralena.ch/? I'd be interested in seeing the source for that too, to learn how you extracted the sentences from the PDFs and mp3s.

chickendude commented 6 years ago

The tool is available at https://github.com/chickendude/GlossikaNativeGLS. For the mp3s there's no magic: I just used Audacity to get the times for each sentence, then a Python mp3 library to split the GMS B files.
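A minimal sketch of the "Audacity times, then split" step, assuming the sentence times were exported from Audacity as a label file (tab-separated start, end and label, with times in seconds). The function names are hypothetical and this is not the code in the GlossikaNativeGLS repo; `split_by_labels` only relies on the audio object being sliceable in milliseconds, which is how pydub's `AudioSegment` behaves.

```python
from typing import List, Tuple


def parse_audacity_labels(text: str) -> List[Tuple[int, int, str]]:
    """Parse an Audacity label export into (start_ms, end_ms, label) tuples.

    Each line is assumed to be "start<TAB>end<TAB>label" with times in seconds.
    """
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("\t")
        try:
            start_s, end_s = float(parts[0]), float(parts[1])
        except (ValueError, IndexError):
            # Skip lines that don't start with two numeric times.
            continue
        label = parts[2] if len(parts) > 2 else ""
        labels.append((round(start_s * 1000), round(end_s * 1000), label))
    return labels


def split_by_labels(audio, labels):
    """Cut `audio` into one (label, chunk) pair per labelled time range.

    `audio` can be anything indexable by millisecond slices, e.g. a pydub
    AudioSegment, where audio[a:b] is the audio from a ms to b ms.
    """
    return [(label, audio[start:end]) for start, end, label in labels]
```

With pydub, `audio` would come from something like `AudioSegment.from_mp3(path)` and each chunk could be written with `chunk.export(f"{label}.mp3", format="mp3")`. Note that pydub needs ffmpeg (or libav) installed to decode and encode mp3.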

The PDFs are a bit more automated: I just enter the first and last page of the GMS sentences and run pdftotext. Unfortunately that part is Linux-only, which is why I initially put the web version up.
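A rough sketch of the page-range extraction, assuming the poppler `pdftotext` binary is on the PATH. Its `-f` and `-l` options select the first and last page to convert, and an output file of `-` sends the text to stdout. The helper names are hypothetical, and turning the extracted text into individual sentences is left out because the thread doesn't describe how that step works.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, first_page: int, last_page: int) -> List[str]:
    """Build a pdftotext call that converts only pages first_page..last_page."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    # "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page), pdf_path, "-"]


def extract_pages(pdf_path: str, first_page: int, last_page: int) -> str:
    """Run pdftotext and return the extracted text of the page range."""
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Keeping command construction separate from execution makes the page-range logic easy to check without a PDF at hand.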

I also found this, which is a bit more automated, and I might look into doing something similar (I believe it uses the same mp3 library I'm using anyway). To be honest, I was more focused on the app than on the other tools, so I just wanted something that worked.

EDIT: Ah, never mind, it seems that isn't for splitting the Glossika files but rather for joining a set of separate mp3 sentences. I just wrote a quick script that uses Pydub's "split_on_silence" function. Its silence search is much slower than Audacity's (at first I thought it was broken), but it works and doesn't need user input, though you should . It also works with GMS B and C files. I'm going to test it a bit more and then push it to the GlossikaNativeGLS repo.
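To show the idea behind silence-based splitting, here is a simplified re-implementation. It is not pydub's code and doesn't use pydub at all. It assumes the audio has already been reduced to one loudness value (dBFS) per millisecond. Any run of at least `min_silence_len` quiet milliseconds counts as a pause between sentences, while shorter dips, such as the gaps between words, don't split. The threshold values are illustrative guesses, not tuned for Glossika files.

```python
from typing import List, Optional, Sequence, Tuple


def nonsilent_ranges(
    loudness_db: Sequence[float],
    silence_thresh: float = -40.0,
    min_silence_len: int = 300,
) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond ranges of non-silent audio (end exclusive)."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start of the chunk currently being collected
    silent_run = 0  # length of the current run of quiet milliseconds
    for i, level in enumerate(loudness_db):
        if level < silence_thresh:
            silent_run += 1
            if start is not None and silent_run == min_silence_len:
                # The pause is long enough: close the chunk where the silence began.
                ranges.append((start, i - min_silence_len + 1))
                start = None
        else:
            if start is None:
                start = i
            silent_run = 0
    if start is not None:
        # The final chunk runs to the end, minus any short trailing silence.
        ranges.append((start, len(loudness_db) - silent_run))
    return ranges
```

In pydub itself the equivalent call is `pydub.silence.split_on_silence(audio, min_silence_len=..., silence_thresh=..., keep_silence=...)`. Its silence detection measures a `min_silence_len`-long window at every `seek_step` milliseconds (default 1), which likely explains much of the slowness. Recent pydub versions accept a larger `seek_step`, which trades split precision for speed.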

EDIT2: OK, it's now online as well. It's a bit slow, but you can just put all the files in the "files" directory and it will split them into language/book/sentence with no other user interaction.
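A small sketch of the batch layout described above: pick up the mp3s from the "files" directory and lay the results out as language/book/sentence. The helper names, the output root, and the zero-padded four-digit sentence numbering are all hypothetical choices for illustration, not the script's actual naming scheme. Detecting the language and book from a file is also not shown, so the caller passes them in.

```python
from pathlib import Path
from typing import List


def input_files(directory: str = "files") -> List[Path]:
    """List the mp3s waiting to be split, in a stable (sorted) order."""
    return sorted(Path(directory).glob("*.mp3"))


def sentence_paths(
    out_root: str, language: str, book: str, count: int, ext: str = ".mp3"
) -> List[Path]:
    """Return one output path per sentence: out_root/language/book/NNNN.ext.

    Zero-padding (hypothetical width 4) keeps the files sorted in sentence order.
    """
    base = Path(out_root) / language / book
    return [base / f"{n:04d}{ext}" for n in range(1, count + 1)]
```

Each chunk produced by the splitting step would then be written to the matching path, after creating the directory with `path.parent.mkdir(parents=True, exist_ok=True)`.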

chickendude commented 6 years ago

I've added a link to the other toolset in the readme :)