chickendude / Natibo

Breathe new life into your Glossika PDF/MP3 courses!
The Unlicense

Failure to import a .gls file #16

Closed by alastairdb 5 years ago

alastairdb commented 6 years ago

Is there any description of the format of a .gls file?

I tried to guess the format and to import my test files into v0.2.4-beta on my phone (an Honor 6X running Android 7.0) but I failed. In the input dialogue, I could see my files with a .gls extension in my Download directory but they were not selectable. Where did I go wrong?

jubalh commented 6 years ago

AFAIK the README mentions the tool that creates the .gls file: "This is an app which can load GLS packs ([another tool](https://github.com/chickendude/GlossikaNativeGLS) i've been working on to split Glossika GMS/PDF files into individual sentences) and use the old Glossika mp3 + PDF courses similar to the new AI platform."

Here it is: https://github.com/chickendude/GlossikaNativeGLS

alastairdb commented 6 years ago

That tool does not quite work for me. With a bit of tweaking, I managed to split the MP3s correctly. As for the PDFs, I managed to extract the text via OCR and tried to mimic the tool's output format. If I could get the Android app to at least try to parse my .gls files, I would probably manage to debug the format fully.

jubalh commented 6 years ago

@alastairdb which course are you trying to parse?

jubalh commented 5 years ago

@alastairdb it's just a zip file containing a .gsp file, which holds all the texts, plus all the separate mp3 files.

chickendude commented 5 years ago

@alastairdb Sorry, i just saw your message. I've been working on a big update to support triangulation packages which is mostly finished and will be merging onto the master branch probably this week or next.

Originally, i had two languages put into one GLS pack, but i later changed that to just have one language per pack as it avoided the need to resend both languages to the app each time i changed something or if i wanted to add a new target language from the same base language. You can still have two languages in one pack (for now) but i'll probably remove support for that eventually, so it's probably best to just do one language per pack.

As for the actual .gls pack itself, it is indeed a renamed .zip file. I've attached a sample for Shanghainese that i've made/been using to study Shanghainese: SHA-F1.zip (Note: you'll have to change the extension to .gls, Github won't let me upload a .gls file directly.)

As for the structure, it's pretty simple: inside you have a .gsp file, which is just a tab-separated text file (a CSV with tabs as the delimiter). The first line should list which sentence parts you are adding; currently accepted values are "index", "sentence", "IPA", and "romanization". At the very least, you should include "index" and "sentence"; the other two are optional. The index is the order of the sentences, so F1 has indices 1-1000, F2 has 1001-2000, etc. The name of this .gsp is important: LANGNAME-4_DIGIT_FIRST_INDEX-4_DIGIT_LAST_INDEX.gsp, e.g. "SHA-0001-1000.gsp", "EN-1001-1500.gsp", etc. I trim the spaces out if you add any in, but it's probably best to leave it without any spaces. Also make sure your language has been added here and that you're using the same 2-3 letter name.
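For anyone building a .gsp by hand, here is a rough Python sketch of the layout described above. It is not taken from the app or from GlossikaNativeGLS: the function names `gsp_filename` and `write_gsp` are made up for illustration, and UTF-8 encoding, `\n` line endings, and the exact spelling/case of the column names are assumptions.

```python
from pathlib import Path

# Column names as quoted in the comment above; exact spelling/case is assumed.
ALLOWED_COLUMNS = ("index", "sentence", "IPA", "romanization")


def gsp_filename(lang: str, first: int, last: int) -> str:
    """Build LANGNAME-4_DIGIT_FIRST_INDEX-4_DIGIT_LAST_INDEX.gsp."""
    return f"{lang.strip()}-{first:04d}-{last:04d}.gsp"


def write_gsp(out_dir, lang, columns, rows):
    """Write rows (one sequence per sentence, matching `columns`) as a .gsp file."""
    columns = list(columns)
    rows = list(rows)
    if "index" not in columns or "sentence" not in columns:
        raise ValueError('"index" and "sentence" are required')
    unknown = [c for c in columns if c not in ALLOWED_COLUMNS]
    if unknown:
        raise ValueError(f"unsupported columns: {unknown}")

    index_col = columns.index("index")
    indices = [int(row[index_col]) for row in rows]

    lines = ["\t".join(columns)]  # first line lists the sentence parts
    for row in rows:
        cells = [str(cell) for cell in row]
        # A tab or newline inside a cell would break the tab-separated layout.
        if len(cells) != len(columns) or any("\t" in c or "\n" in c for c in cells):
            raise ValueError(f"bad row: {row!r}")
        lines.append("\t".join(cells))

    path = Path(out_dir) / gsp_filename(lang, min(indices), max(indices))
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")  # UTF-8 is assumed
    return path
```

Calling `gsp_filename("SHA", 1, 1000)` gives `SHA-0001-1000.gsp`, matching the example name above. The file name is derived from the lowest and highest index in the rows, so a partial pack gets a name that reflects what it actually contains.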

The only other thing in the .gls file is the audio for each sentence. The files are named like this: DE - F1 - 0001.mp3, DE - F1 - 0002.mp3, etc.
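Since the pack is a renamed zip, putting one together could look roughly like the sketch below. The helper names `audio_name` and `build_gls` are hypothetical, and two things are assumptions rather than anything confirmed in this thread: that all files sit at the top level of the archive, and that the app accepts deflate compression.

```python
import zipfile
from pathlib import Path


def audio_name(lang: str, level: str, index: int) -> str:
    """Name an audio file in the 'DE - F1 - 0001.mp3' style shown above."""
    return f"{lang} - {level} - {index:04d}.mp3"


def build_gls(gsp_path, audio_paths, out_path):
    """Zip one .gsp file and its mp3 files into a .gls pack."""
    out_path = Path(out_path)
    if out_path.suffix != ".gls":
        raise ValueError("output file should end in .gls")
    # Assumption: a flat archive with standard deflate compression is accepted.
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(gsp_path, arcname=Path(gsp_path).name)
        for audio in audio_paths:
            # arcname drops the directory part so files land at the archive root.
            zf.write(audio, arcname=Path(audio).name)
    return out_path
```

Something like `build_gls("DE-0001-1000.gsp", sorted(Path("audio").glob("*.mp3")), "DE-F1.gls")` would then produce the pack, assuming the mp3 files were already named with `audio_name`.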

You can import all the text and gradually add the audio as you go recording it or splitting it, but for now it'll crash if it tries to play a sentence without any audio. In the Shanghainese course, i've recorded and split the first 550 sentences but added the text for the first 1000 sentences, so trying to play sentence 551 will cause it to crash. It'll work for all sentences up to that point, though.
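To find out in advance which sentences would hit that crash, a pack can be checked before importing it. The sketch below is only a guess at how to do that: `missing_audio` is a made-up helper, and it assumes one language per pack, a UTF-8 .gsp, and that the four digits at the end of each mp3 name equal the value in the "index" column.

```python
import re
import zipfile

# Assumption: audio names end in a 4-digit sentence index, e.g. '... - 0551.mp3'.
AUDIO_RE = re.compile(r"(\d{4})\.mp3$", re.IGNORECASE)


def missing_audio(gls_path):
    """Return the sorted sentence indices that have text but no mp3 in the pack."""
    with zipfile.ZipFile(gls_path) as zf:
        names = zf.namelist()
        have = set()
        for name in names:
            match = AUDIO_RE.search(name)
            if match:
                have.add(int(match.group(1)))

        missing = []
        for name in names:
            if not name.lower().endswith(".gsp"):
                continue
            lines = zf.read(name).decode("utf-8").splitlines()
            if not lines:
                continue
            # The first line lists the columns; find where "index" sits.
            index_col = lines[0].split("\t").index("index")
            for line in lines[1:]:
                if line.strip():
                    idx = int(line.split("\t")[index_col])
                    if idx not in have:
                        missing.append(idx)
    return sorted(missing)
```

For a pack like the Shanghainese one described above (text for 1000 sentences, audio for the first 550), this would be expected to list 551 through 1000, provided the naming assumption holds.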

Sorry it took so long to reply, if you or anyone else has any other questions just let me know!

alastairdb commented 5 years ago

Thanks a lot for your extensive reply. I actually have an old Chinese course and the text in the PDFs is not immediately accessible. I used your audio splitter and it worked a treat! The text I had to drag out semi-automatically with Tesseract. For the time being, I have the course converted to Anki files. When I get time, I'll try your phone app again.

chickendude commented 5 years ago

Ah yeah, if you're using the older PDFs (the ones you couldn't copy/paste from), it's not going to work: the tool doesn't do OCR, it just pulls the text from the PDF itself. You'd need to use the latest PDF they released for it to work. If you need help getting it set up, feel free to get in touch.

afdlt commented 5 years ago

Hey chickendude, thank you so much for creating this app and sharing it with everyone. I'm struggling to figure out how to create the gls packs for my Mandarin (Taiwan) course. Do you or anyone else have Mandarin gls packs already made that you wouldn't mind sharing with me? I'm just a dude that's not very technologically gifted (to say the least lol). If not, is there a simpler way to do it? Anything helps. Thank you!