WannabeNihonjin closed this issue 6 years ago.
I found someone who had used this before, and he got the audio files to split with no problem (I still don't know what I did wrong). But now the genanki script doesn't do anything when I have both the split files and the .txt documents in the output folder. I have no clue what to do.
I am the person who has been helping @WannabeNihonjin.
The Glossika PDFs throw an error during the extraction step. Here is sample output from my setup:
$ python glossika_extract_pdf.py
Processing 3 files...
Processing GLOSSIKA-ENJA-F1-EBK.pdf...error
Something went wrong...found 0 sentences instead of 1,000
Processing GLOSSIKA-ENJA-F2-EBK.pdf...error
Something went wrong...found 0 sentences instead of 1,000
Processing GLOSSIKA-ENJA-F3-EBK.pdf...error
Something went wrong...found 0 sentences instead of 1,000
PDF extract complete!
It looks like it needs a fix similar to the one you made earlier for Cantonese.
Thanks @deepakjois. The script identifies the beginning of each sentence by looking for a character or set of characters that indicates the start of a phrase.
Based on the information you shared, I think the "日" character on this line should be changed to "JA".
'JA': ['EN', '日', 'ROM'] # Japanese (before)
'JA': ['EN', 'JA', 'ROM'] # Japanese (after)
Can you test this change and let me know if it works? I'll investigate whether some versions of the PDFs use 日 instead of JA before pushing a fix to the repo.
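To illustrate why a wrong marker produces "found 0 sentences", here is a minimal sketch of marker-based sentence detection. It is not the script's actual code: the function `count_sentences`, the sample text, and the assumption that each sentence appears as consecutive lines starting with the markers in order are all hypothetical simplifications.

```python
# Marker lists mirroring the 'JA' entry before and after the proposed change.
MARKERS_BEFORE = ['EN', '日', 'ROM']
MARKERS_AFTER = ['EN', 'JA', 'ROM']

# Hypothetical excerpt of extracted PDF text; the real layout may differ.
SAMPLE_TEXT = """\
1
EN This is my friend.
JA これは私の友達です。
ROM kore wa watashi no tomodachi desu.
2
EN Nice to meet you.
JA はじめまして。
ROM hajimemashite.
"""


def count_sentences(text, markers):
    """Count places where consecutive lines start with each marker, in order."""
    lines = text.splitlines()
    count = 0
    for i in range(len(lines) - len(markers) + 1):
        window = lines[i:i + len(markers)]
        # A sentence starts only if every line begins with its expected marker.
        if all(line.startswith(marker + ' ') for line, marker in zip(window, markers)):
            count += 1
    return count


if __name__ == '__main__':
    expected = 2  # the real script expects 1,000 sentences per PDF
    for label, markers in (('before', MARKERS_BEFORE), ('after', MARKERS_AFTER)):
        found = count_sentences(SAMPLE_TEXT, markers)
        status = 'complete' if found == expected else f'error: found {found} instead of {expected}'
        print(f'{label}: {status}')
```

With '日' as the marker, no line in this sample starts with it, so every candidate sentence is rejected and the count stays at zero; switching to 'JA' lets both sentences match.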
I made that change in the script here, and it seems to work:
$ python glossika_extract_pdf.py
Processing 3 files...
Processing GLOSSIKA-ENJA-F1-EBK.pdf...complete
Processing GLOSSIKA-ENJA-F2-EBK.pdf...complete
Processing GLOSSIKA-ENJA-F3-EBK.pdf...complete
PDF extract complete!
@WannabeNihonjin, please check your email for the Anki deck.
Thank you both for all the help; the deck works perfectly.
This is fixed now in the commit above.
I really have no clue what I'm doing wrong; I've been working on this for 2+ hours. I have the mp3split folder in the Python folder, and when I run glossika_split_audio, all that happens is that a Python window pops up for half a second and then goes away.
Please help.
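When a script is started by double-clicking it on Windows, the console window closes as soon as the script exits, taking any error message with it. Running it from a Command Prompt instead (for example `python glossika_split_audio.py`, assuming that is the file name) keeps the output on screen. As a further check, the hypothetical helper below looks for a splitter executable on the PATH. It assumes the split script calls the external mp3splt tool, which this thread does not confirm, and it tries both `mp3splt` and `mp3split` because the folder here is named "mp3split".

```python
import shutil
import sys

# Candidate executable names (assumptions): 'mp3splt' is the tool's usual
# spelling, 'mp3split' matches the folder name mentioned in the thread.
CANDIDATES = ('mp3splt', 'mp3split')


def find_splitter(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == '__main__':
    path = find_splitter()
    if path:
        print(f'Found splitter at {path}')
    else:
        print('No mp3splt executable found on PATH; add its folder to PATH.', file=sys.stderr)
```

Having the folder sit next to the Python scripts is not necessarily enough, because `shutil.which` (like most programs) only searches the directories listed in PATH.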