Open GrimPixel opened 1 year ago
Hi and sorry for the wait. This looks like a great resource, thanks! I really underestimated how big of a task segmenting words would be.
I was hoping what I suggested in #5 would suffice, but now the better approach seems to be to completely rework the way Spedread reads words.
Maybe something like:
```
when start_reading_button.pressed:
    chunks = user_text.split_by_language()
    for language, text_chunk in chunks:
        if language.requires_word_segmentation:
            words = language.get_nlp_library().parse(text_chunk)
        else:
            words = text_chunk.split_by_spaces()
```
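To make the idea more concrete, here is a rough sketch in Python (since GDScript is harder to demo standalone). Everything here is illustrative: the script-detection heuristic is a stand-in for a real language detector, and the toy longest-match dictionary segmenter stands in for a real NLP library like MeCab.

```python
import re

# Toy dictionary for the longest-match segmenter; a real engine
# (e.g. MeCab for Japanese) would replace this entirely.
TOY_DICT = {"速読", "する", "こと"}

CJK = r"[\u3040-\u30ff\u4e00-\u9fff]"

def requires_word_segmentation(chunk: str) -> bool:
    # Hypothetical check: treat chunks containing CJK characters
    # as needing a segmentation engine.
    return re.search(CJK, chunk) is not None

def toy_segment(text: str) -> list[str]:
    # Greedy longest-match segmentation against TOY_DICT;
    # characters not covered by the dictionary become one-char words.
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_DICT:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])
            i += 1
    return words

def split_by_language(text: str) -> list[str]:
    # Hypothetical splitter: group consecutive CJK and non-CJK runs.
    return re.findall(CJK + "+|(?:(?!" + CJK + ").)+", text)

def get_words(user_text: str) -> list[str]:
    words = []
    for chunk in split_by_language(user_text):
        if requires_word_segmentation(chunk):
            words.extend(toy_segment(chunk))
        else:
            words.extend(chunk.split())
    return words
```

So `get_words("read fast 速読すること")` would hand back the two English words followed by the three segmented Japanese words.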
What do you think?
I'll also ask one of my colleagues who does NLP work next week to see whether that's reasonable.
Great to hear! I think users could choose their own word-segmentation engine: just place engines in a folder, plus a script that calls the engine to segment the sentences.
Good idea! If I end up going with that, I'll see later what the best format for these engines would be (maybe .so/.wasm or Python scripts, not sure yet).
There are languages other than Japanese that need word segmentation: https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Text-Processing-Tools#Word_Segmentation