Open hugolpz opened 1 year ago
@sih4sing5hong5 hello,
If you have fewer than 10,000 Amis words, a faster but less scalable way would be to just record the list of Amis headwords already in your dictionary. Wikimedia's Lingualibre allows recording 800 words/hour.
[EDIT]: it appears there are usable resources, easy to process with regex, for getting lists of headwords in most (all?) Taiwanese minority languages. See https://e-dictionary.ilrdf.org.tw . Fair use / academic-purpose reuse can be claimed.
What most interest me there, is
This discussion is just exploratory to map available resources and assess feasibility.
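To make the regex idea concrete, here is a minimal sketch, not a tested scraper. It assumes each headword sits in a `<span class="headword">` element, which is a hypothetical markup choice: the real pages on e-dictionary.ilrdf.org.tw may be structured differently, and the sample entries are placeholders rather than actual Amis words. The helper names `extract_headwords` and `recording_hours` are also made up for illustration. The last function simply applies the 800 words/hour figure mentioned above.

```python
import re

# Placeholder HTML. The class name "headword" is an assumption for illustration,
# and the entries are dummy strings, not real dictionary content.
SAMPLE_HTML = """
<div class="entry"><span class="headword">alpha</span> gloss 1</div>
<div class="entry"><span class="headword"> beta </span> gloss 2</div>
<div class="entry"><span class="headword">alpha</span> gloss 1 again</div>
<div class="entry"><span class="headword">ga'mma</span> gloss 3</div>
"""

# Capture everything between the opening and closing span tags.
HEADWORD_RE = re.compile(r'<span class="headword">([^<]+)</span>')


def extract_headwords(html: str) -> list[str]:
    """Return unique headwords in first-seen order, with whitespace trimmed."""
    words = (match.strip() for match in HEADWORD_RE.findall(html))
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(w for w in words if w))


def recording_hours(n_words: int, words_per_hour: int = 800) -> float:
    """Estimate recording time at a given pace (800 words/hour by default)."""
    return n_words / words_per_hour


if __name__ == "__main__":
    headwords = extract_headwords(SAMPLE_HTML)
    print(headwords)
    print(recording_hours(10_000), "hours for 10,000 words")
```

At that pace, a 10,000-word list would take roughly 12.5 hours of recording, spread over however many sessions and speakers are available.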
Thanks for the recommendation :+1:
Using the written corpus from Amis Wikipedia as recording material is a new idea for me. It may work :)
I don't know which matters most to users. If audio for words is more important than audio for sentences, we can just record the list. If audio for sentences is more important, we may need text-to-speech, given our limited resources.
@miaoski and @wildjcrt , do you have feedback from Facebook or GA?
( @sih4sing5hong5 see edit above. )
Do note that Amis has 30+ hours of recorded audio that Facebook put to use in their Massively Multilingual Speech project. You can try out the Amis text to speech at https://huggingface.co/spaces/mms-meta/MMS - just switch to the Text-to-Speech tab and select "Amis".
Well, Sir, with Google having a similar hyper-lingual project in the pipeline, I wonder what I am doing fighting in the Open Source community for 10 years with such a small budget.
I think you misunderstood me. I didn't mean that you should stop trying new ideas. I only meant to point out a learning resource. Start where they've left off and make it even better, or adapt it for a real use case. Theirs is just a proof of concept when it comes to Amis, and now it's up to individual communities to actually put these new abilities to real use.
@jacob-8, all good on my side. I'm happy to discover there are more powerful teams than my Wikipedian team of 3h/week volunteers working on documenting minority languages. The Massively Multilingual Speech project you cite above is absolutely awesome. I simply think that if such teams lead such projects in open source, then I can let Lingualibre go and move on to other self-development or social issues.
The audio seems to be of poor quality (example).
1) What was the process used to produce these audio files? Are there known limitations, such as quality?
2) Wikimedia provides an open recording tool to rapidly record thousands of audio files for dictionaries: https://lingualibre.org. Would that be of interest to you?