Issue with audio quality?

hugolpz commented 1 year ago

The audio seems to be of poor quality (example).

1) What process was used to produce these audio files? Are there known limitations, for example in quality?

2) Wikimedia provides an open recording tool, https://lingualibre.org, for rapidly recording thousands of audio files for dictionaries. Would that be of interest to you?

sih4sing5hong5 commented 1 year ago

The audio was synthesized with the HTS 2.3 text-to-speech engine, and the training corpus came from ilrdf.

The text-to-speech engine could be upgraded to WaveRNN, which needs 3 to 4 hours of audio.

hugolpz commented 1 year ago

@sih4sing5hong5 hello,

T2S input creation via Lingualibre

A written corpus of 8,000 sentences can be produced from the Amis Wikipedia.
This written corpus can be imported into Lingualibre, whose whole purpose is to allow rapid recording and creation of audio corpora. Speed: 400 sentences per hour.
Find a native speaker committed to the project.
Reading and recording the 8,000 sentences produces a downloadable, open-licence, paired text-and-audio Amis corpus, ideal for training Tacotron-like text-to-speech systems.
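
As a rough illustration of the steps above, here is a minimal Python sketch that turns raw article text into a de-duplicated list of sentences to read aloud and estimates the recording time. The function names, the word-count limits, and the assumption that sentences end with `.`, `!` or `?` are all hypothetical choices made for this sketch; they are not part of Lingualibre or of any existing Amis tooling.

```python
import re


def build_recording_list(text, min_words=3, max_words=20):
    """Split raw text into unique sentences of a readable length.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    """
    candidates = re.split(r"(?<=[.!?])\s+", text)
    seen = set()
    sentences = []
    for raw in candidates:
        sentence = " ".join(raw.split())  # collapse runs of whitespace
        n_words = len(sentence.split())
        if not (min_words <= n_words <= max_words):
            continue  # skip fragments and overly long sentences
        key = sentence.lower()
        if key in seen:
            continue  # skip exact duplicates (case-insensitive)
        seen.add(key)
        sentences.append(sentence)
    return sentences


def recording_hours(n_items, items_per_hour):
    """Estimate recording time from an item count and a recording rate."""
    return n_items / items_per_hour


if __name__ == "__main__":
    # Placeholder text; real input would be plain text exported from Amis Wikipedia.
    sample = "One two three four. One two three four. Too short. Five six seven eight nine!"
    print(build_recording_list(sample))
    print(recording_hours(8000, 400))
```

With the figures quoted above (8,000 sentences at 400 sentences per hour), `recording_hours(8000, 400)` gives 20 hours of recording time.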

Words recording via Lingualibre

If you have fewer than 10,000 Amis words, a faster but less scalable way would be to simply record the list of Amis headwords already in your dictionary. Wikimedia's Lingualibre allows recording 800 words per hour.

[EDIT]: It appears there are usable resources, easy to process with regex, for getting lists of headwords in most (all?) Taiwanese minority languages. See https://e-dictionary.ilrdf.org.tw . Fair use or academic-purpose reuse can be claimed.
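
To make the headword approach concrete, here is a small Python sketch that cleans a headword list and splits it into recording sessions at the 800 words/hour rate mentioned above. The function name `plan_sessions`, the one-hour session length, and the placeholder word list are assumptions for illustration only.

```python
def plan_sessions(headwords, words_per_hour=800, session_hours=1.0):
    """Deduplicate headwords and split them into recording sessions.

    Assumption: a speaker records `words_per_hour` words per hour and
    works in sessions of `session_hours` hours.
    """
    # Strip whitespace, drop empty entries, remove duplicates, sort for stable output.
    unique = sorted({w.strip() for w in headwords if w.strip()})
    per_session = max(1, int(words_per_hour * session_hours))
    return [unique[i:i + per_session] for i in range(0, len(unique), per_session)]


if __name__ == "__main__":
    # Placeholder entries standing in for real dictionary headwords.
    words = [f"word{i}" for i in range(10000)]
    sessions = plan_sessions(words)
    print(len(sessions), "sessions;", len(sessions[-1]), "words in the last one")
```

For a 10,000-word list this yields 13 one-hour sessions, the last one only half full, which is 12.5 hours of recording in total.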

Overall situation

What interests me most here is that:

your team seems to have accessible machine-learning T2S know-how,
whereas modest data mining + Wikimedia Lingualibre.org + about 10 hours of work by a native speaker can generate solid input data for such ML.

Note

This discussion is just exploratory to map available resources and assess feasibility.

sih4sing5hong5 commented 1 year ago

Thanks for the recommendation :+1:

Using the Amis Wikipedia as a written corpus to record from is a new idea for me. It may work :)

I don't know what is most important for the users. If audio for words is more important than audio for sentences, we can just record the list. If audio for sentences is more important than audio for words, we may need text-to-speech, given the limited resources.

@miaoski and @wildjcrt , do you have feedback from Facebook or GA?

hugolpz commented 1 year ago

( @sih4sing5hong5 see edit above. )

jacob-8 commented 1 year ago

Do note that Amis has 30+ hours of recorded audio that Facebook put to use in their Massively Multilingual Speech project. You can try out the Amis text to speech at https://huggingface.co/spaces/mms-meta/MMS - just switch to the Text-to-Speech tab and select "Amis".

hugolpz commented 1 year ago

Well, Sir, with Google having a similar hyper-lingual project in the pipeline, I wonder what I am doing fighting in the Open Source community for 10 years with such a small budget.

jacob-8 commented 1 year ago

> Well, Sir, with Google having a similar hyper-lingual project in the pipeline, I wonder what I am doing fighting in the Open Source community for 10 years with such a small budget.

I think you misunderstood me. I didn't mean that you should stop trying new ideas. I only meant to point out a learning resource. Start where they've left off and make it even better, or adapt it for a real use case. Theirs is just a proof of concept when it comes to Amis, and now it's up to individual communities to actually put these new abilities to real use.

hugolpz commented 10 months ago

@jacob-8 , all good on my side. I'm happy to discover there are more powerful teams than my Wikipedian team of 3h/week volunteers working on documenting minority languages. The Massively Multilingual Speech project you cite above is absolutely awesome. I simply think that if such powerful teams lead projects like this in open source, then I can let Lingualibre go and move on to other self-development or social issues.

g0v / amis-moedict