limazix opened this issue 4 years ago
@limazix I completely agree with you, and we should focus on the problems you listed to start this implementation. Can you help me check whether there are existing issues covering them, and open new ones if not?
I just opened one for myself to document the existing translation process.
@filipecorrea, is there any sample translation data? I'm finally able to finish the POC, but I need a short- to medium-sized data sample to train the model.
@limazix, there's a short- to medium-sized LIBRAS / PT-BR data sample at https://github.com/IBM/libras/tree/hkbase/data.
Do you need ASL / EN-US? How many sentences? I can ask our collaborators to create that.
@filipecorrea I believe that this dataset will be enough for testing, but which file should I use? How is it organized?
I'll send you a data sample in IBM's Slack.
Is your feature request related to a problem? Please describe. It is not clear if the current method used for language translation is the best approach.
Note: Is there any document or explanation of how the translation is currently handled?
Describe the solution you'd like Deep learning techniques have recently achieved excellent results in machine translation. The best-known approach is Seq2Seq (sequence-to-sequence), which is trained on pairs of equivalent sentences in the two languages. Once trained, the model can transform a sentence in one language into the corresponding sentence in the other.
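Before any Seq2Seq model can be trained, the sentence pairs have to be turned into integer sequences: each language gets its own vocabulary, and every sentence is wrapped in start/end markers and padded to a fixed length. The sketch below shows that preprocessing step using only the standard library. It is not tied to any particular framework or to the format of the IBM/libras data; the whitespace tokenizer, the special-token names, `max_len`, and the helper names (`tokenize`, `build_vocab`, `encode`, `prepare_pairs`) are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Special tokens commonly used by Seq2Seq models (names are an assumption):
# padding, start of sequence, end of sequence, unknown word.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def tokenize(sentence: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return sentence.lower().split()


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Map every token in the corpus to an id; ids 0-3 are reserved for special tokens."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap the sentence in <sos>/<eos>, map tokens to ids, then truncate or pad to max_len."""
    ids = [vocab[SOS]] + [vocab.get(t, vocab[UNK]) for t in tokenize(sentence)] + [vocab[EOS]]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def prepare_pairs(pairs: List[Tuple[str, str]], max_len: int = 8):
    """Turn (source, target) pairs into fixed-length id sequences for the encoder and decoder."""
    src_vocab = build_vocab([src for src, _ in pairs])
    tgt_vocab = build_vocab([tgt for _, tgt in pairs])
    encoded = [(encode(s, src_vocab, max_len), encode(t, tgt_vocab, max_len)) for s, t in pairs]
    return src_vocab, tgt_vocab, encoded


if __name__ == "__main__":
    # Hypothetical toy pairs, made up for illustration; NOT taken from the IBM/libras dataset.
    pairs = [("Eu gosto de café", "EU GOSTAR CAFÉ"), ("Eu gosto de chá", "EU GOSTAR CHÁ")]
    src_vocab, tgt_vocab, encoded = prepare_pairs(pairs)
    print(encoded[0])
```

The resulting id sequences are what a Seq2Seq encoder (source side) and decoder (target side) would consume; a real pipeline would likely swap the whitespace tokenizer for one suited to PT-BR and to the gloss notation actually used in the dataset.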
Describe alternatives you've considered There are multiple alternatives to implement Seq2Seq:
There are two problems: how to make the model available, and how to get enough data to reach high accuracy.
As for the model, it is crucial to manage it properly through scheduled training, constant evaluation, and continuous improvement. I believe we can use Watson Data Studio to build and deploy it, and Watson Machine Learning to expose it.
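"Constant evaluation" needs a metric that can be recomputed after every scheduled training run. The thread does not name one; as an assumption, the sketch below uses BLEU, a standard machine-translation metric, implemented from scratch with the standard library (corpus level, one reference per sentence, uniform weights up to 4-grams, no smoothing). In practice a maintained implementation such as sacreBLEU would be preferable; this is only meant to show what the score measures.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per sentence, uniform weights and no smoothing."""
    clipped = [0] * max_n  # hypothesis n-grams also found in the reference (clipped counts)
    totals = [0] * max_n   # all n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram count by how often it appears in the reference.
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # without smoothing, any zero n-gram precision makes BLEU zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Tracking this score on a fixed held-out set of sentence pairs after each retraining would make regressions visible before a new model version is exposed.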
Regarding data, I suggest opening another issue to define strategies for collecting it.
Additional context
Infrastructure Overview