YuanGongND / gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
BSD 3-Clause "New" or "Revised" License

Is it possible to apply GOPT in real life application? #34

Open vinh22032000 opened 10 months ago

vinh22032000 commented 10 months ago

Hello,

Does the GOPT solution have the potential to be used in real-life production apps if it is fine-tuned? I am curious about that, thanks.

anelibon commented 10 months ago

Don't know, but I have the same question as you. I am currently working on my thesis, whose objective is to automatically evaluate the orality of English as a second language, but I am struggling with a lot of things. I'll leave my email in case you want to talk about it: anelibon7@gmail.com

YuanGongND commented 10 months ago

@anelibon I cannot answer the question about real-life applications. It is complex for many reasons, especially regarding data.

But I believe this code is good enough for a thesis - check our Colab training script: https://colab.research.google.com/github/YuanGongND/gopt/blob/master/colab/GOPT_GPU.ipynb

anelibon commented 10 months ago

@YuanGongND, thanks for your response.

Yes, I had already read your paper and reviewed the training script in Colab; it's an amazing project, and thanks a lot for making the code available. It's crucial for people like me who are seeking to enter this field and face a lot of challenges.

However, I need to run inference on my own data, and from what I read in the Issues section, the bug isn't fixed, right? Additionally, I read a comment from you mentioning that even if the bug is fixed, it may still not work well with other data due to the impact of the phn input. Did I understand that correctly?

Thanks in advance.

YuanGongND commented 10 months ago

Hi there,

> However, I need to make inference with my own data, and from what I read in the Issues section, the bug isn't fixed, right?

We do not officially have wav-based inference code. The one you mentioned is from a third party. We did not provide one simply because we didn't have one. In the paper, we only evaluated on the SO762 dataset.

We do provide very detailed guidance on how to extract GOP features for the SO762 dataset using Kaldi. Users who are familiar with Kaldi should be able to adapt this to extract GOPT input features for their own datasets. (Note: Kaldi has a steep learning curve for non-ASR professionals.)

All code from us is supposed to be bug-free and should fully reproduce the results in the paper.

> Additionally, I read a comment from you mentioning that even if the bug is fixed, it may still not work well with other data due to the impact of the phn input, did I understand it correctly?

For almost all DNN-based models, inference on a dataset that differs from the training set comes with a performance drop, and our GOPT model is no exception. In the context of this task, the relevant factors include the spoken text, the speakers' native language, and the speaker demographics.

-Yuan

gsabarinath02 commented 8 months ago

> Don't know, but I have the same question as you. I am currently working on my thesis, whose objective is to automatically evaluate the orality of English as a second language, but I am struggling with a lot of things. I'll leave my email in case you want to talk about it: anelibon7@gmail.com

Hey,

Were you able to achieve this? I'm also working on a similar project to evaluate voice recordings in real time. Could you tell me how you achieved this? I've encountered some errors, like "utt2spk" and "spk2utt" not matching. If you're willing, we can collaborate and develop a robust system where people can assist each other in recording their voices in real time and evaluating them.

If anyone is interested in joining this effort, we can work together and accomplish it as soon as possible. I don't need any credits since I'm pursuing this for study purposes and out of passion. Anyone can take credit if desired.

Let me know if you're interested in collaborating!
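
On the "utt2spk" / "spk2utt" mismatch mentioned above: in a Kaldi data directory, each line of `utt2spk` is `<utterance-id> <speaker-id>`, and each line of `spk2utt` is `<speaker-id>` followed by that speaker's utterance ids, so the two files should describe the same mapping. The exact cause of the error in this thread is not known; the following is only a minimal Python sketch for spotting where the two files disagree. All function names and the sample ids are hypothetical and are not part of the GOPT or Kaldi code.

```python
from collections import defaultdict


def parse_utt2spk(lines):
    """Parse utt2spk lines ('<utt-id> <spk-id>') into {utt: spk}."""
    utt2spk = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"bad utt2spk line: {line!r}")
        utt, spk = parts
        utt2spk[utt] = spk
    return utt2spk


def parse_spk2utt(lines):
    """Parse spk2utt lines ('<spk-id> <utt-id> ...') into {spk: set of utts}."""
    spk2utt = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        spk2utt[parts[0]] = set(parts[1:])
    return spk2utt


def find_mismatches(utt2spk, spk2utt):
    """Return a list of (speaker, only_in_utt2spk, only_in_spk2utt) tuples."""
    # Invert utt2spk so both sides have the shape {spk: set of utts}.
    derived = defaultdict(set)
    for utt, spk in utt2spk.items():
        derived[spk].add(utt)

    problems = []
    for spk in sorted(set(derived) | set(spk2utt)):
        from_utt2spk = derived.get(spk, set())
        from_spk2utt = spk2utt.get(spk, set())
        if from_utt2spk != from_spk2utt:
            problems.append(
                (spk,
                 sorted(from_utt2spk - from_spk2utt),
                 sorted(from_spk2utt - from_utt2spk))
            )
    return problems


if __name__ == "__main__":
    # Hypothetical sample data; in practice, pass open("data/<set>/utt2spk") etc.
    utt2spk = parse_utt2spk(["spk1_utt1 spk1", "spk1_utt2 spk1", "spk2_utt1 spk2"])
    spk2utt = parse_spk2utt(["spk1 spk1_utt1", "spk2 spk2_utt1"])
    for spk, only_u2s, only_s2u in find_mismatches(utt2spk, spk2utt):
        print(f"{spk}: only in utt2spk={only_u2s}, only in spk2utt={only_s2u}")
```

Kaldi itself ships helper scripts for this, such as `utils/utt2spk_to_spk2utt.pl` (regenerates `spk2utt` from `utt2spk`) and `utils/fix_data_dir.sh` / `utils/validate_data_dir.sh`, so regenerating `spk2utt` rather than editing it by hand is usually the simpler route.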