Hi, I was using an old model (commit 530a80f14) and I've noticed that the new model detects electronic speech much better.
I was wondering whether that difference is specific to my data, or whether the new model actually detects electronic speech better (i.e., it was trained with electronic speech while the old model was not).
Also, I'd like to know if there are plans to include an "electronic speech" label, which would come in handy for our research.
Thanks in advance.
Hi!
The "new" model was significantly better than the "old" model on every classes. This old version is just something we obtained while developing our model until converging to the published results (that have been obtained with the "new" model).
But none of these models have been trained on electronical speech as we don't have this label in our training set. We just have the ELE class on our held-out set, meaning that we can just have a look at how ELE frames are classified by the model (into SPEECH, or SIL for instance).
If we compare model 1 and model 2 performances on the test set (containing KCHI, CHI, MAL, FEM, SPEECH labels) and if we assume that model 1 is consistently better than model 2, it is very likely than model 1 will be better at identifying ELE frames as being SIL (even though it hasn't been trained to classify this specific class).
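In case it's useful, here is a minimal sketch of that kind of check in plain numpy; all the arrays below are made-up toy values, assuming frame-level gold labels and predictions aligned one to one:

import numpy as np

# Made-up frame-level labels: gold annotations and each model's
# predictions, aligned frame by frame.
gold  = np.array(["ELE", "FEM", "ELE", "SIL", "ELE", "ELE"])
pred1 = np.array(["SIL", "FEM", "SIL", "SIL", "SPEECH", "SIL"])
pred2 = np.array(["SPEECH", "FEM", "SIL", "SIL", "SPEECH", "SPEECH"])

def ele_breakdown(gold, pred):
    """Distribution of the predicted classes over gold ELE frames."""
    on_ele = pred[gold == "ELE"]
    classes, counts = np.unique(on_ele, return_counts=True)
    return dict(zip(classes, counts / len(on_ele)))

print(ele_breakdown(gold, pred1))  # model 1 sends 3/4 of ELE frames to SIL
print(ele_breakdown(gold, pred2))  # model 2 sends 3/4 of ELE frames to SPEECH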
> Also, I'd like to know if there are plans to include an "electronic speech" label, which would come in handy for our research.
Not in the near future, unfortunately! But we never know :)
Closing this issue. Feel free to reopen if you have any other related questions !
Hi Marvin!
I see I never replied, sorry about that! I really appreciated your thoughtful reply back then.
We are manually segmenting and labeling audio files with the ELE class, and we plan to fine-tune your model to add it. Since you framed the task as a multilabel problem, I guess this shouldn't be too much trouble.
I'd like to know whether you have any insights on this and would like to share them. If so, I can explain our thoughts in more detail. We will of course share the resulting model publicly.
Best regards, L.
Hi there!
Instructions to train the voice type classifier from scratch are available here, and general instructions to fine-tune a model are available there.
I agree with you, fine-tuning the model should not be too much trouble. The task specification only needs to change a tiny bit:
task:
  name: MultilabelDetection
  params:
    duration: 2.0
    batch_size: 64
    per_epoch: 1
    labels_spec:
      regular: ['KCHI', 'CHI', 'MAL', 'FEM', 'ELE'] # new label to predict here
      union:
        SPEECH: ['KCHI', 'CHI', 'FEM', 'MAL', 'UNK']
This assumes your gold RTTM files contain utterances annotated as ELE for electronic speech. So much for the technical details.
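For reference, RTTM is a ten-field format (type, file URI, channel, onset, duration, then the label in the eighth field), so a gold line for an ELE utterance would look something like this, with a made-up file name and timestamps:

SPEAKER daylong_01 1 12.34 2.50 <NA> <NA> ELE <NA> <NA>
SPEAKER daylong_01 1 15.10 1.80 <NA> <NA> FEM <NA> <NA>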
On a more conceptual level, I can't assure you that the accuracy you'll obtain on the new ELE class will be any good, as the voice type classifier has been trained to map electronic speech to silence. But I do think it is definitely worth a shot! Your data will be annotated for KCHI/CHI/MAL/FEM too, right?
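And if you'd rather have the pretrained classes keep their weights while only ELE starts from scratch, one generic trick is to widen the final classification layer and copy the old rows over. This is plain PyTorch and not tied to this repo; the head below is a hypothetical stand-in for the real classification layer:

import torch
import torch.nn as nn

def extend_head(old_head: nn.Linear, n_new: int = 1) -> nn.Linear:
    """Widen a linear classification head by n_new outputs, keeping the pretrained rows."""
    new_head = nn.Linear(old_head.in_features, old_head.out_features + n_new)
    with torch.no_grad():
        # The KCHI/CHI/MAL/FEM/SPEECH rows keep their pretrained weights;
        # the extra ELE row keeps its random init and is learned during fine-tuning.
        new_head.weight[:old_head.out_features] = old_head.weight
        new_head.bias[:old_head.out_features] = old_head.bias
    return new_head

# Stand-in for the pretrained 5-class head (hypothetical feature size).
old_head = nn.Linear(128, 5)
new_head = extend_head(old_head)
print(new_head)  # Linear(in_features=128, out_features=6, bias=True)

The fine-tuning then only has to learn the new ELE row (plus whatever else you choose to unfreeze).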
Can't wait to see a new open-source model tackling the same task :)
Hi @MarvinLvn, thanks for the links and all the data, it's super helpful to me.
Yes, the data will be annotated that way: KCHI/CHI/MAL/FEM + ELE.
It is good to know that ELE was mapped to silence, thanks for the detail. Hopefully the system will forget that mapping and relearn the new task.
Right now I'm still waiting for the annotated data, but it shouldn't take much longer.
I'll keep you up to date when we have some results.
Thanks again!