alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Differences in results between Java and Python #431

Open peterkronenberg opened 3 years ago

peterkronenberg commented 3 years ago

Would you expect to get slightly different speaker signatures from Java and Python? The underlying code is obviously the same, so I'm not sure where a difference would be introduced, whether through rounding or for some other reason. Both are running on the same machine (Windows 10), with Java 8 and Python 3.9.1.

The first one or two significant digits usually match, but after that the values diverge.

Running on the exact same input file, here is the speaker signature from Java (for the first phrase):

one zero zero zero one
[-1.108688, 0.130483, 1.362421, 0.658426, -0.227594, -0.331269, 0.682551, -0.361182, 0.422, 0.023953, 0.684812, 0.642821, -0.723576, 0.194851, 0.217675, -0.281156, 0.040565, -0.865059, -1.664098, -0.524114, 0.781138, -0.046018, 0.826312, -0.192645, -0.655568, -0.566615, -0.695058, -0.899362, 0.702597, 0.955827, -0.515553, -0.697694, -0.343437, 0.010842, 0.349472, 1.057606, -0.648826, 0.383428, 1.076154, 0.627731, -0.792871, 0.353089, 0.416272, -1.644061, -1.042834, 0.324374, -1.415677, 0.378488, -0.44945, 0.26568, -0.657256, 0.340553, -0.637015, 0.401298, -0.846093, -0.367624, 0.964261, -1.667628, -0.803072, -0.477784, -0.117188, -0.466145, 0.796645, -1.214523, 0.900356, -1.056739, -0.409409, 1.931106, -1.364707, -0.289888, 1.58544, 0.761426, -0.57971, 1.280144, 0.845493, 1.365517, -1.055958, -1.062707, -0.978577, -0.651955, -1.369447, -1.090644, 2.623788, -2.078337, -0.10537, 0.184645, -1.203655, -1.399164, -0.468887, 0.403174, 1.328133, -1.889551, 1.393952, 0.687955, 0.187534, -0.22866, 1.499812, -2.254326, -1.842097, -0.673903, 0.79769, -1.625062, 0.166912, -0.991722, 3.239406, 0.409751, -0.150836, 1.506196, 1.052316, -0.139431, 0.382878, -0.948555, 0.746153, 0.302209, -1.711524, 1.273156, -0.228381, 0.330816, -0.279153, 0.99944, -0.413017, -1.374116, -1.909007, -0.639204, -1.820923, 0.484024, 1.002588, -0.577619]

And here it is from Python:

Text: one zero zero zero one
[-1.101735, 0.123902, 1.384669, 0.660679, -0.226407, -0.314176, 0.683908, -0.37491, 0.432867, 0.021404, 0.701606, 0.644564, -0.718554, 0.201241, 0.218858, -0.265522, 0.041433, -0.837582, -1.614151, -0.539826, 0.805124, -0.037315, 0.852288, -0.18448, -0.68562, -0.569364, -0.696874, -0.905103, 0.722718, 0.98236, -0.511375, -0.720799, -0.330885, -0.002339, 0.369063, 1.045881, -0.661028, 0.376405, 1.078469, 0.645985, -0.789146, 0.346652, 0.40883, -1.652467, -1.069421, 0.317625, -1.415762, 0.353761, -0.451041, 0.267733, -0.663321, 0.358913, -0.609146, 0.410242, -0.85994, -0.395965, 0.968916, -1.633204, -0.813908, -0.442472, -0.103385, -0.461118, 0.799929, -1.250627, 0.875033, -1.036958, -0.40653, 1.953683, -1.343173, -0.274624, 1.587584, 0.750961, -0.555801, 1.262252, 0.839781, 1.362901, -1.023427, -1.066903, -0.958255, -0.636679, -1.361478, -1.09667, 2.638883, -2.097605, -0.106955, 0.174486, -1.223806, -1.401738, -0.44114, 0.430349, 1.34652, -1.905731, 1.386036, 0.689997, 0.209174, -0.236016, 1.495777, -2.241295, -1.800741, -0.665005, 0.822632, -1.610678, 0.154891, -1.045229, 3.249851, 0.424704, -0.153623, 1.493881, 1.062201, -0.153397, 0.328413, -0.986797, 0.750299, 0.279735, -1.727737, 1.281444, -0.188606, 0.356543, -0.279407, 0.984153, -0.387646, -1.385723, -1.898263, -0.623229, -1.800873, 0.480625, 0.96441, -0.557397]
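To put a number on "veers off", the two signatures can be compared as whole vectors instead of digit by digit. Speaker vectors are usually compared with cosine distance, so the minimal sketch below computes that plus the largest per-component gap. Only the first eight components of each vector above are copied in for brevity; `cosine_distance` and `max_abs_diff` are hypothetical helper names, and the full vectors would need to be pasted in to get the real figure.

```python
import math

# First eight components of each speaker vector from the issue (truncated for brevity).
java_spk = [-1.108688, 0.130483, 1.362421, 0.658426,
            -0.227594, -0.331269, 0.682551, -0.361182]
python_spk = [-1.101735, 0.123902, 1.384669, 0.660679,
              -0.226407, -0.314176, 0.683908, -0.37491]


def cosine_distance(x, y):
    """Return 1 - cos(angle between x and y) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def max_abs_diff(x, y):
    """Return the largest absolute difference between matching components."""
    return max(abs(a - b) for a, b in zip(x, y))


print(f"cosine distance: {cosine_distance(java_spk, python_spk):.6f}")
print(f"max |diff|:      {max_abs_diff(java_spk, python_spk):.6f}")
```

A cosine distance close to zero would mean the two signatures point in nearly the same direction, which is what matters when they are used to tell speakers apart, even though individual components differ in the second decimal place.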

Jochen-sys commented 3 years ago

The recognized words are the same, and maybe if you run it again you will get different floats.

peterkronenberg commented 3 years ago

The results are consistent on each system. They are just different from each other.

Jochen-sys commented 3 years ago

I think you can accept that. As far as I know, the floats indicate how confident the system is that it recognized a word, but I could be wrong. Where did you take the floats from?

peterkronenberg commented 3 years ago

What do you mean, 'where did you take the floats'? These are just the results from Vosk. They seem like more than rounding errors. Why would the two languages give such different results?
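One way to sanity-check the "more than rounding errors" intuition is to compare the observed gap with the error from storing a value in single precision, assuming the vectors pass through 32-bit floats at some point (an assumption, not something stated in the thread). The minimal sketch below rounds a few components to float32 with Python's `struct` module; printing to six decimals adds at most another 5e-7 of absolute error. This bounds only a single conversion: many chained operations inside the model can compound small differences, so it does not settle the question by itself.

```python
import struct

# A few components from the Java vector in the issue.
values = [-1.108688, 0.130483, 1.362421, 3.239406]

# Maximum relative error of IEEE-754 round-to-nearest single precision.
UNIT_ROUNDOFF_F32 = 2.0 ** -24


def to_float32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for v in values:
    rel_err = abs(to_float32(v) - v) / abs(v)
    print(f"{v:+.6f}  float32 relative error = {rel_err:.2e}")

# Largest gap among the first eight components of the two vectors
# (Java 1.362421 vs Python 1.384669), as a relative difference.
observed_rel_gap = abs(1.384669 - 1.362421) / 1.362421
print(f"observed relative gap = {observed_rel_gap:.2e}")
```

The float32 errors come out several orders of magnitude smaller than the roughly 1.6e-2 relative gap, which suggests the difference is not introduced by the final conversion or printing alone but somewhere in how the two programs drive the recognizer.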

Jochen-sys commented 3 years ago

OK, I understand. But are the floats exactly the same when you run it again, for example in Python? Sorry, I'm not a collaborator, so I don't know exactly.

peterkronenberg commented 3 years ago

Yes, the results on each system are consistent and always exactly the same.

Jochen-sys commented 3 years ago

OK, then I'm a little puzzled. In that case I don't think these floats are probabilities. Maybe the difference arises because these are two different languages, and so two different setups.
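The confusion over whether the floats are probabilities can be checked against the result JSON itself. My understanding (to be verified against real output) is that per-word confidences appear as `conf` entries inside the `result` list when word output is enabled, while the speaker signature is a separate list under `spk` when a speaker model is attached. The sample string below is hand-written to that assumed layout: the `conf`, timing and `spk_frames` values are placeholders and the `spk` list is truncated to three components, so it is not real Vosk output.

```python
import json

# Hand-written sample in the assumed Vosk result layout; NOT real output.
# "conf", "start", "end" and "spk_frames" are placeholders; "spk" is truncated.
sample_result = """
{
  "result": [
    {"conf": 1.0, "end": 0.6, "start": 0.3, "word": "one"},
    {"conf": 1.0, "end": 0.9, "start": 0.6, "word": "zero"}
  ],
  "text": "one zero",
  "spk": [-1.101735, 0.123902, 1.384669],
  "spk_frames": 100
}
"""

res = json.loads(sample_result)

# Per-word confidences: one value per recognized word, inside "result".
word_confidences = [(w["word"], w["conf"]) for w in res.get("result", [])]

# Speaker signature: a separate fixed-length list of floats under "spk".
speaker_vector = res.get("spk")

print("Text:", res["text"])
print("Word confidences:", word_confidences)
print("Speaker vector length:", len(speaker_vector) if speaker_vector else 0)
```

If real output follows this layout, the long lists posted in this issue are speaker vectors rather than word confidences, which fits the observation that they are not probabilities.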