dreamk73 closed this issue 3 years ago.
Hi @dreamk73, thank you again for your feedback.
Regarding your pitch halving problem, could you please send a sample (or some samples) of the problematic speech signals? There are a couple of things that I want to test.
It could simply be a matter of adjusting the PITCH_HALF_SENS parameter of PitchObj. But since this parameter works together with the mean and standard deviation of the pitch values, it may not work well with your short speech samples.
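To illustrate why a detector based on the mean and standard deviation can struggle with short utterances, here is a simplified, hypothetical sketch. It is not the pYAAPT implementation: `fix_pitch_halving`, `half_sens` and `mode` are illustrative names, and the default of 2.9 is an arbitrary value for the sketch. It only mimics the general idea of flagging voiced frames that fall far below the utterance mean. In pYAAPT this kind of step is gated by `PITCH_HALF`, as noted later in this thread.

```python
import numpy as np


def fix_pitch_halving(pitch, half_sens=2.9, mode="double"):
    """Hypothetical mean/std-based halving detector (illustration only).

    Voiced frames (F0 > 0) below mean - half_sens * std of all voiced
    frames are treated as halving errors. mode="zero" marks them as
    unvoiced; mode="double" doubles their F0 value.
    """
    pitch = np.asarray(pitch, dtype=float).copy()
    voiced = pitch[pitch > 0]
    if voiced.size < 2:
        return pitch  # too few voiced frames for meaningful statistics
    threshold = voiced.mean() - half_sens * voiced.std()
    suspect = (pitch > 0) & (pitch < threshold)
    if mode == "zero":
        pitch[suspect] = 0.0
    elif mode == "double":
        pitch[suspect] *= 2.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return pitch


# Longer utterance: 20 voiced frames at 200 Hz plus one halved frame at 100 Hz.
long_track = np.array([0.0] * 5 + [200.0] * 20 + [100.0] + [0.0] * 5)
# Short utterance: the same single halved frame among only four voiced frames.
short_track = np.array([0.0, 200.0, 200.0, 200.0, 100.0, 0.0])

print(fix_pitch_halving(long_track)[25])
print(fix_pitch_halving(short_track))
```

In the long track the 100 Hz frame sits well below the threshold (about 133 Hz) and gets doubled. In the short track, one halved frame out of four voiced frames inflates the standard deviation so much that the threshold drops to about 49 Hz, so the error passes through unchanged.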
It is also possible that your problem is caused by this part of the code. I had marked it for removal, but at the time I could not figure out why the original authors excluded it. Since it was not affecting me, I left it there.
It could also be a corner case that the original authors did not foresee, like issue #10 that we fixed recently.
Finally, it could also be a limitation of the algorithm itself. The original paper was published in 2008, and the algorithm has not received any major update since then. I still need to do more thorough research to verify whether anyone has come up with a substantial improvement during this period.
So I want to check all the possibilities before giving you feedback.
Regarding the speed problem, it is somewhat inevitable, since YAAPT combines a set of computationally expensive tests.
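To give a feel for where the cost comes from, here is a minimal NumPy sketch of a normalized cross-correlation function (NCCF), one of the time-domain measures YAAPT relies on. It is a naive per-frame version written for illustration, not the pYAAPT code; the function `nccf` and its arguments are hypothetical. Each frame needs one length-`n` dot product per candidate lag, so the work grows roughly as frames × lags × `n`, and YAAPT also runs spectral (SHC) tests and dynamic programming on top of this.

```python
import numpy as np


def nccf(frame, min_lag, max_lag, n):
    """Naive NCCF of one frame for lags min_lag..max_lag (illustration only).

    frame must contain at least n + max_lag samples.
    """
    x0 = frame[:n]
    e0 = np.dot(x0, x0)  # energy of the reference window
    out = np.zeros(max_lag - min_lag + 1)
    for i, lag in enumerate(range(min_lag, max_lag + 1)):
        xk = frame[lag:lag + n]  # window shifted by the candidate lag
        denom = np.sqrt(e0 * np.dot(xk, xk))
        out[i] = np.dot(x0, xk) / denom if denom > 0 else 0.0
    return out


fs = 8000
x = np.sin(2 * np.pi * 200 * np.arange(1200) / fs)  # 200 Hz test tone
min_lag, max_lag = fs // 400, fs // 60  # search roughly 60-400 Hz
phi = nccf(x, min_lag, max_lag, n=400)

for lag in (40, 80, 120):
    print(lag, fs / lag, round(phi[lag - min_lag], 4))
```

For a 200 Hz tone sampled at 8 kHz the period is 40 samples, and the NCCF is close to 1 at lag 40, but also at lags 80 and 120 (100 Hz and about 66.7 Hz). Candidates at multiples of the true period scoring almost as high is one reason pitch trackers need extra logic to avoid halving errors.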
Last week I published the AMFM_decompy 1.0.10 release and added some notes about this matter to the documentation. As I mentioned there, the original authors added a feature to the Matlab code that improves speed at the cost of pitch detection accuracy. Personally, I do not consider this trade-off worthwhile, so I prefer to focus on adding support for numba and CUDA in a future release.
I am already working on some improvements, but unfortunately this numba release will not come soon, since I first need to code some preparatory steps. Anyway, most AMFM_decompy upgrades in the last few years were basically bug-fix releases, so I guess it is about time to add some major improvements.
Thanks. I looked at the PITCH_HALF_SENS parameter in pYAAPT.py, but I found that in this example the code never reached that line because self.PITCH_HALF was 0. However, when I commented out the first two conditions in the part of the code that you mentioned, the problem went away, at least in this particular example. I will run it on a few more to make sure.
OK, good to know that it helped. I will wait for your next results, and if this modification really solves your issues, I will add it to the code.
The code lines that were causing the bug were removed in the 1.0.11 release. Since no other problems were reported, I will consider this issue solved.
Whenever I run pYAAPT on a short (one-word) sentence, I get a few pitch halving errors in the resulting pitch track. The problem is worse when there are long silences before and after the word. These are high-quality studio recordings of a professional voice talent. What could I try to get rid of these errors? In general, I feel that pYAAPT performs better than the other methods I am comparing it to (WORLD, swipe, reaper), but this is a bit of a deal breaker. It is also one of the slowest methods.