Originally posted by @jonorthwash at 2019-05-30T18:18:11Z
Might be an issue with the TextGrids, perhaps related to #48.
Originally posted by @BartholomaeusAstrum at 2019-06-05T18:45:07Z
Here's what it looks like on slide 040 from P03. I can't tell if it's off, or if they're just being rendered differently...
Originally posted by @jonorthwash at 2019-06-05T20:50:42Z
They're definitely rendered differently. If you set window length to 0.0025s and dynamic range to 50dB in Praat, you get the following. Compare to your screenshot from UltraTrace and you'll see that it seems to be off a bit, but maybe in different directions at the two ends? (??) Specifically, UltraTrace seems to start later but end earlier.
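(For comparison's sake, here's a rough sketch of how the same clip might be rendered through parselmouth with matching settings. The filename is a placeholder, and flooring the dB values is just one way to mimic Praat's 50dB dynamic range setting, not necessarily how UltraTrace draws it.)

```python
import numpy as np
import parselmouth

# Placeholder filename; the 0.0025s window matches the Praat setting above.
snd = parselmouth.Sound("P03_slide040.wav")
spec = snd.to_spectrogram(window_length=0.0025)

# Convert power to dB and apply a 50dB dynamic range for display,
# mirroring Praat's "dynamic range" setting.
power_db = 10 * np.log10(spec.values + 1e-30)
power_db = np.clip(power_db, power_db.max() - 50, None)
```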
Originally posted by @BartholomaeusAstrum at 2019-06-06T21:26:49Z
I think I've fixed it, by figuring out the right amount of time to add to the beginning and end of the audio snippet used to create the spectrogram. I have all my reasoning/algebra on the board in the PhonLab, though there's still a small aspect where I'm confused why it works...
Originally posted by @jonorthwash at 2019-06-07T03:42:10Z
That certainly looks much better. One test to make sure it's right might be to see if it moves around at all.
Also, it would be good to document your reasoning elsewhere too.
Originally posted by @BartholomaeusAstrum at 2019-06-21T16:14:54Z
I extracted 10ms, 11ms, and 12ms audio clips, and drew spectrograms for each (from top to bottom) with a window size of 5ms and time step of 2ms. The first two spectrograms (10&11ms) are 2ms long, and the last (12ms) is 4ms long.
Given the window size and time step, we expect that Praat captured the following spectrogram frames for each audio clip:
10ms: 0-5ms, 2-7ms, 4-9ms, 6-10ms(?)
11ms: 0-5ms, 2-7ms, 4-9ms, 6-11ms
12ms: 0-5ms, 2-7ms, 4-9ms, 6-11ms, 8-12ms(?)
Because the 10ms and 11ms audio clips produce spectrograms of the same length, we believe that Praat is indeed capturing partial frames. It is still unclear, however, why the first two audio clips display only one frame's worth of spectrogram while the last displays two. We also notice that for the first two audio clips there is only one pair of frames that do not overlap in time (the first and the last), while in the last audio clip there are two such pairs (the first with the second-to-last, and the second with the last).
Originally posted by @jonorthwash at 2019-06-21T18:05:09Z
@mr-martian, would you mind taking this issue from here? The goal is to figure out how much padding is needed given a segment of audio of known length in order for a spectrogram of exactly that length to be generated.
Originally posted by @jonorthwash at 2019-06-21T18:57:51Z
Also note that the first two images appear to show only one step's worth of FFT results, whereas the last appears to show two steps' worth.
Originally posted by @mr-martian at 2019-06-21T19:52:20Z
As far as I can tell, what's happening is that Praat removes one window length's worth of whole time steps from either end, and then takes off a bit more if the result isn't an integer multiple of the time step.
So, with a window length of 5ms and a time step of 2ms, it will always take 4ms off each end.
10ms -> 2ms OK
11ms -> 3ms, rounded to 2ms
12ms -> 4ms OK
13ms -> 5ms, rounded to 4ms
14ms -> 6ms OK
With window length 5ms and time step 3ms, it takes 3ms off each end (6ms total), and 13ms goes to 7ms, which is rounded to 6ms.
In the code, the time step is duration / 10000, so we don't have to worry about rounding and can just add time step * floor(window length / time step) to each end of the sound clip.
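(For the record, a minimal sketch of that rule; spectrogram_padding is just an illustrative name, not a function in the repo.)

```python
import math

def spectrogram_padding(window_length, time_step):
    """Time (in seconds) to add to each end of the audio snippet so the
    resulting spectrogram spans the whole snippet: Praat drops
    floor(window_length / time_step) whole time steps from each end."""
    return time_step * math.floor(window_length / time_step)

# With the 5ms window and 2ms step from the examples above,
# this gives 4ms of padding per end.
print(spectrogram_padding(0.005, 0.002))  # -> 0.004
```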
Originally posted by @jonorthwash at 2019-06-21T20:04:53Z
In the code ... we ... can just add
time step * floor( window length / time step)
to each end of the sound clip.
...before getting the spectrogram, but also not send this for playback.
And also we should pad the whole file's spectrogram with that much I guess too?
Originally posted by @mr-martian at 2019-06-21T20:07:42Z
I had it pad by that much unless doing so would take it out of bounds. The padding applies only to the call to parselmouth.Sound.extract_part(), the result of which is only used in spectrogram construction.
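(Roughly, the arrangement looks like the sketch below; this isn't the actual commit, and padded_spectrogram and its parameters are illustrative names.)

```python
import math
import parselmouth

def padded_spectrogram(sound, start, end, window_length=0.005, time_step=0.002):
    """Build a spectrogram for [start, end] from a padded extract of `sound`.

    The padding is applied only to the extract used for the spectrogram;
    playback would still use the unpadded interval. Bounds are clamped so
    the padding never runs past the ends of the file.
    """
    pad = time_step * math.floor(window_length / time_step)
    padded_start = max(sound.xmin, start - pad)
    padded_end = min(sound.xmax, end + pad)
    part = sound.extract_part(from_time=padded_start, to_time=padded_end,
                              preserve_times=True)
    return part.to_spectrogram(window_length=window_length, time_step=time_step)
```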
Originally posted by @jonorthwash at 2019-06-21T23:14:09Z
That sounds like a reasonable approach.
If we want to be really accurate, we could put graphical padding on the side of spectrograms where the audio padding would go out of bounds. But assuming most audio files are tens of seconds long, we're probably close enough for all situations.
I'll test out the behaviour soon and follow up here.
Originally posted by @BartholomaeusAstrum at 2019-06-24T17:54:22Z
@mr-martian, I added the extra amount you recommended to both the beginning and end of the audio clip. Is that what you meant? When I added it just to the end, it was clearly wrong, but this way I'm pretty sure it's right.
Originally posted by @mr-martian at 2019-06-24T17:59:01Z
Didn't my commit already do that? Why are you adding extra/2 rather than extra at the end?
Originally posted by @BartholomaeusAstrum at 2019-06-24T17:59:51Z
Hmmmm I see that somehow when I pulled I didn't see the changes you made -- I'll go revert to yours :P
Originally posted by @BartholomaeusAstrum at 2019-06-24T18:02:43Z
@mr-martian fixed now! Thank you
Originally posted by @jonorthwash at 2019-06-24T18:06:45Z
I can confirm that @mr-martian's code appears to behave exactly as expected.
Originally posted by @jonorthwash at 2019-05-30T18:17:06Z (original issue)
The spectrogram currently appears to line up with the TextGrid differently than in Praat. See, for example, the beginning of the vowel in the attached screenshots.