anthonio9 / penn

Pitch Estimating Neural Networks (PENN)
MIT License

Plot logits and ground truth like a spectrogram #5

Open anthonio9 opened 6 months ago

anthonio9 commented 6 months ago

At the moment, the plotting function seems to be partly broken. The number of x-ticks matches what is specified in the code, but the total duration shown on the axis differs considerably from the actual length of the processed song.

Moreover, either the inference or the loss function seems to be wrong, because the logits are very similar for every string. In the current output, every string shows the polyphonic recognition instead of its own monophonic pitch.

[Image: current logits output for each string]
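One place worth checking is how the softmax is taken in the loss. Below is a minimal numpy sketch of a loss in which each string is its own monophonic classifier. It assumes logits laid out as (strings, bins, frames) and one integer target bin per string and frame. The function name and layout are hypothetical, not the PENN API, and unvoiced frames are not handled. If every string were instead trained against the same polyphonic target, near-identical per-string outputs would be the expected result.

```python
import numpy as np


def per_string_cross_entropy(logits, targets):
    """Mean cross-entropy where each string is its own monophonic classifier.

    logits:  (strings, bins, frames) unnormalized scores (assumed layout)
    targets: (strings, frames) integer target bin per string and frame
    """
    # Log-softmax over the bin axis only, so each string gets its own
    # probability distribution over pitch bins in every frame
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    strings, _, frames = logits.shape
    string_idx = np.arange(strings)[:, None]
    frame_idx = np.arange(frames)[None, :]
    # Log-probability of the target bin for every (string, frame) pair
    picked = log_probs[string_idx, targets, frame_idx]
    return -picked.mean()
```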

anthonio9 commented 6 months ago

Alright, the x_labels are now fixed and show the right times.
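For reference, a minimal sketch of deriving x-tick labels in seconds from frame indices. The hop size and sample rate are parameters because their values for this project are an assumption here. If the ticks were labeled with a hop size that does not match the model's, the axis duration would drift from the song length in the way described at the top of this issue.

```python
import numpy as np


def time_ticks(n_frames, hopsize, sample_rate, n_ticks=5):
    """Return x-tick positions (frame indices) and labels (seconds)."""
    positions = np.linspace(0, n_frames - 1, n_ticks)
    # Frame i starts at i * hopsize samples, i.e. i * hopsize / sample_rate seconds
    seconds = positions * hopsize / sample_rate
    labels = [f"{s:.2f}" for s in seconds]
    return positions, labels
```

The result can be passed straight to plt.xticks(positions, labels).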

anthonio9 commented 6 months ago

Fixed the plot, with new x- and y-axis labels.

[Image: fixed plot with new x- and y-axis labels]
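The y-axis can be labeled the same way. Below is a minimal sketch, assuming the pitch bins are spaced uniformly in cents above a minimum frequency. The defaults fmin=31.0 and cents_per_bin=5.0 are placeholders and should be checked against the project config. The function names are hypothetical.

```python
import numpy as np


def bins_to_frequency(bins, fmin=31.0, cents_per_bin=5.0):
    """Convert (possibly fractional) bin indices to Hz, assuming cent-spaced bins."""
    cents = np.asarray(bins, dtype=float) * cents_per_bin
    return fmin * 2.0 ** (cents / 1200.0)


def frequency_ticks(n_bins, n_ticks=6, fmin=31.0, cents_per_bin=5.0):
    """Return y-tick positions (bin indices) and labels (Hz, rounded)."""
    positions = np.linspace(0, n_bins - 1, n_ticks)
    hz = bins_to_frequency(positions, fmin, cents_per_bin)
    labels = [f"{f:.0f}" for f in hz]
    return positions, labels
```

Calling plt.yticks(positions, labels) then shows Hz instead of bin indices.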

anthonio9 commented 6 months ago

The exported figure is now much better suited to the polyphonic logits output. What's still missing is the ground truth plotted on top of the logits, or at least next to them. Plots compatible with wandb are also missing.
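To draw the ground truth on top of the logits image, the ground-truth pitch in Hz has to be mapped into the same bin coordinates. Below is a minimal sketch under the assumption of cent-spaced bins. The defaults fmin and cents_per_bin are placeholders. In this sketch, unvoiced frames are assumed to be stored as 0 Hz, and they become NaN.

```python
import numpy as np


def frequency_to_bins(frequency, fmin=31.0, cents_per_bin=5.0):
    """Map Hz to fractional bin indices; 0 Hz (unvoiced) becomes NaN."""
    frequency = np.asarray(frequency, dtype=float)
    bins = np.full(frequency.shape, np.nan)
    voiced = frequency > 0
    # Distance above fmin in cents, divided by the bin width in cents
    bins[voiced] = 1200.0 * np.log2(frequency[voiced] / fmin) / cents_per_bin
    return bins
```

First show the logits with plt.imshow(logits, origin="lower", aspect="auto"). Then plt.plot(frequency_to_bins(ground_truth)) overlays the line, and matplotlib leaves gaps at the NaN frames.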

anthonio9 commented 6 months ago

First, the penn.evaluate code has to be adjusted to the new models; something is not working correctly there.

anthonio9 commented 6 months ago

Make sure that the logits are plotted nicely with the ground truth for the single-string model and for the original fcnf0++ trained on mtdb and ptdb. Then check the fcnf0++-gset-voiced configuration as well. Those should work reasonably well for plotting all six strings on a single set of logits.
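To plot all six strings on one set of logits, the per-string ground truth can be collapsed into a single binary map over bins and frames. Below is a minimal sketch. It assumes per-string bin indices shaped (strings, frames), with NaN for silent strings. The function name is hypothetical.

```python
import numpy as np


def strings_to_multihot(bins, n_bins):
    """Collapse per-string bin indices (strings, frames) into a (n_bins, frames) map.

    NaN entries (silent strings) are skipped; a cell is 1 if any string is active there.
    """
    bins = np.asarray(bins, dtype=float)
    n_frames = bins.shape[1]
    target = np.zeros((n_bins, n_frames))
    # Indices of every voiced (string, frame) pair
    string_idx, frame_idx = np.nonzero(~np.isnan(bins))
    rows = np.rint(bins[string_idx, frame_idx]).astype(int)
    # Drop anything that falls outside the bin range
    keep = (rows >= 0) & (rows < n_bins)
    target[rows[keep], frame_idx[keep]] = 1.0
    return target
```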

anthonio9 commented 3 months ago

This is important again. Logit plots with the ground truth on top work well for the test set. However, they are not yet available for audio files provided with labels, or for monophonic audio files with polyphonic labels.

What we need is a function that can take:
a. a single-track file with solos and a polyphonic label
b. a single-track file with chords and a polyphonic label

The inner function has to take:

Tweak the existing penn.plot.logits to achieve the desired effect.
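Below is a sketch of one possible shape for the inner function. It assumes the polyphonic label is a (strings, frames) array in Hz, with 0 where a string is silent, and that the logits are (bins, frames). The layout of a label file such as data/cache/gset/000121-pitch.npy is not confirmed here, and the name align_poly_label is hypothetical. Since the label always has one row per string, the solo case (a) and the chord case (b) go through the same path.

```python
import numpy as np


def align_poly_label(logits, label):
    """Trim logits (bins, frames) and a poly label (strings, frames) to a common length.

    Solos and chords are handled identically, because the label always has
    one row per string (0 Hz where a string is silent).
    """
    logits = np.asarray(logits)
    label = np.asarray(label, dtype=float)
    if label.ndim == 1:
        # Treat a monophonic label as a single "string"
        label = label[None, :]
    n_frames = min(logits.shape[-1], label.shape[-1])
    return logits[..., :n_frames], label[:, :n_frames]
```

When the two lengths differ by a few frames, trimming to the shorter one is a simple choice. Padding the shorter array would work too.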

anthonio9 commented 2 months ago

Tell a story with the plots. First, show a plot of the raw pitch output. Next, show the pitch output with the periodicity values printed on top of it. Finally, show the pitch filtered by different periodicity thresholds.
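The last step of that story could look like the minimal sketch below. It keeps one copy of the pitch per threshold and blanks the frames whose periodicity falls below that threshold. Pitch and periodicity are assumed to be 1-D arrays over frames. The function name is hypothetical.

```python
import numpy as np


def threshold_pitch(pitch, periodicity, thresholds):
    """Return one copy of pitch per threshold, with low-periodicity frames set to NaN."""
    pitch = np.asarray(pitch, dtype=float)
    periodicity = np.asarray(periodicity, dtype=float)
    filtered = {}
    for t in thresholds:
        # NaN rather than 0 so a plot shows gaps instead of drops to 0 Hz
        filtered[t] = np.where(periodicity >= t, pitch, np.nan)
    return filtered
```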

anthonio9 commented 2 months ago

Started the plots again, this time step by step: first the STFT, then the predicted pitch, then the ground truth, and finally the predicted pitch with thresholds. Below is the STFT with ground truth:

[Image: STFT with ground truth]
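For the STFT panel, here is a minimal numpy sketch of a magnitude spectrogram in dB. The n_fft and hopsize values are assumptions. Setting hopsize to the pitch model's hop keeps the STFT frames aligned with the pitch frames.

```python
import numpy as np


def stft_db(audio, n_fft=1024, hopsize=256):
    """Magnitude STFT in dB with a Hann window; returns (n_fft // 2 + 1, frames)."""
    audio = np.asarray(audio, dtype=float)
    if len(audio) < n_fft:
        raise ValueError("audio is shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hopsize
    # Windowed frames stacked as columns: (n_fft, n_frames)
    frames = np.stack(
        [audio[i * hopsize:i * hopsize + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    # Floor the magnitude to avoid log10(0)
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10))
```

The STFT frequency axis is linear, while the pitch bins are log-spaced. The ground truth should therefore be overlaid in Hz, for example via the extent argument of plt.imshow, rather than as bin indices.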

anthonio9 commented 1 month ago

Fixes and plots:

- Describe how the model was trained for the multipitch (per-string) output.
- Get a better understanding of the decoding with periodicity thresholding, and check whether the softmax is applied correctly.
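To check the decoding, here is a minimal sketch of argmax decoding with a periodicity threshold. It assumes logits shaped (bins, frames). Periodicity is taken here as the maximum softmax probability per frame. That is one common choice, and not necessarily what penn does. The key property to verify is that the softmax runs over the bin axis, so that every frame's probabilities sum to 1.

```python
import numpy as np


def decode(logits, threshold=0.5):
    """Argmax decoding of (bins, frames) logits with a max-probability periodicity."""
    # Softmax over the bin axis (axis 0): each frame's column sums to 1
    shifted = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=0, keepdims=True)
    bins = probs.argmax(axis=0)
    periodicity = probs.max(axis=0)
    # Frames below the threshold are marked unvoiced with bin -1
    voiced_bins = np.where(periodicity >= threshold, bins, -1)
    return voiced_bins, periodicity, probs
```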

anthonio9 commented 1 month ago

Logits make much more sense when visualized after a sigmoid. Unnormalized logits are spread out and much less clear.
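Here is a small numpy sketch of why the two normalizations look so different (the helper names are hypothetical). A sigmoid squashes each bin independently into (0, 1), so several strings sounding at once can all appear near 1. A softmax over the bin axis has to split one unit of probability among them. Unnormalized logits have no fixed range at all, which is a plausible reason they look spread out.

```python
import numpy as np


def sigmoid(logits):
    """Element-wise sigmoid: each bin is squashed to (0, 1) independently."""
    return 1.0 / (1.0 + np.exp(-logits))


def softmax(logits, axis=0):
    """Softmax over the bin axis: each frame's column sums to 1."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)
```

Plotting the sigmoid output with a fixed color range (vmin=0, vmax=1 in plt.imshow) also makes plots comparable across files.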

anthonio9 commented 1 month ago
anthonio9 commented 3 weeks ago

Example command for plotting with the FCN model:

python -m penn.plot.to_latex --config config/polypennfcn-15ks-batch.py --checkpoint runs/polypennfcn-15ks-batch/00005000.pt --audio_file data/cache/gset/000121.wav -m -l --ground_truth_file data/cache/gset/000121-pitch.npy -m  -l
anthonio9 commented 3 weeks ago