Alright, the x_labels are now fixed and show the right times. Fixed plot, with new xlabels and ylabels.
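For reference, a minimal sketch of how the time axis can be labeled in real seconds instead of frame indices; the hopsize and sample-rate constants here are placeholders, not the actual penn configuration values:

```python
import numpy as np
import matplotlib.pyplot as plt

SAMPLE_RATE = 22050   # placeholder audio sample rate in Hz
HOPSIZE = 256         # placeholder hop size in samples between frames

def frame_times(num_frames):
    """Convert frame indices to seconds so the xticks show real times."""
    return np.arange(num_frames) * HOPSIZE / SAMPLE_RATE

# Example: label a logits image of shape (pitch_bins, frames).
logits = np.random.rand(1440, 500)
fig, ax = plt.subplots()
ax.imshow(logits, aspect='auto', origin='lower')
tick_frames = np.linspace(0, logits.shape[1] - 1, 6).astype(int)
ax.set_xticks(tick_frames)
ax.set_xticklabels([f'{t:.2f}' for t in frame_times(logits.shape[1])[tick_frames]])
ax.set_xlabel('Time [s]')
ax.set_ylabel('Pitch bin')
```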
Export of the figure is now much better suited to the polyphonic logits output. What's missing is the ground truth plot on top of the logits, or at least next to them. Plots compatible with wandb are also missing.
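For the wandb side, one option could be to log the finished matplotlib figure as an image; a rough sketch, assuming `fig` is the figure produced by the logits plotting code and the project name is a placeholder:

```python
import wandb

# Assumes `fig` is the matplotlib figure produced by the logits plotting code.
run = wandb.init(project='penn-plots')   # project name is a placeholder
run.log({'logits_with_ground_truth': wandb.Image(fig)})
```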
First, the penn.evaluate code has to be adjusted to the new models; something isn't right with it yet.
Make sure that the logits are plotted nicely with the ground truth for the one-string model and for the original fcnf0++ trained with mtdb and ptdb. Then see what happens with the fcnf0++-gset-voiced configuration as well. Those should work well enough for plotting all six strings on a single set of logits.
This is now important again. Logit plots with the ground truth on top work well for the test set; however, this is not currently available for audio files provided with labels, nor for monophonic audio files with polyphonic labels.
What we need is a function that can take:
a. a single track file with solos, poly label
b. a single track file with chords, poly label
The inside function has to take:
Tweak the existing penn.plot.logits to achieve the desired effect.
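For context, a rough sketch of what such a function could look like, written with plain numpy/matplotlib rather than the actual penn.plot.logits internals; the helper name, the label format (a strings x frames pitch array saved as .npy), and the hop/bin constants are all assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_logits_with_ground_truth(logits, ground_truth_file, hopsize_seconds=0.01,
                                  fmin=31.0, bins_per_octave=240):
    """Hypothetical helper: overlay a polyphonic ground-truth pitch file on model logits.

    `logits` is assumed to be a (strings, pitch_bins, frames) array and
    `ground_truth_file` a .npy file with per-string pitch in Hz, shaped (strings, frames).
    The hopsize, fmin and bins-per-octave values are placeholders, not the real config.
    """
    pitch = np.load(ground_truth_file)
    num_strings = logits.shape[0]
    times = np.arange(logits.shape[-1]) * hopsize_seconds

    fig, axes = plt.subplots(num_strings, 1, sharex=True, figsize=(10, 2 * num_strings))
    axes = np.atleast_1d(axes)
    for string, ax in enumerate(axes):
        ax.imshow(logits[string], aspect='auto', origin='lower',
                  extent=[times[0], times[-1], 0, logits.shape[1]])
        # Convert ground-truth Hz to (approximate) bin indices for the overlay.
        bins = bins_per_octave * np.log2(np.maximum(pitch[string], fmin) / fmin)
        n = min(times.shape[0], bins.shape[-1])
        ax.plot(times[:n], bins[:n], 'r.', markersize=2)
        ax.set_ylabel(f'String {string}')
    axes[-1].set_xlabel('Time [s]')
    return fig
```

Calling something like `plot_logits_with_ground_truth(logits, 'data/cache/gset/000121-pitch.npy')` would then cover both the solo and the chord case, as long as the label file is polyphonic.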
Make a story with the plots: first show a plot with the raw pitch output, then the pitch output with periodicity values printed on top of it, and finally the pitch filtered with different periodicity thresholds.
Started the plots once again, this time in steps: first the STFT, then the predicted pitch, then the ground truth, and finally the predicted pitch with thresholds. (Figure: STFT with the ground truth overlaid.)
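A minimal sketch of the pitch-related panels in that story (dummy pitch and periodicity data, assumed 10 ms hop); the STFT panel is omitted:

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy per-frame pitch (Hz) and periodicity in [0, 1]; in practice these
# come from inference on the track.
rng = np.random.default_rng(0)
pitch = 200 + 50 * np.sin(np.linspace(0, 10, 500))
periodicity = rng.uniform(0, 1, 500)
times = np.arange(len(pitch)) * 0.01   # assumed 10 ms hop

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 7))

# 1. Raw predicted pitch.
axes[0].plot(times, pitch, '.', markersize=2)
axes[0].set_title('Predicted pitch')

# 2. Pitch colored by periodicity.
sc = axes[1].scatter(times, pitch, c=periodicity, s=4)
fig.colorbar(sc, ax=axes[1], label='periodicity')
axes[1].set_title('Predicted pitch with periodicity')

# 3. Pitch kept only where periodicity exceeds a threshold.
for threshold in (0.3, 0.5, 0.7):
    masked = np.where(periodicity >= threshold, pitch, np.nan)
    axes[2].plot(times, masked, '.', markersize=2, label=f'threshold {threshold}')
axes[2].legend()
axes[2].set_title('Pitch filtered by periodicity threshold')
axes[2].set_xlabel('Time [s]')
```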
Fixes and plots:
- Description of how the model was trained for the multipitch strings.
- Get a better understanding of the decoding with periodicity thresholding, and check whether softmax is applied correctly.
The logits make much more sense when visualized after a sigmoid. Unnormalized logits are spread out and not nearly as clear.
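A small sketch of that comparison (the bin and frame counts are placeholders):

```python
import torch
import matplotlib.pyplot as plt

# Dummy raw logits of shape (pitch_bins, frames); in practice these come
# from the model's forward pass for one string.
logits = torch.randn(1440, 500)

fig, (top, bottom) = plt.subplots(2, 1, figsize=(10, 6))
top.imshow(logits.numpy(), aspect='auto', origin='lower')
top.set_title('Raw (unnormalized) logits')

# Sigmoid squashes every bin independently into [0, 1], so the salient bins
# stand out much more clearly when visualized.
bottom.imshow(torch.sigmoid(logits).numpy(), aspect='auto', origin='lower')
bottom.set_title('Logits after sigmoid')
```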
Example command for plotting with the FCN model:
python -m penn.plot.to_latex --config config/polypennfcn-15ks-batch.py --checkpoint runs/polypennfcn-15ks-batch/00005000.pt --audio_file data/cache/gset/000121.wav -m -l --ground_truth_file data/cache/gset/000121-pitch.npy
At the moment the plotting function seems to be a bit broken: the number of xticks matches what is specified in the code, but the total duration shown on the axis differs a lot from the length of the processed song.
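One way to narrow this down could be to compare the duration implied by the plotted frames with the duration of the audio file itself; a rough sketch, where the hopsize, sample-rate and frame-count values are assumptions rather than the real config:

```python
import soundfile as sf   # any audio loader would do; soundfile is just an assumption

SAMPLE_RATE = 22050      # assumed model sample rate
HOPSIZE = 256            # assumed hop size in samples

# Duration according to the audio file itself.
audio, sr = sf.read('data/cache/gset/000121.wav')
audio_duration = len(audio) / sr

# Duration implied by the number of frames being plotted.
num_frames = 500         # placeholder; use logits.shape[-1] from inference
frame_duration = num_frames * HOPSIZE / SAMPLE_RATE

print(f'audio: {audio_duration:.2f} s, plotted frames: {frame_duration:.2f} s')
# If these two disagree, the xtick positions are fine but the time labels are
# being computed with the wrong hopsize or sample rate.
```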
Moreover, it seems that either the inference or the loss function is wrong, because the logits are very similar for every string. The current logits output shows that every string produces the polyphonic recognition instead of each string outputting its own monophonic pitch.
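A quick, informal way to confirm the "all strings look the same" symptom could be to correlate the per-string logits directly; a sketch assuming logits shaped (strings, pitch_bins, frames):

```python
import numpy as np

# Dummy per-string logits; in practice this is the model output for one track.
logits = np.random.randn(6, 1440, 500)

# Flatten each string's logits and compute the pairwise correlation matrix.
flat = logits.reshape(logits.shape[0], -1)
correlation = np.corrcoef(flat)
print(np.round(correlation, 2))
# Off-diagonal values close to 1.0 would confirm that every string head emits
# (almost) the same polyphonic output, which points at the loss targets or the
# head wiring rather than at the plotting code.
```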