Alright, the x_labels are now fixed and show the right times. Fixed plot, with new xlabels and ylabels.
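For reference, a minimal sketch of how the time axis can be labeled in real seconds instead of frame indices; the hopsize and sample-rate constants here are placeholders, not the actual penn configuration values:

```python
import numpy as np
import matplotlib.pyplot as plt

SAMPLE_RATE = 22050   # placeholder audio sample rate in Hz
HOPSIZE = 256         # placeholder hop size in samples between frames

def frame_times(num_frames):
    """Convert frame indices to seconds so the xticks show real times."""
    return np.arange(num_frames) * HOPSIZE / SAMPLE_RATE

# Example: label a logits image of shape (pitch_bins, frames).
logits = np.random.rand(1440, 500)
fig, ax = plt.subplots()
ax.imshow(logits, aspect='auto', origin='lower')
tick_frames = np.linspace(0, logits.shape[1] - 1, 6).astype(int)
ax.set_xticks(tick_frames)
ax.set_xticklabels([f'{t:.2f}' for t in frame_times(logits.shape[1])[tick_frames]])
ax.set_xlabel('Time [s]')
ax.set_ylabel('Pitch bin')
```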
Export of the figure is now much better suited to the polyphonic logits output. What's missing is the ground truth plot on top of the logits, or at least next to them. Plots compatible with wandb are also missing.
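For the wandb side, one option could be to log the finished matplotlib figure as an image; a rough sketch, assuming `fig` is the figure produced by the logits plotting code and the project name is a placeholder:

```python
import wandb

# Assumes `fig` is the matplotlib figure produced by the logits plotting code.
run = wandb.init(project='penn-plots')   # project name is a placeholder
run.log({'logits_with_ground_truth': wandb.Image(fig)})
```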
First, the penn.evaluate code has to be adjusted to the new models; something isn't right with it yet.
Make sure that the logits are plotted nicely with the ground truth for the one-string model and for the original fcnf0++ trained with mtdb and ptdb. Then see what happens with the fcnf0++-gset-voiced configuration as well. Those should work well enough for plotting all six strings on a single set of logits.
This is now important again. Logit plots with the ground truth on top work well for the test set; however, this is not currently available for audio files provided with labels, nor for monophonic audio files with polyphonic labels.
What we need is a function that can take:
a. a single track file with solos, poly label
b. a single track file with chords, poly label
The inside function has to take:
Tweak the existing penn.plot.logits to achieve the desired effect.
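For context, a rough sketch of what such a function could look like, written with plain numpy/matplotlib rather than the actual penn.plot.logits internals; the helper name, the label format (a strings x frames pitch array saved as .npy), and the hop/bin constants are all assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_logits_with_ground_truth(logits, ground_truth_file, hopsize_seconds=0.01,
                                  fmin=31.0, bins_per_octave=240):
    """Hypothetical helper: overlay a polyphonic ground-truth pitch file on model logits.

    `logits` is assumed to be a (strings, pitch_bins, frames) array and
    `ground_truth_file` a .npy file with per-string pitch in Hz, shaped (strings, frames).
    The hopsize, fmin and bins-per-octave values are placeholders, not the real config.
    """
    pitch = np.load(ground_truth_file)
    num_strings = logits.shape[0]
    times = np.arange(logits.shape[-1]) * hopsize_seconds

    fig, axes = plt.subplots(num_strings, 1, sharex=True, figsize=(10, 2 * num_strings))
    axes = np.atleast_1d(axes)
    for string, ax in enumerate(axes):
        ax.imshow(logits[string], aspect='auto', origin='lower',
                  extent=[times[0], times[-1], 0, logits.shape[1]])
        # Convert ground-truth Hz to (approximate) bin indices for the overlay.
        bins = bins_per_octave * np.log2(np.maximum(pitch[string], fmin) / fmin)
        n = min(times.shape[0], bins.shape[-1])
        ax.plot(times[:n], bins[:n], 'r.', markersize=2)
        ax.set_ylabel(f'String {string}')
    axes[-1].set_xlabel('Time [s]')
    return fig
```

Calling something like `plot_logits_with_ground_truth(logits, 'data/cache/gset/000121-pitch.npy')` would then cover both the solo and the chord case, as long as the label file is polyphonic.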
Make a story with the plots: first show a plot with the raw pitch output, then the pitch output with periodicity values printed on top of it, and finally the pitch filtered with different periodicity thresholds.
Started the plots once again, this time in steps: first the STFT, then the predicted pitch, then the ground truth, and finally the predicted pitch with thresholds. (Figure: STFT with the ground truth overlaid.)
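A minimal sketch of the pitch-related panels in that story (dummy pitch and periodicity data, assumed 10 ms hop); the STFT panel is omitted:

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy per-frame pitch (Hz) and periodicity in [0, 1]; in practice these
# come from inference on the track.
rng = np.random.default_rng(0)
pitch = 200 + 50 * np.sin(np.linspace(0, 10, 500))
periodicity = rng.uniform(0, 1, 500)
times = np.arange(len(pitch)) * 0.01   # assumed 10 ms hop

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 7))

# 1. Raw predicted pitch.
axes[0].plot(times, pitch, '.', markersize=2)
axes[0].set_title('Predicted pitch')

# 2. Pitch colored by periodicity.
sc = axes[1].scatter(times, pitch, c=periodicity, s=4)
fig.colorbar(sc, ax=axes[1], label='periodicity')
axes[1].set_title('Predicted pitch with periodicity')

# 3. Pitch kept only where periodicity exceeds a threshold.
for threshold in (0.3, 0.5, 0.7):
    masked = np.where(periodicity >= threshold, pitch, np.nan)
    axes[2].plot(times, masked, '.', markersize=2, label=f'threshold {threshold}')
axes[2].legend()
axes[2].set_title('Pitch filtered by periodicity threshold')
axes[2].set_xlabel('Time [s]')
```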
Fixes and plots:
- Description of how the model was trained for the multipitch strings.
- Get a better understanding of the decoding with periodicity thresholding, and check whether softmax is applied correctly.
The logits make much more sense when visualized after a sigmoid. Unnormalized logits are spread out and not nearly as clear.
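A small sketch of that comparison (the bin and frame counts are placeholders):

```python
import torch
import matplotlib.pyplot as plt

# Dummy raw logits of shape (pitch_bins, frames); in practice these come
# from the model's forward pass for one string.
logits = torch.randn(1440, 500)

fig, (top, bottom) = plt.subplots(2, 1, figsize=(10, 6))
top.imshow(logits.numpy(), aspect='auto', origin='lower')
top.set_title('Raw (unnormalized) logits')

# Sigmoid squashes every bin independently into [0, 1], so the salient bins
# stand out much more clearly when visualized.
bottom.imshow(torch.sigmoid(logits).numpy(), aspect='auto', origin='lower')
bottom.set_title('Logits after sigmoid')
```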
Example command for plotting with the FCN model:
python -m penn.plot.to_latex --config config/polypennfcn-15ks-batch.py --checkpoint runs/polypennfcn-15ks-batch/00005000.pt --audio_file data/cache/gset/000121.wav -m -l --ground_truth_file data/cache/gset/000121-pitch.npy
At the moment the plotting function seems to be a bit broken: the number of xticks matches what is specified in the code, but the total duration shown on the axis differs a lot from the length of the processed song.
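One way to narrow this down could be to compare the duration implied by the plotted frames with the duration of the audio file itself; a rough sketch, where the hopsize, sample-rate and frame-count values are assumptions rather than the real config:

```python
import soundfile as sf   # any audio loader would do; soundfile is just an assumption

SAMPLE_RATE = 22050      # assumed model sample rate
HOPSIZE = 256            # assumed hop size in samples

# Duration according to the audio file itself.
audio, sr = sf.read('data/cache/gset/000121.wav')
audio_duration = len(audio) / sr

# Duration implied by the number of frames being plotted.
num_frames = 500         # placeholder; use logits.shape[-1] from inference
frame_duration = num_frames * HOPSIZE / SAMPLE_RATE

print(f'audio: {audio_duration:.2f} s, plotted frames: {frame_duration:.2f} s')
# If these two disagree, the xtick positions are fine but the time labels are
# being computed with the wrong hopsize or sample rate.
```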
Moreover, it seems that either the inference or the loss function is wrong, because the logits are very similar for every string. The current logits output shows that every string produces the polyphonic recognition instead of each string outputting its own monophonic pitch.
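A quick, informal way to confirm the "all strings look the same" symptom could be to correlate the per-string logits directly; a sketch assuming logits shaped (strings, pitch_bins, frames):

```python
import numpy as np

# Dummy per-string logits; in practice this is the model output for one track.
logits = np.random.randn(6, 1440, 500)

# Flatten each string's logits and compute the pairwise correlation matrix.
flat = logits.reshape(logits.shape[0], -1)
correlation = np.corrcoef(flat)
print(np.round(correlation, 2))
# Off-diagonal values close to 1.0 would confirm that every string head emits
# (almost) the same polyphonic output, which points at the loss targets or the
# head wiring rather than at the plotting code.
```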