anthonio9 / penn

Pitch Estimating Neural Networks (PENN)
MIT License

New network design: 60 midi notes + offsets #9

Open anthonio9 opened 5 months ago

anthonio9 commented 5 months ago

This issue is about the introduction of a new network model in which the logits layer has 6 × 60 × 2 MIDI bins instead of 6 × 1440 pitch bins. Each of the 60 values in the first column corresponds to a separate MIDI note. The second column tells how far the pitch deviates from that MIDI note, with bin 0 being -50 cents and bin 59 being +50 cents.

Conversion to this new format is done with penn.data.preprocess.core.note_dict_to_pitch_dict60(), where 60 stands for the number of MIDI notes.
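For reference, a minimal sketch of how a single frequency could map to a (MIDI note index, offset bin) pair under the scheme described above. This is not the code in note_dict_to_pitch_dict60(); the function names, the LOWEST_MIDI value, and the linear spacing of the 60 offset bins between -50 and +50 cents are assumptions made only for illustration.

```python
import math

NUM_NOTES = 60        # MIDI note classes, per the issue description
NUM_OFFSET_BINS = 60  # offset bins: 0 -> -50 cents, 59 -> +50 cents
LOWEST_MIDI = 40      # ASSUMPTION: hypothetical lowest note (E2), not taken from the repo


def hz_to_note_and_offset(frequency_hz):
    """Map a frequency in Hz to (note index, offset bin). Hypothetical helper."""
    # Fractional MIDI number, with A4 = 440 Hz = MIDI 69
    midi = 69.0 + 12.0 * math.log2(frequency_hz / 440.0)
    # Nearest MIDI note, rounding halves up
    nearest = math.floor(midi + 0.5)
    # Deviation from that note in cents, in the range [-50, 50)
    cents = 100.0 * (midi - nearest)
    note_index = nearest - LOWEST_MIDI
    if not 0 <= note_index < NUM_NOTES:
        raise ValueError("frequency is outside the 60-note range")
    # ASSUMPTION: bins are spaced linearly from -50 to +50 cents
    scaled = (cents + 50.0) / 100.0 * (NUM_OFFSET_BINS - 1)
    offset_bin = math.floor(scaled + 0.5)
    return note_index, offset_bin


def note_and_offset_to_hz(note_index, offset_bin):
    """Inverse mapping: (note index, offset bin) back to a frequency in Hz."""
    cents = offset_bin * 100.0 / (NUM_OFFSET_BINS - 1) - 50.0
    midi = LOWEST_MIDI + note_index + cents / 100.0
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)
```

With 60 offset bins spanning 100 cents, the quantization step is about 1.7 cents, so a round trip through this sketch lands within roughly 0.85 cents of the input pitch.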

anthonio9 commented 5 months ago

Well, the new config is almost ready for training. A few problems are left for tomorrow: there are still no metrics, and the offset is calculated on the CPU instead of the GPU. Fix it!

anthonio9 commented 5 months ago

So far I've been trying to teach the network discrete MIDI notes. It seems to work rather well; the results so far are shown below:

image

It seems that a lower number of logits does not necessarily mean better accuracy.

anthonio9 commented 5 months ago

Take a look here to resolve the problem with images not being logged with wandb: https://github.com/wandb/wandb/issues/1252