anthonio9 commented 8 months ago

Thesis story wrap-up:

[ ] should contain a comparison with PENN on polyphonic pitch audio to show that it does not work at all
[ ] show that PENN with split hexaphonic input works quite well
[x] make a common metric for FretNet and PPN and hope for the best
[ ] make a strong conclusion about the string confusion problem with PPN: compare the standard metrics with the multi-pitch-specific, string-agnostic metrics (close the generalization gap). The monophonic audio example is great for showing the confusion.
[ ] rethink the visualizations
- consider making the lines solid instead of dotted and making one of them thicker; the green and yellow color scheme for the logits isn't the best, so maybe try grayscale.

anthonio9 commented 8 months ago

FretNet has multiple evaluator classes, such as PitchListEvaluator and MultipitchEvaluator. Each class has an unpack() method and an evaluate() method. The idea now is to derive from the PitchListEvaluator class, reuse the same unpack() method, and write a completely different evaluate().
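A minimal sketch of the subclassing idea. The base class below is a hypothetical stand-in written only for illustration; FretNet's real PitchListEvaluator may have different constructor arguments and method signatures. The derived class (hypothetical name `CentsEvaluator`) keeps unpack() unchanged and swaps in a PENN-style evaluate() (RMSE in cents and RPA with a 50-cent threshold), assuming one pitch per frame, 0 Hz for unvoiced frames, and estimate/reference arrays that are already time-aligned.

```python
import numpy as np


class PitchListEvaluator:
    """Hypothetical stand-in for FretNet's PitchListEvaluator.
    The real class and its signatures may differ."""

    def __init__(self, key="pitch_list"):
        self.key = key

    def unpack(self, data):
        # Pull the pitch list out of a results dictionary
        return data[self.key]

    def evaluate(self, estimated, reference):
        raise NotImplementedError


class CentsEvaluator(PitchListEvaluator):
    """Reuses unpack() as-is, replaces evaluate() with RMSE (cents) and RPA."""

    def __init__(self, key="pitch_list", threshold_cents=50.0):
        super().__init__(key)
        self.threshold_cents = threshold_cents

    def evaluate(self, estimated, reference):
        est = np.asarray(estimated, dtype=float)
        ref = np.asarray(reference, dtype=float)
        # Score only frames where both estimate and reference are voiced
        voiced = (est > 0) & (ref > 0)
        cents = 1200.0 * np.log2(est[voiced] / ref[voiced])
        return {
            "rmse": float(np.sqrt(np.mean(cents ** 2))),
            "rpa": float(np.mean(np.abs(cents) < self.threshold_cents)),
        }
```

Usage: `ev = CentsEvaluator(); ev.evaluate(ev.unpack(results), reference)`. If no frame is voiced in both arrays, the means are taken over an empty array and return NaN.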

anthonio9 commented 7 months ago

Why do the reference and estimated timestamps differ so much, but only in the pitch_list?

anthonio9 commented 7 months ago

As for the previous question, it seems that only every 4th reference timestamp is present in the predicted set. This suggests that FretNet is designed to handle larger buffers, and its latency is larger than what was designed for PPN.
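A small sketch of how the two timestamp grids could be aligned before scoring, assuming both are sorted arrays in seconds and the estimated grid is a decimated version of the reference (every 4th frame, as observed above). `match_reference_frames` and its `tolerance` parameter are hypothetical names, not part of FretNet or PENN; the function needs at least two reference timestamps.

```python
import numpy as np


def match_reference_frames(ref_times, est_times, tolerance=1e-3):
    """For each estimated timestamp, return the index of the nearest
    reference timestamp and whether it lies within `tolerance` seconds."""
    ref_times = np.asarray(ref_times, dtype=float)
    est_times = np.asarray(est_times, dtype=float)
    # Insertion points into the sorted reference grid; clip so both neighbours exist
    idx = np.searchsorted(ref_times, est_times)
    idx = np.clip(idx, 1, len(ref_times) - 1)
    left, right = ref_times[idx - 1], ref_times[idx]
    # Step back one frame when the left neighbour is strictly closer
    idx = idx - ((est_times - left) < (right - est_times)).astype(int)
    within = np.abs(ref_times[idx] - est_times) <= tolerance
    return idx, within
```

The reference pitch can then be subsampled with `ref_pitch[..., idx[within]]` so that both sides are scored on the same frames.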

anthonio9 commented 7 months ago

Thesis story wrap-up:

* [ ] should contain a comparison with PENN on polyphonic pitch audio to show that it does not work at all

How should this be presented?

A plot of an example track with the ground-truth and predicted pitch, both overlaid on a spectrogram, should work well. FRMSE and FRPA metrics would be a great addition.
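A minimal matplotlib sketch of such a figure, which also follows the visualization notes from the first wrap-up (solid lines, one line thicker, a grayscale background instead of green/yellow). The function name and argument layout are assumptions: `spec` is a (frequency, time) magnitude array, and unvoiced frames in the pitch arrays are `np.nan` so the lines break instead of dropping to 0 Hz.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_pitch_over_spectrogram(spec, times, freqs, ref_hz, est_hz, path=None):
    """Draw ground-truth and predicted pitch (Hz) over a grayscale spectrogram."""
    fig, ax = plt.subplots(figsize=(10, 4))
    # Grayscale background, dark = high energy
    ax.pcolormesh(times, freqs, spec, cmap="gray_r", shading="auto")
    # Solid lines only; ground truth is drawn thicker than the prediction
    ax.plot(times, ref_hz, color="tab:blue", linewidth=3, label="ground truth")
    ax.plot(times, est_hz, color="tab:red", linewidth=1, label="predicted")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Frequency (Hz)")
    ax.legend(loc="upper right")
    if path is not None:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig, ax
```

For a CQT or other log-frequency input, `ax.set_yscale("log")` keeps the pitch lines readable across octaves.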

anthonio9 commented 7 months ago

Thesis Layout

The thesis should be 30 to 50 pages, including pictures.

Introduction:

General motivation
A brief summary of the results
The introduction usually does not go into model details
Section 2:
go over the theory (CNNs) in the methods section
usually this is split into a background section and a proposed-method section
the background section is more detailed than in papers
background: explain the models from the original papers, PENN and FretNet
briefly point out the problem with the GuitarSet dataset

Section 3: Proposed method

How the model is adapted to polyphonic tracking
Explain the RMSE and RPA accuracy metrics and their string-agnostic variants
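A sketch of the difference between the standard (string-dependent) RPA and a string-agnostic RPA, assuming pitch arrays of shape (6 strings, frames) in Hz with 0 for unvoiced entries and the usual 50-cent threshold. The string-agnostic version pools all strings in a frame and brute-forces the one-to-one matching with the most hits; that is fine for 6 strings, but it is an illustrative choice, not necessarily how FretNet computes its string-agnostic metrics. All function names are hypothetical.

```python
import itertools

import numpy as np

THRESHOLD_CENTS = 50.0


def cents(a, b):
    return 1200.0 * np.log2(a / b)


def string_dependent_rpa(est, ref):
    """Each string's estimate is compared only with the same string's reference."""
    voiced = ref > 0
    hit = (est > 0) & voiced
    correct = np.zeros_like(ref, dtype=bool)
    correct[hit] = np.abs(cents(est[hit], ref[hit])) < THRESHOLD_CENTS
    return correct.sum() / voiced.sum()


def _best_matches(est_pitches, ref_pitches):
    """Largest number of one-to-one pairs within the threshold (brute force)."""
    if len(est_pitches) == 0 or len(ref_pitches) == 0:
        return 0
    small, large = sorted((est_pitches, ref_pitches), key=len)
    best = 0
    for perm in itertools.permutations(large, len(small)):
        hits = sum(abs(cents(s, l)) < THRESHOLD_CENTS for s, l in zip(small, perm))
        best = max(best, int(hits))
    return best


def string_agnostic_rpa(est, ref):
    """Pitches are pooled across strings per frame, ignoring string assignment."""
    total_hits = 0
    for frame in range(ref.shape[1]):
        e = est[:, frame][est[:, frame] > 0]
        r = ref[:, frame][ref[:, frame] > 0]
        total_hits += _best_matches(list(e), list(r))
    return total_hits / (ref > 0).sum()
```

Usage, in the spirit of the monophonic string-confusion example: a 220 Hz note annotated on string index 1 but predicted on string index 2 scores 0.0 with `string_dependent_rpa` and 1.0 with `string_agnostic_rpa`, so the gap between the two numbers isolates the string confusion.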

Section 4: experiments

dataset explanation / description
describe the problem with the GuitarSet dataset in more detail

Section 5: results and analysis

Conclusion
Fully convolutional models or transformers
New dataset with better ground truth is needed! [new dataset](https://arxiv.org/abs/2309.09085)

anthonio9 commented 6 months ago

For the next meeting: the table of contents, plus anything extra, would be nice.

anthonio9 commented 6 months ago

Main results table: String-Agnostic RMSE, String-Agnostic RPA, Non-String-Agnostic RMSE, Non-String-Agnostic RPA.

If possible, copy the String-Agnostic Note metric from FretNet.

anthonio9 commented 5 months ago

[ ] methods and background sections: START IT!

anthonio9 commented 4 months ago

[ ] explain the math behind everything you use: feed-forward NNs, CNNs, CQT. There is no need to cover anything beyond that, such as transformers.
[ ] background: explain the architecture of MLPs and CNNs
[ ] background: explain how the network performs classification: what softmax and ReLU are, binary cross-entropy (piano-roll model), categorical cross-entropy (one-hot model)
[ ] Periodicity and entropy
[ ] Spend some time on the understanding, calculation and description of the receptive field
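NumPy sketches of the building blocks listed above, as a reference while writing the background section. The entropy-based periodicity follows the general idea of "1 minus normalized entropy" of the pitch distribution; PENN's own periodicity decoders may differ in detail. `receptive_field` uses the standard recurrence r = r + (k - 1) * d * j, j = j * s over stacked conv/pool layers given as (kernel, stride, dilation) tuples; that layer-list format is an assumption, not PENN's config format.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(logits, axis=-1):
    # Subtract the max for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Piano-roll style: every bin is an independent yes/no decision."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))


def categorical_cross_entropy(logits, target_index):
    """One-hot style: exactly one bin is correct for the frame."""
    return -np.log(softmax(logits)[target_index])


def entropy_periodicity(logits):
    """1 - H(p) / log(N): close to 1 for a peaked distribution, 0 for uniform."""
    p = softmax(logits)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return 1.0 - entropy / np.log(len(p))


def receptive_field(layers):
    """layers: list of (kernel, stride, dilation). Returns the receptive field
    in input frames/samples of one output unit of the last layer."""
    field, jump = 1, 1
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
    return field
```

For example, two stride-1 layers with kernel 3 see 5 input frames, while putting stride 2 on the first layer grows that to 7, because every later kernel tap then spans 2 input frames.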

anthonio9 / penn

Thesis Story #10
