Thanks for the questions.
Excited to hear you're iterating on this research direction! Happy to provide additional info to aid our investigation.
Re: more data vs. new formulation, why can't you just train your model from scratch on Fraxtil / ITG and compare performance with your fine-tuned model? Then you would compare: (1) my model trained from scratch, (2) your proposed model trained from scratch on the same data, and (3) your proposed model pre-trained and then fine-tuned on the same data. Shouldn't this properly ablate the effects of pre-training with your new formulation?
Unfortunately, I never bothered porting the training code to PyTorch. The TF 1.0 codebase is still the right reference for that. At this point it probably would only work in Docker.
Thank you for the confirmations! I understand how a low-overhead model may be desired for a real-life service such as Beat Sage, and how <0.01-point differences in F1 may not be so meaningful.
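(For concreteness, the F1 I keep referring to is a tolerance-window onset F1. Below is a minimal sketch of how I compute it; the ±20 ms window and the greedy matching are my own assumptions, not necessarily identical to your evaluation script.)

```python
def onset_f1(pred_times, true_times, tol=0.02):
    """Precision/recall/F1 with greedy one-to-one matching of predicted
    onsets to ground-truth onsets within a +/- `tol` second window."""
    matched = 0
    used = [False] * len(true_times)
    for p in sorted(pred_times):
        best, best_dist = None, tol
        for i, t in enumerate(true_times):
            if not used[i] and abs(p - t) <= best_dist:
                best, best_dist = i, abs(p - t)
        if best is not None:
            used[best] = True
            matched += 1
    precision = matched / len(pred_times) if pred_times else 0.0
    recall = matched / len(true_times) if true_times else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. one of two predictions lands within 20 ms of a true onset -> F1 = 0.4
print(onset_f1([0.51, 1.30], [0.50, 1.00, 1.50]))
```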
Re: the above, thank you for the suggestion! I see that comparing (2) and (3) would help illustrate the effects of having more data under our alternate formulation. However, my problem was that the metrics for (2) came out quite low compared to (1) or (3). I was afraid our proposed method would look less valuable than it really is, since I did find our model performs much, much better on a larger dataset.
So what I actually want to demonstrate is "given access to enough data, our alternate approach/formulation helps." Although comparing (2) and (3) may certainly support this claim, I thought it may not be enough, since I'd expect the baselines to also do better with more data. So I wrote my own training code (in PyTorch) to compare the models on a larger dataset. The test is running right now.
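For reference, the comparison I am running looks roughly like the sketch below. Every name in it (`OnsetNet`, `make_loader`, the dataset sizes) is a placeholder standing in for my own code and data, not anything from this repo or my actual architecture:

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in for an onset-placement model; my real model is different.
class OnsetNet(nn.Module):
    def __init__(self, n_feats=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feats, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def make_loader(n, n_feats=80):
    # Random placeholder data standing in for audio-feature frames + onset labels.
    x = torch.randn(n, n_feats)
    y = (torch.rand(n) < 0.1).float()  # sparse positives, as with real onsets
    return DataLoader(TensorDataset(x, y), batch_size=256, shuffle=True)

def train(model, loader, epochs=2, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model

fraxtil_itg = make_loader(5_000)   # stand-in for the Fraxtil/ITG training split
large_pack = make_loader(50_000)   # stand-in for my larger dataset

scratch = train(OnsetNet(), fraxtil_itg)                             # settings (1)/(2)
pretrained = train(OnsetNet(), large_pack)                           # pre-training stage of (3)
finetuned = train(copy.deepcopy(pretrained), fraxtil_itg, lr=1e-4)   # fine-tuning stage of (3)
```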
Again, thank you very much for the information, and thank you for this awesome repo!
Gotcha. This makes sense! Yes, I suspect that the tiny CNN we used for onset placement may not benefit as much from additional training data compared to a more modern approach. Looking forward to reading about your findings!
Hello! If you're still interested, here is my demo page, which includes the link to the extended abstract & code used.
Hello! Thank you for posting the model and the code.
I am writing to inquire about the following.
Here is some context on why I ask, just for your information.
I have been studying a way to improve upon DDC, using an alternate formulation to eliminate the binary class imbalance discussed in your paper. I am almost done with a viable demo & evaluations, so I am looking to submit my work to ISMIR LBD if possible. I have gathered another dataset to train and evaluate on; this set is the only thing that enables my model to actually converge. So far, I have fine-tuned this pretrained model on Fraxtil/ITG and compared the F1 metrics with the numbers reported in your paper. While this does give superior metrics, I still wish to ascertain whether this is entirely because I used more data in pretraining, or whether my alternate problem formulation actually helped.
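(To be clear about the imbalance I mean: under the per-frame binary formulation most frames carry no step, so the rare positive class has to be reweighted somehow. The toy sketch below only illustrates that baseline issue with a weighted BCE loss; the numbers are illustrative and this is not my alternate formulation.)

```python
import torch
import torch.nn as nn

# Toy illustration of the per-frame class imbalance: most frames carry no step.
logits = torch.randn(1000)
labels = (torch.rand(1000) < 0.03).float()  # ~3% positive frames, for illustration

plain = nn.BCEWithLogitsLoss()(logits, labels)
# One standard mitigation: upweight the rare positive class.
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(30.0))(logits, labels)
print(float(plain), float(weighted))
```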
Until yesterday I had no way to investigate this: your original code throws CUDA- or cuBLAS-related errors on every piece of hardware I could lay my hands on, and I failed to troubleshoot this after a few days of attempts. I was daunted by the prospect of having to re-implement and validate the whole model and eval pipeline in TensorFlow 1.0, a framework whose documentation has seemingly been scrubbed off the face of the Earth by its makers.
However, this repo opens up quite a range of possibilities for me. Now I can actually train with code that you, the author, wrote and possibly validated, and which apparently does the same thing as the code published back in 2017. This is what leads me to ask the questions above.
I would be delighted to hear back from you. Thank you!