NetEase / Polyphonic-TrOMR

TrOMR: Transformer-based Polyphonic Optical Music Recognition
Apache License 2.0

Dataset download #4

Open Yuan-ManX opened 1 year ago

Yuan-ManX commented 1 year ago

Is the dataset open source? How to download?

liebharc commented 11 months ago

I would also be happy to see the data set. In addition, it would be nice to have the training code available. In my test runs, the results seem sensitive to how the staves are cropped from a larger image. I suspect this could be improved by adding more distortions to the training set.
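One way to add such distortions, sketched here purely for illustration (this is not the repo's actual training code), is to jitter the crop window so the model sees staves that are slightly mis-cropped. The function name and parameters below are hypothetical:

```python
import numpy as np

def jitter_crop(img, rng, max_shift=8):
    """Randomly shift the crop window of a 2D grayscale image by up to
    max_shift pixels in each direction, simulating imprecise staff cropping.
    Output has the same shape as the input."""
    h, w = img.shape[:2]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Pad with edge pixels, then take a shifted window of the original size.
    padded = np.pad(img, ((max_shift, max_shift), (max_shift, max_shift)),
                    mode="edge")
    return padded[max_shift + dy : max_shift + dy + h,
                  max_shift + dx : max_shift + dx + w]

rng = np.random.default_rng(0)
staff = np.arange(100, dtype=float).reshape(10, 10)  # stand-in staff image
augmented = jitter_crop(staff, rng)
```

Applied with a fresh random shift each epoch, this would expose the model to the cropping variations described above.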

Daniel63656 commented 10 months ago

will the dataset be made public?

liebharc commented 7 months ago

From what I can see, the data set was never published. While I still hope that this might change in the future, I started an attempt to train this model on a mix of the PrIMuS data set and the Grandstaff data set. The results aren't as robust yet as what I get with the weights provided in this repo, but in some cases it works well. I put the training code so far on my fork of this repo: https://github.com/liebharc/Polyphonic-TrOMR

noobpeng99 commented 7 months ago

> From what I can see, the data set was never published. While I still hope that this might change in the future, I started an attempt to train this model on a mix of the PrIMuS data set and the Grandstaff data set. The results aren't as robust yet as what I get with the weights provided in this repo, but in some cases it works well. I put the training code so far on my fork of this repo: https://github.com/liebharc/Polyphonic-TrOMR

You are right. I have also attempted to train TrOMR on the PrIMuS dataset, simply by scaling the images to a fixed size. My results show that TrOMR's performance does not exhibit a significant advantage, with a symbol error rate exceeding 3% on the Camera-PrIMuS dataset. Can you share your test results?

liebharc commented 7 months ago

> You are right. I have also attempted to train TrOMR on the PrIMuS dataset, simply by scaling the images to a fixed size. My results show that TrOMR's performance does not exhibit a significant advantage, with a symbol error rate exceeding 3% on the Camera-PrIMuS dataset. Can you share your test results?

I haven't calculated a symbol error rate yet. Right now, I run inference on a small set of example images, such as https://github.com/BreezeWhite/oemer/blob/main/figures/tabi.jpg (after splitting it into single-staff images), to get a feeling for how well it performs.

Is the code you are using to calculate the SER available somewhere? To get meaningful results, I'd also need a separate data set to calculate the SER on; since PrIMuS is used for training, of course I can't also use it to rate the performance of the results. At least for monophonic examples, it shouldn't be too hard for me to find another data set.

noobpeng99 commented 7 months ago

> You are right. I have also attempted to train TrOMR on the PrIMuS dataset, simply by scaling the images to a fixed size. My results show that TrOMR's performance does not exhibit a significant advantage, with a symbol error rate exceeding 3% on the Camera-PrIMuS dataset. Can you share your test results?
>
> I haven't calculated a symbol error rate yet. Right now, I run inference on a small set of example images, such as https://github.com/BreezeWhite/oemer/blob/main/figures/tabi.jpg (after splitting it into single-staff images), to get a feeling for how well it performs.
>
> Is the code you are using to calculate the SER available somewhere? To get meaningful results, I'd also need a separate data set to calculate the SER on; since PrIMuS is used for training, of course I can't also use it to rate the performance of the results. At least for monophonic examples, it shouldn't be too hard for me to find another data set.

I will open-source my code once everything is ready, but currently it's still under development. You can calculate the symbol error rate by measuring the edit distance between the model's predicted sequence and the ground truth. You can run `pip install editdistance` to install a tool for calculating the edit distance.
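The SER calculation described above can be sketched in pure Python (the `editdistance` package would compute the same distance); note that the token strings below are illustrative placeholders, not the actual TrOMR vocabulary, and normalizing by the number of ground-truth symbols is the usual convention rather than something stated in this thread:

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def symbol_error_rate(pred_seqs, gt_seqs):
    """Total edit distance over total ground-truth symbols, across a test set."""
    edits = sum(edit_distance(p, g) for p, g in zip(pred_seqs, gt_seqs))
    symbols = sum(len(g) for g in gt_seqs)
    return edits / symbols

# Illustrative token sequences, not real TrOMR output:
pred = ["clef-G2", "note-C4_quarter", "note-E4_quarter"]
gt = ["clef-G2", "note-C4_quarter", "note-G4_quarter"]
print(round(symbol_error_rate([pred], [gt]), 3))  # → 0.333 (1 edit / 3 symbols)
```

With `editdistance` installed, `editdistance.eval(p, g)` could replace the hand-rolled `edit_distance` above.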

Regarding the dataset, I trained the model on approximately 60,000 images from the PrIMuS dataset and then tested it on around 10,000 images. I also experimented with training on smaller subsets and found that TrOMR may not fully demonstrate its true capabilities when the dataset is small.

liebharc commented 5 months ago

FYI, homr now combines TrOMR trained on the Grandstaff dataset with a staff detection module that is based on the segmentation models of oemer.