penguinwang96825 opened this issue 5 months ago
Hi there,
I am not the one who did the HF transplant, but using the eval pipeline in this repo, you should be able to reproduce the exact result.
Quick question: where is your eval data from?
-Yuan
Hi, thanks for the prompt reply. I also noticed that the number of parameters differs between the checkpoint from this repo and the one on the Huggingface Hub. FYI, I downloaded the AudioSet data from this repo.
The data is not the problem; if you search the issues, you'll find people who have successfully reproduced the result with this version.
The problem is likely in your eval pipeline. Which normalization (i.e., mean and std) did you use for eval? You should use the same one as our training norm.
Why not try out our eval?
-Yuan
I believe the Huggingface FeatureExtractor uses the default normalisation settings; you can check it from here: the mean is -4.2677393 and the std is 4.5689974. The thing is, I want to ensure everything is Huggingface compatible. That compatibility simplifies model evaluation, enables easier experimentation, and facilitates collaboration within the machine learning community.
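A quick way to verify these defaults (the checkpoint name is taken from this thread; this assumes the `transformers` library is installed):

```python
from transformers import AutoFeatureExtractor

# Load the extractor shipped with the HF checkpoint and print its
# normalisation constants.
fe = AutoFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
print(fe.mean, fe.std)   # -4.2677393 4.5689974
print(fe.do_normalize)   # True: the spectrogram is normalised with these values
```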
I understand, and I believe HF can reach the same performance; it is probably just a minor thing. I just do not have time to debug it, as I am managing multiple repos.
How about this:
https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/AST_Inference_Demo.ipynb
This is a Colab notebook for inference using our pipeline. With minimal effort you can revise it to evaluate all your samples, and then you should get a mAP from our eval pipeline. You can also record the logits for each sample and compare them with the HF ones.
You can even start with a single sample and check whether our Colab logits and your HF logits are close enough, then debug from there.
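For example, a rough sketch of that single-sample comparison (the wav path and the saved-logits file are placeholders; this assumes you dump the logits from the Colab run to disk):

```python
import numpy as np
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "MIT/ast-finetuned-audioset-10-10-0.4593"
fe = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id).eval()

# AST expects 16 kHz mono audio; resample first if your clip differs.
waveform, sr = torchaudio.load("sample.wav")  # placeholder clip
inputs = fe(waveform.squeeze().numpy(), sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    hf_logits = model(**inputs).logits.squeeze().numpy()

colab_logits = np.load("colab_logits.npy")  # placeholder: logits saved from the Colab demo
print(np.abs(hf_logits - colab_logits).max())  # near zero if the two pipelines agree
```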
-Yuan
Hi, I'm attempting to reproduce the performance metrics of these models using HuggingFace's Pipeline utility, but I'm getting different results. Below is the Python code I used for testing:
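A minimal sketch of this kind of pipeline-based evaluation (the checkpoint name comes from this thread; the sample list, file paths, and label handling are placeholder assumptions rather than the original script):

```python
import numpy as np
import torch
from transformers import pipeline

audio_classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
    device=0 if torch.cuda.is_available() else -1,
)

# Placeholder: (wav_path, set_of_true_label_names) pairs built from the
# AudioSet eval manifest.
eval_samples = [("eval/sample_0001.wav", {"Speech"})]

label2id = audio_classifier.model.config.label2id
num_labels = len(label2id)

all_scores, all_targets = [], []
for path, true_labels in eval_samples:
    # Ask for scores over every class; note the pipeline applies its own
    # post-processing to the logits, which may matter for multi-label mAP.
    preds = audio_classifier(path, top_k=num_labels)
    scores = np.zeros(num_labels)
    for p in preds:
        scores[label2id[p["label"]]] = p["score"]
    target = np.zeros(num_labels)
    for name in true_labels:
        target[label2id[name]] = 1.0
    all_scores.append(scores)
    all_targets.append(target)

scores = np.stack(all_scores)   # (num_samples, num_classes)
targets = np.stack(all_targets)  # multi-hot ground truth, same shape
# mAP is then computed from `targets` and `scores` with the helpers below.
```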
The helper functions for the metrics calculations are implemented as follows:
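A sketch of what such helpers typically look like (assumed, not the original implementation); AudioSet results are reported as macro-averaged AP over the classes, which sklearn computes directly:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def compute_map(targets: np.ndarray, scores: np.ndarray) -> float:
    """Macro-averaged AP over classes; both arrays are (num_samples, num_classes)."""
    return float(average_precision_score(targets, scores, average="macro"))

def compute_auc(targets: np.ndarray, scores: np.ndarray) -> float:
    """Macro-averaged ROC AUC over classes."""
    return float(roc_auc_score(targets, scores, average="macro"))
```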
The recorded performance metrics were:
- `MIT/ast-finetuned-audioset-16-16-0.442`
- `MIT/ast-finetuned-audioset-10-10-0.4593`
These results do not align closely with the expected performance. Could you help me identify any potential issues with my approach or provide guidance on achieving the expected performance levels?