Open Anwar-Said opened 2 years ago
Same problem here. I printed the output scores of the learned model with test data = train data, and all of the scores are above 0.5 (i.e. every example is classified as 1). One thing I noticed is that examples with label 0 tend to have scores below 0.51, while those with label 1 tend to have scores above 0.51.
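To check whether the scores carry any signal despite all sitting above 0.5, one option is to sweep the decision threshold and compare accuracy against the default 0.5 cutoff. The sketch below is only a diagnostic, not a fix: the `scores` and `labels` values are made up for illustration (they are not taken from the Devign data), and `accuracy_at` and `best_threshold` are hypothetical helper names.

```python
def accuracy_at(scores, labels, threshold):
    """Accuracy when predicting 1 for every score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Try each observed score as a cutoff; return (threshold, accuracy)."""
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(scores)):
        acc = accuracy_at(scores, labels, t)
        if acc > best_acc:  # keep the lowest threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical scores squeezed into a narrow band just above 0.5.
scores = [0.502, 0.505, 0.508, 0.509, 0.512, 0.515, 0.520, 0.530]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("accuracy at 0.5:", accuracy_at(scores, labels, 0.5))
print("best (threshold, accuracy):", best_threshold(scores, labels))
```

If the best threshold only separates the classes by a hair, as in this toy data, the model has most likely barely moved from its initial output. Picking a threshold on the training set would then just hide the fact that it is not learning.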
After months of reviewing the data, I can tell you that the partial dataset is filled with errors, if not intentional ones. For example, in ffmpeg 27295 the method av_get_pixel_fmt() was later changed to av_get_pix_format(), which results in a different graph structure. The fact that so many people can't reproduce the results makes me think Devign should be dropped from related-work comparisons. I will be switching to SARD and only SARD from now on. :)
@NikolasBielski Hi, where can I find the SARD dataset you mentioned? Thanks!
You can find it at https://samate.nist.gov/SARD/.
Hi, I'm attempting to reproduce the paper's results using the provided datasets. On one dataset (under data/input/), I ran the model with the same experimental settings as described in the paper. However, the model learns nothing. I was hoping you could help me figure out what I'm missing. Thanks.