allanj / ner_incomplete_annotation


Results reproduction #9

Closed. ViktorooReps closed this issue 2 years ago.

ViktorooReps commented 2 years ago

I want to use your model as a baseline in my future paper, but unfortunately I cannot utilize the results reported in your paper directly, as I explore a smaller range of labelled-entity fractions: 5-15%.

So I will need to run experiments with your code on this range of labelled entities. Thanks a lot for the PyTorch implementation, by the way!

You mentioned in another issue that fine-grained hyperparameter tuning is not required to achieve comparable results, but I would still appreciate it if you could share the optimal hyperparameters (or at least values close to the ones you used in your paper).

ViktorooReps commented 2 years ago

I ran with the default hyperparameters and lr_decay=0.2 and got the following results:

The best dev F1: 77.9107505070994
The corresponding test: [96.91211401425178, 57.79036827195468, 72.40461401952085]
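(The output format is not stated in the thread, but the triplet is consistent with [precision, recall, F1]: $2 \cdot 96.91 \cdot 57.79 / (96.91 + 57.79) \approx 72.40$, which matches the third value.)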
allanj commented 2 years ago

Thanks for the interest. I'm trying to reproduce your error as well. Did you use pretrained word embeddings?

allanj commented 2 years ago

BTW, I think the parameters in the current repo are the ones that I used.

allanj commented 2 years ago
(figure: F1 of the different approaches across entity annotation ratios $\rho$)

If you look at this graph, I think the F1 is somewhat similar to what I have? Though the figure shows the soft approach.

ViktorooReps commented 2 years ago

Thank you for the reply!

To my understanding, the results are not really similar to yours. Here is the exact call to the training script that I used: `python3 main.py --dataset conll2003 --variant soft --device=cuda:0 --num_epochs=20 --lr_decay=0.2`. I expected to see results somewhat close to Our Soft at the 0.5 $\rho$ mark, but the results I got are actually closer to the Simple approach.

I did not try to run the original DyNet version of your repo. I will come back to you as soon as I get the results.

ViktorooReps commented 2 years ago

I did download the pretrained GloVe embeddings and put them under the data/ folder. To my understanding, the 100-dimensional vectors are used by default.
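
For reference, the embedding setup amounted to roughly the following. The data/ location comes from the repo; the exact file name the code expects (glove.6B.100d.txt) is my assumption based on the 100-dimensional default:

```bash
# Fetch the 100-dimensional GloVe vectors and place them under data/
# (the target file name is assumed, not confirmed by the repo docs).
mkdir -p data
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip glove.6B.100d.txt -d data/
```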

allanj commented 2 years ago

> Thank you for the reply!
>
> To my understanding, the results are not really similar to yours. Here is the exact call to the training script that I used: `python3 main.py --dataset conll2003 --variant soft --device=cuda:0 --num_epochs=20 --lr_decay=0.2`. I expected to see results somewhat close to Our Soft at the 0.5 $\rho$ mark, but the results I got are actually closer to the Simple approach.
>
> I did not try to run the original DyNet version of your repo. I will come back to you as soon as I get the results.

Sorry about that, but the PyTorch version only supports the hard variant. The soft variant is not working; I haven't really gotten the soft approach working in the PyTorch version yet.

The hard approach should work, though.
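
So, presumably, the hard-variant run corresponds to a call like the one below. This is only a sketch: the exact value accepted by `--variant` for the hard approach is assumed here and not confirmed in the thread.

```bash
# Assumed hard-variant analogue of the soft call above;
# the --variant value is a guess and should be checked against main.py's argument parser.
python3 main.py --dataset conll2003 --variant hard --device=cuda:0 --num_epochs=20 --lr_decay=0.2
```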

ViktorooReps commented 2 years ago

Oh, my bad. I got the impression (from the code and the issues) that you had already finished the soft version but had not updated the project description yet.

I will be trying the DyNet version then. Thank you for your time!

ViktorooReps commented 2 years ago

For anyone interested, I have managed to run the DyNet version. You can find a Dockerfile with the environment setup here: https://github.com/ViktorooReps/partial_annotation. Unfortunately, I was not able to accelerate training with a GPU (DyNet does use the device, yet training time goes up), and CPU training only utilizes one core...