JohnGiorgi / DeCLUTR

The corresponding code for our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!
https://aclanthology.org/2021.acl-long.72/
Apache License 2.0

Training with multiple GPUs #263

Open NtaylorOX opened 1 year ago

NtaylorOX commented 1 year ago

Great work.

I saw in another issue that there had been plans to migrate to later versions of allennlp?

I have actually got a version of this DeCLUTR code working with allennlp v2.10; however, I cannot get the multi-GPU setup to work, as the relevant config arguments seem to have changed and I cannot work out what they should now be.

For instance, passing an override for "distributed.cuda_devices" leads to:

ValueError: overrides dict contains unused keys: ['distributed.cuda_devices']
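
For concreteness, even something as minimal as the following reproduces the error for me (the config path and device ids are just placeholders for my actual setup):

from allennlp.common.params import Params

# Placeholder config path; the dotted-key override below is what triggers
# "ValueError: overrides dict contains unused keys" under allennlp v2.10.
params = Params.from_file(
    "path/to/declutr_config.jsonnet",
    params_overrides='{"distributed.cuda_devices": [0, 1]}',
)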

I imagine this project may have become a bit too old to keep working on, but any help with multi-GPU training with allennlp v2.10 in relation to DeCLUTR would be great.

Best,

Niall

NtaylorOX commented 1 year ago

Fixed it after all: I stumbled upon the subtle change required in the config.

I honestly thought I had tried this several times, so perhaps it was fatigue. But the following works for multi-GPU training with allennlp v2.10; it is a subtle change from v1.1:

"distributed": { "cuda_devices": [8,9], }, "trainer": { // Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it) "use_amp": true,
"optimizer": { "type": "huggingface_adamw", "lr": 5e-5, "eps": 1e-06, "correct_bias": false, "weight_decay": 0.1, "parameter_groups": [ // Apply weight decay to pre-trained params, excluding LayerNorm params and biases [["bias", "LayerNorm\.weight", "layer_norm\.weight"], {"weight_decay": 0}], ], }, "callbacks":[{"type":'tensorboard'}], "num_epochs": 10, "checkpointer": { // A value of null or -1 will save the weights of the model at the end of every epoch "keep_most_recent_by_count": 2, }, "grad_norm": 1.0, "learning_rate_scheduler": { "type": "slanted_triangular", }, },
}
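
With "distributed" living in the config itself rather than being passed via --overrides, training launches the same way as before. Roughly, the Python-API equivalent of the repo's allennlp train ... --include-package declutr command would be (paths are placeholders; untested sketch):

from allennlp.commands.train import train_model_from_file

if __name__ == "__main__":
    # Placeholder paths; include_package registers the DeCLUTR readers/models
    # in each worker process spawned for the GPUs listed under "distributed".
    train_model_from_file(
        "path/to/declutr_config.jsonnet",
        serialization_dir="output",
        include_package=["declutr"],
    )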

JohnGiorgi commented 1 year ago

Hi @NtaylorOX, does this work without any changes to this codebase? I started migrating this to allennlp>2.0.0 a while back but ended up giving up because every breaking change I fixed seemed to be followed by another.

NtaylorOX commented 1 year ago

Hi @JohnGiorgi,

So I did have to make a few changes, in line with the guidance found here: https://github.com/allenai/allennlp/discussions/4933.

While I seem to have successfully modified the DeCLUTR codebase to work with allennlp v2.10, it involved a couple of crude, less-than-ideal changes on my part. I was trying to get it to work on both Windows and Linux, which was a bit of a pain, and I think I ended up commenting out an assertion somewhere to get it working. At least allennlp itself is no longer changing underneath the codebase.

I have been meaning to take the time to make it much cleaner and more robust before submitting a pull request.

If it would be helpful for you, I can submit one anyway, or just share the code with you directly.

Let me know how you would like to proceed.

JohnGiorgi commented 1 year ago

Hi @NtaylorOX, yeah, I would definitely be interested in an update that works on AllenNLP > 2.0. I think the big thing for me to merge it would be a demonstration that models train to the same loss and downstream performance.

NtaylorOX commented 1 year ago

Hi @JohnGiorgi. Sorry I went so quiet on this; I got swamped with other things...

I am still planning to find a day to act on this. I am also beginning to migrate the functionality of DeCLUTR to the transformers library directly (something along the lines of the sketch below), to hopefully make using your awesome architecture/algorithm more straightforward with what seems to have become the library of choice for NLP work.
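
The usage I am aiming for is just the standard transformers pattern, along these lines (untested sketch; it assumes one of the published checkpoints on the Hub and plain mean pooling over non-padding tokens):

import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the published declutr-small checkpoint; swap in whichever checkpoint you train.
tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-small")
model = AutoModel.from_pretrained("johngiorgi/declutr-small")

texts = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs)

# Mean-pool the token embeddings, masking out padding tokens.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(similarity.item())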

Will try to keep you posted on both fronts.

Thanks

JohnGiorgi commented 1 year ago

Wow, that sounds great! Yeah, keep me updated and let me know if you have any questions or if there's anything I can help with.