-
I changed the backbone from vgg to resnet50. First, I have to use a very low learning rate, otherwise training diverges. Second, mAP is usually lower than with the vgg-backbone adaptive teacher, even if I train lo…
-
| Team Name | Affiliation |
|---|---|
| Sharks | epfl; epfl; epfl |
- Paper: [Hyper-Regularization: An Adaptive Choice for the Learning Rate in Gradient Descent](https://openreview.net/pdf?id=S1e_ss…
-
Hi,
I have used the captioning nodes and they worked fine, but when I try to run the lora node, I get the issue below. It seems to have trouble recognising the checkpoint. From …
-
I was in contact with Victor Lafargue, who suggested I ping @danielhanchen for all cuML t-SNE-related questions. So here goes: three suggestions, including some questions.
1) Recent research…
-
edit: why do they let you post blank issues?
Here's a minimal example of what I was running into with the adaptive control notebook, which I got around by using a small learning rate and scaling th…
-
I am trying to create a GPU-based environment where a model, say resnet18, is being trained, and where the number of environments can be greater than 1. I am not familiar with Jax but I am planning to learn i…
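To illustrate what I mean by more than one environment, here is a rough sketch of the pattern I am hoping Jax supports; the `env_step` dynamics and state shapes are made up for illustration, only `jax.vmap` and `jax.jit` are actual Jax APIs:

```python
import jax
import jax.numpy as jnp

# Placeholder single-environment step: takes one env state and an action,
# returns the next state and a reward. Real environment logic would go here.
def env_step(state, action):
    next_state = state + action          # toy dynamics
    reward = -jnp.sum(next_state ** 2)   # toy reward
    return next_state, reward

# vmap turns the single-env step into a batched step over N environments,
# and jit compiles the batched step so it runs on the GPU.
batched_step = jax.jit(jax.vmap(env_step))

num_envs = 8
states = jnp.zeros((num_envs, 4))        # N independent env states
actions = jnp.ones((num_envs, 4))
states, rewards = batched_step(states, actions)
print(rewards.shape)                     # (8,)
```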
-
In certain training scenarios, I see extremely spiky cost trajectories through training. I bet this could be solved (at least partially) by implementing adagrad or some other adaptive learning rate sc…
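To make the suggestion concrete, here is a minimal Adagrad-style update sketch (plain NumPy, not tied to this codebase; the learning rate and epsilon values are just placeholders):

```python
import numpy as np

def adagrad_update(params, grads, accum, lr=0.1, eps=1e-8):
    """One Adagrad step: each parameter's step size is lr scaled down by the
    square root of its accumulated squared gradients, so coordinates with
    consistently large gradients take smaller steps and spikes are damped."""
    accum += grads ** 2
    params -= lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Usage: keep one accumulator per parameter array across iterations.
params = np.zeros(10)
accum = np.zeros_like(params)
for step in range(100):
    grads = np.random.randn(10)          # stand-in for real gradients
    params, accum = adagrad_update(params, grads, accum)
```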
-
Implement quick & dirty single-node version of FTRL with L1 regularization and adaptive learning rate: https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf
First step: all features (…
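For reference, a rough single-node sketch of the per-coordinate FTRL-Proximal update described in that paper (logistic loss, sparse dict features); the class name and hyperparameter defaults are placeholders, not the final implementation:

```python
import math
from collections import defaultdict

class FTRLProximal:
    """FTRL-Proximal logistic regression with L1/L2 regularization and a
    per-coordinate adaptive learning rate, as in the linked paper."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=0.1):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)   # per-coordinate lazy weight state
        self.n = defaultdict(float)   # per-coordinate sum of squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0                # L1 keeps this coordinate exactly zero
        sign = 1.0 if z >= 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        """x: sparse dict of feature index -> value. Returns P(y = 1)."""
        wx = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))

    def update(self, x, y):
        """y in {0, 1}. Per-coordinate adaptive-learning-rate update."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v           # gradient of log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p
```

Features here are assumed to be sparse dicts of index → value; hashing them into a fixed-size space, as the paper does, can be layered on top of this.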
-
When training with batch size 4 on one H100, the speed is 1.27 seconds / it
When training with batch size 4 on 2x H100, the speed is 2.05 seconds / it
So basically we got almost no speed boost from multiple GPU t…
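As a rough sanity check on those numbers (assuming batch size 4 means per GPU, so 8 samples per step on 2x H100): one GPU processes about 4 / 1.27 ≈ 3.1 samples/s, while two GPUs process about 8 / 2.05 ≈ 3.9 samples/s, i.e. roughly a 1.2x speedup instead of the ideal 2x. If 4 is instead the global batch size, two GPUs are actually slower per sample than one.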
-
Add descriptions to the Parameters Appendix for Deep Learning parameters:
pretrained_autoencoder
overwrite_with_best_model
hidden
epochs
train_samples_per_iteration
target_ratio_comm_to_comp
…