-
Hi, I have a script that runs with the DataParallel trainer on a machine with 8 H100 GPUs (an AWS p5 instance) using DeepSpeed. When we run the script, it randomly gets stuck forever at some iteration r…
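Not a fix, just a hedged first debugging step for a hang like this: have every rank periodically dump its Python stacks so you can see which call (usually a collective) each process is blocked in. faulthandler is in the standard library, and setting NCCL_DEBUG=INFO in the environment gives the matching NCCL-side logs.

```
import faulthandler
import sys

# Dump all thread stacks of this process to stderr every 300 s until cancelled.
# Dropping this near the top of the training script is enough; each rank writes
# its own dump, so the rank stuck in a collective stands out.
faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)
```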
-
In our previous meeting, Jeremy mentioned that emll now automatically handles missing data.
I'm using emll right now, and it's complaining about array shape mismatches, but I have no idea where thes…
-
Hello, thanks for your great work. While exploring the code, I found something confusing and want to make sure whether it is a bug.
In https://github.com/FoundationVision/OmniTokenizer/blob/main/OmniToke…
-
### Description
When trying to use optax.MultiSteps on a data-parallel setup with shard_map, I am getting the following error:
```
NotImplementedError: No replication rule for cond. As a workar…
```
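For context, here is a minimal repro-style sketch of the kind of setup I believe is involved (the toy model, shapes, and partition specs are my assumptions, not the actual training code). optax.MultiSteps uses lax.cond internally to choose between accumulating and applying the update, and that cond is what the replication checker rejects:

```
import numpy as np
import jax
import jax.numpy as jnp
import optax
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
opt = optax.MultiSteps(optax.sgd(1e-2), every_k_schedule=4)

params = jnp.zeros((4,))
opt_state = opt.init(params)

def step(params, opt_state, batch):
    # Per-shard gradient, averaged over the "data" mesh axis.
    grads = jax.grad(lambda p: jnp.mean((batch @ p) ** 2))(params)
    grads = jax.lax.pmean(grads, axis_name="data")
    # MultiSteps decides between "accumulate" and "apply" with lax.cond,
    # which is where the replication check fails.
    updates, opt_state = opt.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state

step_sharded = shard_map(
    step, mesh,
    in_specs=(P(), P(), P("data")),  # params/opt_state replicated, batch sharded
    out_specs=(P(), P()),
)

batch = jnp.ones((8, 4))  # leading dim must divide evenly over the "data" axis
step_sharded(params, opt_state, batch)  # raises the NotImplementedError above
```

(shard_map also accepts a check_rep argument that disables this replication check entirely; whether relying on that is acceptable here is presumably part of the question.)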
-
```
if training_args.do_train:
    model.gradient_checkpointing_enable()
    model.enable_input_require_grads()
```
Hello author, what is the purpose of making the inputs also compute gradients here? Is this line of code redundant, or is it …
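For context (my own toy sketch, not the authors' answer): enable_input_require_grads matters mainly in combination with gradient_checkpointing_enable when the layers producing the checkpoint inputs are frozen. With reentrant checkpointing, if nothing entering a checkpointed block requires grad, the block is never re-run in backward and the trainable parameters inside it receive no gradients. As far as I understand, enable_input_require_grads registers a forward hook on the input embeddings that forces their output to require grad; the toy emb/block modules below only illustrate that mechanism.

```
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Toy stand-in: a frozen embedding feeding a checkpointed, trainable block.
emb = nn.Embedding(100, 16)
emb.weight.requires_grad = False
block = nn.Linear(16, 16)  # trainable layer inside the checkpointed segment

x = torch.randint(0, 100, (4, 8))
out = checkpoint(block, emb(x), use_reentrant=True)
print(out.requires_grad)  # False: nothing entering the checkpoint required grad

# Roughly the trick enable_input_require_grads relies on: force the embedding
# output to require grad so the checkpointed block is recomputed in backward.
emb.register_forward_hook(lambda m, i, o: o.requires_grad_(True))
out = checkpoint(block, emb(x), use_reentrant=True)
out.sum().backward()
print(block.weight.grad is not None)  # True: the block now receives gradients
```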
-
While debugging your code, I could not find the implementation of your clone and split algorithm. Where is it?
In /scene/gaussian_model.py, line 492 is the function
def densify_and_split(self, grads, grad_thres…
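Not the repository's code, just a hedged paraphrase of what clone/split densification means in the 3DGS paper, so it is clearer what densify_and_split (and the companion densify_and_clone) are expected to do; all names, shapes, and thresholds below are my own.

```
import torch

def densify_sketch(xyz, scales, grads, grad_threshold, percent_dense, scene_extent):
    # Points with a large view-space gradient are densified: small Gaussians are
    # cloned (under-reconstruction), large Gaussians are split (over-reconstruction).
    high_grad = grads.norm(dim=-1) > grad_threshold
    small = scales.max(dim=-1).values <= percent_dense * scene_extent
    cloned = xyz[high_grad & small]
    split = xyz[high_grad & ~small].repeat(2, 1)  # real method also offsets and shrinks these
    return torch.cat([xyz, cloned, split], dim=0)

# Example: 10 Gaussians with random positions, scales, and accumulated gradients.
xyz, scales, grads = torch.randn(10, 3), torch.rand(10, 3), torch.randn(10, 3)
print(densify_sketch(xyz, scales, grads, 0.5, 0.01, 4.0).shape)
```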
-
On Linux on Peregrine, to get things to compile, I needed to add the flag -fPIC:
```
gfortran -c -fPIC adBuffer.f
gcc -c -fPIC adStack.c
```
I am not sure exactly what it means; maybe it is something to ad…
-
I first freeze the model parameters:
```
for param in self.parameters():
    param.requires_grad = False
```
Then, I unfreeze the parameters of the new layer that I want to train:
`for param in ne…
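To keep this self-contained, here is a minimal runnable sketch of the freeze-then-unfreeze pattern described above (the toy module and attribute names are mine, not from the original code), including the usual companion step of handing only the trainable parameters to the optimizer.

```
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)   # pretrained part, to be frozen
        self.new_layer = nn.Linear(8, 2)  # newly added part, to be trained

    def forward(self, x):
        return self.new_layer(self.backbone(x))

model = Net()
for param in model.parameters():            # freeze everything
    param.requires_grad = False
for param in model.new_layer.parameters():  # unfreeze only the new layer
    param.requires_grad = True

# Only the trainable parameters go to the optimizer.
optimizer = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
```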
-
Thanks so much for the great code!
I was checking the reptile code, and it appears I need to set track_higher_grads=True in the context for this to run.
Is there something I am missing here? Th…
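For reference, a minimal sketch of where track_higher_grads enters the higher API (the toy model and data are mine, not the repository's Reptile code): it is a keyword argument of higher.innerloop_ctx, where True keeps the full graph through the inner-loop updates and False gives a cheaper first-order variant.

```
import torch
from torch import nn
import higher

model = nn.Linear(4, 1)
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Inner loop on a functional copy of the model; flipping track_higher_grads
# controls whether the inner updates stay differentiable.
with higher.innerloop_ctx(model, inner_opt, track_higher_grads=True) as (fmodel, diffopt):
    for _ in range(3):
        diffopt.step(((fmodel(x) - y) ** 2).mean())
    adapted = [p.detach() for p in fmodel.parameters()]
```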
-
### What you have to share
Adding a field to each hackathon to denote participant eligibility (active undergrad/grad only, 12 months after graduation, etc.) might help new grads or those who have alr…