-
## Goal
The goal of this issue is to determine why we segment the Optimizer Code. We will likely need to determine an appropriate solution with @wujingyue after determining why the segmentation is occ…
-
I'm trying to apply DeepSpeed stage 2 to StyleGAN2, but I get this error.
Here's my config:
```json
{
"train_batch_size" : 4,
"optimizer": {
"type": "Adam",
"params": {
"l…
-
**Describe the bug**
Thank you for reading my issue. I'm trying to use ZeRO-Infinity on AWS EC2 (g4dn.metal, 8 GPUs). The model has 40B parameters. Con…
-
I have a tf 2.10 3DConv ANN with multiple regression outputs (model architecture at the end). I am attempting to use this package to generate gradcam++ heatmaps and I am getting the following error:
…
-
In the following code example, y_dist.trainable_variables is empty. But if the training parameter is declared as a list (the beta parameter here), it works.
-
## Summary
Calling oneflow.autograd.grad() to compute gradients with is_grads_batched=True raises a runtime error when output contains multiple outputs.
## Code to reproduce bug
```
import torch as torch_original
import oneflow as flow
from typing import Tu…
-
In file "paramservermodel.py",
in function def train(self, labels, features):
# for i in range(len(self.compute_gradients)):
# self.gradients[
# i] += self.compute_gr…
-
While reading 'special_grads.py', I was wondering why the second derivative is the gradient of the pooling operation.
-
Using the latest main to train a YoloV9e object detector:
```
[rank0]: train_one_epoch(train_loader, model, args, model_dtype)
[rank0]: File "/mnt/dingus_drive/catid/train_detector/train.py…
-
Hi, I was trying to implement the update method for LaplaceRedux, but I am having a problem.
This is the model:
```
MLJBase.@mlj_model mutable struct LaplaceRegressor 0)
batch_size::Intege…