-
### 🐛 Describe the bug
I used Hugging Face training code.
During the backward pass of training with FSDP, I found that the AllGather kernel doesn't overlap with the CatArrayBatchedCopy kernel. I don't know why.
s…
-
### 🐛 Describe the bug
When trying to use `torch.export.export_for_training` with a sample model like:
```
class SampleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()…
-
### What is the problem?
I'm using Ray 0.7.6 + Python 3.7.3 with 45 Linux machines on my university network as a cluster. All students in my department have access to these machines and use them …
-
I am trying to run this code in distributed TensorFlow mode and have modified the code accordingly (i.e. using MonitoredTrainingSession and so on). But trying to use MonitoredTrainingSession doesn't…
-
Hi there! I am really interested in your repository, and thank you for your work on `latent-gan`.
However, I am facing a problem while I am training through the entire process by executing ``` p…
-
We need to finalise the proposed budget lines. There was some discussion on whether we should remain high level (so it is more relatable) or low level (so it is more informative). @fretchen Ronald doe…
-
Hello, I'm using OneTrainer on Windows 11. I remote into my workstation from a laptop.
I was running a fine-tuning job overnight, and in the morning the UI was in this transparent mode. I can still inter…
-
def issueKey = issue.key
def division = issue.fields.customfield_14275?.value
def departmentPSD = issue.fields.customfield_14453?.value
// Define the new request participants array with account …
-
Hi,
I want to train a fasterrcnn_resnet50_fpn_v2 model on a custom dataset. I want to start from COCO pre-trained weights. Is that the default behavior?
Or do I need to supply a weights file thru…
-
### 🐛 Describe the bug
I have integrated our new accelerator into torch using the torch.compile feature, and it works well for inference. I am now working on adding support for training, but it gives m…