-
Does the framework support multi-GPU training?
I want to use the framework to train a 70B model; however, I could not find any parameter settings or methods for multi-GPU training.
-
**Describe the bug**
First of all, this is not an urgent matter, since the plot is still shown correctly after the error.
The function plotter.plot_samples() results in an error when your problem definiti…
-
### Describe the bug
When using the Feedforward in `diffusers.models.attention`, I've observed discrepancies in the results when processing subsets of the original input that vary in sequence lengt…
-
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=2304, out_features=8, bias=False)
…
-
I use this code to build a model for captcha recognition:
```c++
dataiter
-
### 🐛 Describe the bug
I've included a minimal repro script below. The key point to observe is that the module expects two arguments, `x` and `y`, whereas we provide only `x` as the dummy input to th…
-
TLDR: I couldn't make DeepExplainer show the correlation between input and output when using a softmax. The plots are below, and the code is [here](https://github.com/ydib/shap_softmax_problem/blob/main/s…
ydib updated
4 months ago
-
In my speech synthesis system built with the Merlin toolkit, it takes a long time to generate speech from text. Most of the time is spent in the World Vocoder and the DNN generation module. So, to reduce the delay, I …
-
Hello, I'm trying to reproduce the test from a closed issue.
When I try to run the following:
```
python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretr…
-
Thank you for your great work! I wish I had found this repository earlier.
My question is about the GLU function that you've implemented.
I noticed that xformer's implementation of Sw…