-
As of #485, workers can spill excess data to disk. They currently evict tasks from a fixed-size pool of memory to disk using a simple LRU policy. Each task result is stored on disk as a single file. This is c…
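The LRU spill policy described above can be sketched as a small mapping that keeps values in memory up to a byte budget and writes the least recently used ones to disk, one file per key. This is a minimal illustration, not the worker's actual implementation. `SpillBuffer` and its attributes are hypothetical names. The sketch assumes that sizes are estimated from the pickled length and that keys are filename-safe strings.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillBuffer:
    """Hypothetical LRU spill buffer: in-memory values up to a byte budget,
    least recently used values evicted to disk as one pickle file per key."""

    def __init__(self, directory, memory_limit):
        self.directory = directory
        self.memory_limit = memory_limit  # bytes allowed in memory (assumed estimate)
        self.fast = OrderedDict()         # key -> (value, size), oldest first
        self.slow = set()                 # keys currently spilled to disk
        self.used = 0

    def _path(self, key):
        # Assumes keys are strings that are safe to use as file names.
        return os.path.join(self.directory, f"{key}.pkl")

    def _write(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)
        self.slow.add(key)

    def _evict(self):
        # Spill least recently used entries until we are back under budget.
        while self.used > self.memory_limit:
            key, (value, size) = self.fast.popitem(last=False)
            self.used -= size
            self._write(key, pickle.dumps(value))

    def __setitem__(self, key, value):
        if key in self:
            del self[key]
        data = pickle.dumps(value)
        size = len(data)  # size estimate: length of the pickled bytes
        if size > self.memory_limit:
            # Too large to ever fit in memory: write it straight to disk.
            self._write(key, data)
            return
        self.fast[key] = (value, size)
        self.used += size
        self._evict()

    def __getitem__(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key][0]
        if key in self.slow:
            with open(self._path(key), "rb") as f:
                value = pickle.load(f)
            os.remove(self._path(key))
            self.slow.discard(key)
            self[key] = value  # promote back to memory (may evict others)
            return value
        raise KeyError(key)

    def __delitem__(self, key):
        if key in self.fast:
            _, size = self.fast.pop(key)
            self.used -= size
        elif key in self.slow:
            os.remove(self._path(key))
            self.slow.discard(key)
        else:
            raise KeyError(key)

    def __contains__(self, key):
        return key in self.fast or key in self.slow


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        buf = SpillBuffer(d, memory_limit=250)
        buf["a"] = b"a" * 100
        buf["b"] = b"b" * 100
        buf["c"] = b"c" * 100  # over budget, so "a" (least recent) is spilled
        print(sorted(buf.fast), sorted(buf.slow))  # ['b', 'c'] ['a']
```

Reading a spilled key loads it back and may in turn spill whichever in-memory entry is now least recently used. An `OrderedDict` keeps the recency order cheap to maintain through `move_to_end` and `popitem(last=False)`.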
-
```
I did two optimizations on real production servers, and load average
dropped from >16 (number of threads of AUFS) to 2.0-2.50
src/fs/aufs/store_io_aufs.c
if (aiostate->flags.close_request)
…
```
-
> This paper presents the application of the Swept rule for two-dimensional grids on heterogeneous architectures.
The Swept rule was introduced in 2016 (ref. [1]), extended to two-dimensional grids i…
-
### What happened?
I was able to get `onnxruntime-training 1.16.1+rocm56` from [onnxruntime.ai](https://download.onnxruntime.ai/onnxruntime_stable_rocm56.html) and it includes `ROCMExecutionProvider`…
-
Please improve the fine-tuning script!
After I solved this problem:
```
Traceback (most recent call last):
  File "E:\OmniGen\train.py", line 371, in <module>
    main(args)
  File "E:\OmniGen\train.py"…
```
-
**Describe the bug**
When I try to convert a NeoX-trained LLaMA model (config below) with [convert_neox_to_hf.py](https://github.com/EleutherAI/gpt-neox/blob/main/tools/ckpts/convert_neox_to_hf.py) i…
-
## 🐛 Bug
Running with model parallelism 2 on 8 GPUs on the FAIR cluster raises the following exception with the 1.3B_gptz model, but only when run with `arceasy`, `arcchallenge`, or `openbookqa`. It works with `storycl…
-
I trained the model "nat_ctc_sd_ss" on a Tesla V100 GPU using the command in README.md, but I got an **Out of memory** error. Is there anything I should change?
My training command:
```shell
python3 tra…
```
-
Follow the steps below:
```sql
CREATE TABLE test_1 (key int, dummy int);
CREATE TABLE test_2 (key int, dummy int);
SELECT master_create_distributed_table('test_1', 'key', 'hash');
SELECT master_crea…
```