-
# Research and compile a list of health-related datasets with public/open licenses to host on decentralized storage
**Motivation / background / user story:**
Open data is used by many stakeholde…
-
loading annotations into memory...
Traceback (most recent call last):
File "/sdb/liuhaolin/anaconda3/envs/pseco/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
…
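The traceback passes through mmcv's `build_from_cfg`. That function builds objects from config dicts by looking up the `type` key in a registry and calling the registered class, so an exception raised inside a dataset's `__init__` appears with `build_from_cfg` on the stack. The pure-Python sketch below is a simplified, hypothetical stand-in, not mmcv's code. `Registry`, `register`, `toy_build_from_cfg` and `ToyDataset` are illustrative names only.

```python
from typing import Any, Callable, Dict, Optional


class Registry:
    """Hypothetical minimal registry: maps a class name to the class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Callable[..., Any]] = {}

    def register(self, cls: Callable[..., Any]) -> Callable[..., Any]:
        # Used as a plain decorator; stores the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key: str) -> Optional[Callable[..., Any]]:
        return self._modules.get(key)


def toy_build_from_cfg(cfg: Dict[str, Any], registry: Registry,
                       default_args: Optional[Dict[str, Any]] = None) -> Any:
    """Instantiate registry[cfg['type']] with the remaining keys as kwargs."""
    args = dict(cfg)  # copy so the caller's config is not mutated
    for key, value in (default_args or {}).items():
        args.setdefault(key, value)
    obj_type = args.pop("type")
    obj_cls = registry.get(obj_type)
    if obj_cls is None:
        raise KeyError(f"{obj_type} is not in the {registry.name} registry")
    try:
        return obj_cls(**args)
    except Exception as e:
        # This sketch wraps constructor errors so the failing class is named.
        raise RuntimeError(f"{obj_cls.__name__}: {e}") from e


DATASETS = Registry("dataset")


@DATASETS.register
class ToyDataset:
    def __init__(self, ann_file: str) -> None:
        if not ann_file.endswith(".json"):
            raise ValueError(f"expected a .json annotation file, got {ann_file!r}")
        self.ann_file = ann_file


ds = toy_build_from_cfg({"type": "ToyDataset", "ann_file": "train.json"}, DATASETS)
```

In a real traceback the useful line is usually the last one, which comes from inside the registered class. The frame for `build_from_cfg` only shows where that class was instantiated.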
-
Parallelized collections are created by calling JavaSparkContext’s parallelize method on an existing Collection in your driver program. The elements of the collection are copied to form a distributed …
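Treat the following as a rough mental model only. It is plain Python, not Spark's API. It splits a driver-side collection into a fixed number of contiguous, near-equal slices, which is the idea behind turning a local `Collection` into a distributed dataset. In Spark itself this is done by `parallelize`, which also accepts the number of slices as an optional second argument. `slice_collection` is a hypothetical helper, and its boundary formula is one simple choice, similar in spirit to what Spark does.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_collection(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Split a local collection into num_slices contiguous, near-equal parts."""
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    n = len(data)
    slices: List[List[T]] = []
    for i in range(num_slices):
        # Integer boundaries spread any remainder across the slices.
        start = (i * n) // num_slices
        end = ((i + 1) * n) // num_slices
        slices.append(list(data[start:end]))
    return slices


print(slice_collection([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4, 5]]
```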
-
I am trying to train a YOLOX-L model on a smaller version of COCO (only images with people, bikes, cars, and trucks), i.e. the kind of stuff you would encounter with a driverless car.
The problem is that even with a …
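One way to build such a subset is to filter a COCO-format annotation dict, as loaded with `json.load`, down to the wanted categories. COCO's names for these classes are `"person"`, `"bicycle"`, `"car"` and `"truck"`. The sketch below is a hypothetical helper, not part of YOLOX. It keeps the original category ids. Whether the training config also needs its class list or `num_classes` setting changed depends on the framework, so treat that part as an assumption to check.

```python
from typing import Any, Dict, Iterable


def filter_coco(coco: Dict[str, Any], keep_names: Iterable[str]) -> Dict[str, Any]:
    """Return a COCO-style dict restricted to the named categories.

    Keeps only annotations of the kept categories, and only images that
    still have at least one such annotation.
    """
    keep = set(keep_names)
    categories = [c for c in coco["categories"] if c["name"] in keep]
    cat_ids = {c["id"] for c in categories}
    annotations = [a for a in coco["annotations"] if a["category_id"] in cat_ids]
    img_ids = {a["image_id"] for a in annotations}
    images = [im for im in coco["images"] if im["id"] in img_ids]
    # Copy any other top-level keys (e.g. "info", "licenses") unchanged.
    out = {k: v for k, v in coco.items()
           if k not in ("images", "annotations", "categories")}
    out.update(images=images, annotations=annotations, categories=categories)
    return out


# Tiny illustrative input (ids follow COCO: person=1, car=3, dog=18).
toy = {
    "info": {},
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 2, "category_id": 18},
        {"id": 12, "image_id": 3, "category_id": 3},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
        {"id": 18, "name": "dog"},
    ],
}
subset = filter_coco(toy, ["person", "bicycle", "car", "truck"])
```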
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
Investigate the suitability of the HDF5 format for representing very large (>1 TB) irregularly sampled seismic data. Compare its performance on distributed-memory machines with the current model in Mad…
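HDF5 datasets are regular n-dimensional arrays. One common layout for irregularly sampled traces, and an alternative to HDF5's variable-length datatypes, is three flat 1-D arrays: sample times, amplitudes, and an offsets index marking where each trace starts. Each array could then be written as its own (chunked) HDF5 dataset, for example with h5py. The plain-Python sketch below shows only the packing. `pack_ragged` and `unpack_trace` are hypothetical names, and no performance claim is implied.

```python
from typing import List, Sequence, Tuple

Trace = Tuple[Sequence[float], Sequence[float]]  # (sample times, amplitudes)


def pack_ragged(traces: Sequence[Trace]) -> Tuple[List[float], List[float], List[int]]:
    """Pack variable-length traces into flat arrays plus an offsets index."""
    times: List[float] = []
    values: List[float] = []
    offsets = [0]
    for t, v in traces:
        if len(t) != len(v):
            raise ValueError("each trace needs one time per sample")
        times.extend(t)
        values.extend(v)
        offsets.append(len(values))  # end of this trace = start of the next
    return times, values, offsets


def unpack_trace(times: List[float], values: List[float],
                 offsets: List[int], i: int) -> Tuple[List[float], List[float]]:
    """Recover trace i with one contiguous slice of each flat array."""
    lo, hi = offsets[i], offsets[i + 1]
    return times[lo:hi], values[lo:hi]


traces = [([0.0, 0.5], [1.0, 2.0]), ([0.1], [3.0]), ([], [])]
times, values, offsets = pack_ragged(traces)
```

Because each trace is a contiguous range, a reader can fetch one trace, or a run of neighbouring traces, with a single slice per array. That access pattern is one of the things worth measuring on a distributed-memory machine.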
-
For large datasets (say > 1 million), we should provide guidance that the dataset is large and is best viewed when zoomed in.
Where to zoom to is a different question. Viewing a million points dist…
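A minimal sketch of the "guidance" part uses the 1 million threshold from the issue, plus a naive stride-based decimation for an overview render. `zoom_hint`, `decimate` and `LARGE_DATASET_THRESHOLD` are hypothetical names. Stride decimation is only one simple option and does not answer where to zoom.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

LARGE_DATASET_THRESHOLD = 1_000_000  # "say > 1 million", per the issue


def zoom_hint(num_points: int, threshold: int = LARGE_DATASET_THRESHOLD) -> Optional[str]:
    """Return a user-facing hint for very large datasets, else None."""
    if num_points > threshold:
        return (f"This dataset has {num_points:,} points and is best viewed "
                "when zoomed in.")
    return None


def decimate(points: Sequence[Point], max_points: int) -> List[Point]:
    """Keep every k-th point so an overview draws at most max_points."""
    if max_points < 1:
        raise ValueError("max_points must be positive")
    step = max(1, -(-len(points) // max_points))  # ceiling division
    return list(points[::step])


print(zoom_hint(1_500_000))
```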
-
In theory, it seems that the jailed space will have poor performance.
Has anyone faced that issue as the number of workers in the Slurm cluster increases?
-
I wrote my own dataset class and dataloader, and while training with mmcv.runner I get the error "ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2762685)". …
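With a custom dataset under distributed training, each process (rank) usually reads only its own shard of indices. The plain-Python sketch below mimics the padding-and-striding scheme of PyTorch's `torch.utils.data.DistributedSampler` with `shuffle=False` and `drop_last=False`. `shard_indices` is a hypothetical helper, not a PyTorch function. It can help sanity-check that a custom `__len__` and `__getitem__` agree, since every index a rank receives must be valid for `__getitem__`.

```python
import math
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Indices seen by one rank for an unshuffled, non-drop_last split.

    The index list is padded by wrapping around to the start so every rank
    gets the same number of samples; rank r then takes every
    num_replicas-th index starting at r.
    """
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Repeat from the start as many times as needed to reach total_size.
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]
    return indices[rank:total_size:num_replicas]


print(shard_indices(5, 2, 1))  # [1, 3, 0]
```

Rank 1 here reuses index 0 as padding. Any index outside `range(len(dataset))` would instead fail inside that rank's `__getitem__`.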
-
I ran the training as shown below, but it threw an error: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED.
torchrun --num_processes 1 train_network.py \
--pretrained_model_name_or_path=/aigc2/liutl/m…