-
## Description
The performance of multi-machine training is greatly affected by the `MXNET_CPU_WORKER_NTHREADS` variable. It seems that MXNet will auto-tune this variable, and it can lead to very diff…
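
One way to rule out the auto-tuned value while comparing runs is to pin the variable explicitly on every machine. The sketch below is only an illustration: the helper name `pin_cpu_worker_threads` and the value `4` are hypothetical, and it assumes MXNet reads the variable when its engine starts, so it has to be set before `import mxnet`.

```python
import os


def pin_cpu_worker_threads(n_threads):
    """Set MXNET_CPU_WORKER_NTHREADS to a fixed value (hypothetical helper).

    Assumption: MXNet picks the value up at engine start-up, so call this
    before `import mxnet` in the training script.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    os.environ["MXNET_CPU_WORKER_NTHREADS"] = str(n_threads)
    return os.environ["MXNET_CPU_WORKER_NTHREADS"]


# 4 is an arbitrary illustrative value, not a recommendation.
pin_cpu_worker_threads(4)
# import mxnet as mx  # import only after the variable has been set
```

Using the same fixed value on all workers makes timing differences between machines easier to attribute to something other than this setting.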
-
So, the error message is completely accurate. The score files in the "predict_results" folder are all zero bytes, but the message gives me no idea why.
First 1000 lines of log (with the SVision call)…
-
The training time becomes longer when I run the second job in a multi-GPU cluster.
![image](https://github.com/georghess/neurad-studio/assets/38983719/f757e935-eee5-4c32-ba5a-d5970b1e36d8)
And the…
-
In PR #407 we added a feature to modify the build time dependencies of a package. In a few cases, we also want to modify the installation time dependencies (`Requires-Dist`) of a wheel. Sometimes we w…
-
### Bug Explanation
I'm trying to create demos using the Ivy framework to build a CNN model for plant disease detection. Everything works as intended on CPU, except that when the device is set to gp…
-
I used this project to retrain on the DNS-challenge dataset. After I finished training the model, I tried to convert it using convert_weights_to_tf_lite.py, but found that the model I converted…
-
**What**
Allocate workers based on number of train vs. validation steps, so that the validation workers don't pull way ahead of the train workers.
**Why**
Better use of CPUs for faster training.
…
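
A minimal sketch of the proportional split described under **What**, not the PR's actual implementation: the function name `allocate_workers` and its parameters are hypothetical, and it assumes each side needs at least one worker.

```python
def allocate_workers(total_workers, train_steps, val_steps):
    """Split workers between training and validation in proportion to steps.

    Hypothetical helper: returns (train_workers, val_workers), keeping at
    least one worker on each side.
    """
    if total_workers < 2:
        raise ValueError("need at least two workers to split")
    if train_steps <= 0 or val_steps <= 0:
        raise ValueError("step counts must be positive")

    # Fraction of all steps that belong to training.
    train_share = train_steps / (train_steps + val_steps)
    train_workers = round(total_workers * train_share)

    # Clamp so that neither side ends up with zero workers.
    train_workers = max(1, min(total_workers - 1, train_workers))
    return train_workers, total_workers - train_workers


print(allocate_workers(8, 900, 100))  # (7, 1)
```

With 900 train steps and 100 validation steps, most workers go to training, so the validation side is less likely to run far ahead of it.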
-
While reading the pretraining code, I noticed the comment says `drop_remainder` should be `true` when training and `false` when evaluating, but the code confuses me a lot. I am not sure whethe…
-
I just installed 1.8.17 on a new system and am going through the test pipeline fixing errors.
I am not sure what to do about the current one, though.
It happens during BUSCO-mediated training. Ho…
-
Hello
I have models trained using the multi-core TPU option.
I have saved their checkpoints using:
```
import torch_xla.core.xla_model as xm
xm.save(checkpoint, path_checkpoints_file, master_only…