-
I reinstalled flash-attn with `pip install flash-attn==2.6.1` in the NGC PyTorch Docker image 24.06.
When I run a training job, I get the following error:
```
Traceback (most recent call last):
File "/data1/nfs15/nfs/bigdata/zha…
-
https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#14-appendix-benchmark-specific-rules
Here, it is stated that feature caching is not allowed. What is the definition of…
-
**What would you like to be added/modified**:
Combine KubeEdge with its subproject [Sedna](https://github.com/kubeedge/sedna), and integrate [Volcano](https://github.com/volcano-sh/)'s schedu…
-
With the Vanna.ai model, Q&A performance in a specific domain is very good after training. However, I am calling the API of my locally deployed model. After training data on this API, I…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
…
-
The current repository provides comparisons of various AI language models. However, there are several recent models and unique architectures that are not included in the existing comparisons. Expandin…
-
### Description
**Summary**: The ONNX domains and their versions are not set correctly after converting a `RandomForestClassifier` model with a specific `target_opset`.
**Expected Beh…
-
I'm using Longhorn v1.6.0,
and I created the volume with 2 replicas.
I am training an AI model by reading image files from a Longhorn volume, and recently, the training often hangs unexpectedly.
I …
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Google Colab with Python 3.10.12
- TensorFlow installation (pip package or built from source):
pip package
-…
-
I am attempting to fine-tune with my custom dataset; however, the training progress stays at 0% and does not increase at all, even after 20 hours of running:
```
Train: 0%| …