Closed: okamiRvS closed this issue 1 month ago
Hello!
Judging by the PyTorch Getting Started page, CUDA 12.4 support still seems to be premature.
Personally, I am also on Torch compiled with CUDA 12.1. My recommendation would be to install that instead and see if you have more luck there. That saves you from having to upgrade your Linux kernel (which Sentence Transformers would rather not make mandatory).
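To confirm which CUDA version your installed Torch build was compiled against, something like the following rough sketch may help. The helper name `report_torch_cuda` is made up for illustration; it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and returns `None` if Torch is not installed.

```python
def report_torch_cuda():
    """Return basic Torch/CUDA build info, or None if Torch is not installed.

    Hypothetical helper for illustration; not part of Torch or
    Sentence Transformers.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        # Version string of the installed Torch package.
        "torch": torch.__version__,
        # CUDA version Torch was compiled against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # Whether a usable CUDA device is visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(report_torch_cuda())
```

If `cuda_build` shows a 12.4 build, reinstalling a build compiled with CUDA 12.1 is what I would try first.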
I no longer get the error, although all I did was reinstall the same environment more recently. Thanks for the support.
Issue:
Description
I've been following the official tutorial to finetune an embedding model using Sentence Transformers v3. While setting up the training as described, I encountered a critical warning related to the kernel version that may affect the training process.
Error Details
When initiating the SentenceTransformerTrainer, the process detects an incompatible kernel version which is below the recommended minimum, potentially causing the training to hang:
Environment Details
I set up my environment with the following commands:
Questions
Additional Information on CUDA Compatibility
Based on the NVIDIA CUDA Installation Guide for Linux 12.4, the "Native Linux Distribution Support in CUDA 12.4" table explicitly lists support for Red Hat Enterprise Linux 8.y (where y ≤ 9) with kernel version 4.18.0-513. This looks inconsistent: NVIDIA supports this kernel version for CUDA 12.4, yet the Sentence Transformers training process recommends upgrading the kernel to at least version 5.5.0 to prevent potential issues.
This discrepancy is concerning as it suggests a possible incompatibility between the recommended setups for CUDA and the Sentence Transformers library. It's crucial to clarify whether the kernel upgrade recommendation can be reconciled with NVIDIA’s supported configurations, or if there are additional settings or modifications recommended for users in similar environments.
Could the documentation or the error messaging in the Sentence Transformers library be adjusted to address this potential conflict, providing clear guidance for users operating under NVIDIA’s supported kernels?
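For reference, here is a minimal sketch of the kind of comparison that puts kernel 4.18.0-513 below the 5.5.0 minimum. It is my own illustration, not the library's actual check: the names `RECOMMENDED_MIN`, `parse_kernel_release`, and `kernel_below_minimum` are hypothetical, and I am assuming the release string comes from `platform.release()`.

```python
import platform
import re

# Recommended minimum kernel version quoted in the warning (5.5.0).
RECOMMENDED_MIN = (5, 5, 0)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a kernel release string.

    Distribution suffixes such as "-513" are ignored; a missing
    patch component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release: {release!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def kernel_below_minimum(release=None, minimum=RECOMMENDED_MIN):
    """Return True if the kernel release is older than the given minimum."""
    if release is None:
        # Assumption: the running kernel is reported by platform.release().
        release = platform.release()
    return parse_kernel_release(release) < minimum


# The RHEL 8 kernel from the NVIDIA table is below the recommended minimum.
print(kernel_below_minimum("4.18.0-513"))
```

Under this comparison any 4.18 kernel triggers the warning regardless of the distribution's backports, which is why a kernel that NVIDIA lists as supported can still be flagged.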
Thank you for any advice or updates you can provide to help address this kernel version issue!