-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Environment
```markdown
- Milvus version: master-20241112-b5b00355-amd64
- Deployment mode(standalone …
```
-
**Describe the bug**
When using both the Zero++ and BFloat16 features simultaneously, the gathered param is sometimes Float16 dtype, but the intermediate results are still BFloat16 dtype.
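For context, enabling both features together goes through the DeepSpeed config; a minimal sketch is below. The specific values (stage 3, partition size 16) are illustrative assumptions, not the reporter's actual settings.
```python
# Minimal sketch of a DeepSpeed config enabling ZeRO++ together with BFloat16.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,    # ZeRO++ qwZ: quantized weight gather
        "zero_hpz_partition_size": 16,     # ZeRO++ hpZ: hierarchical partitioning
        "zero_quantized_gradients": True,  # ZeRO++ qgZ: quantized gradient reduce
    },
}
```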
**To Repr…
-
Hi, if I have a linear layer whose weights only take values in {0, 1, -1}, is it possible to use your kernel for weight compression and inference speed-up? My current weights are in bfloat16 format.
…
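For illustration only (this is my own helper, not the project's API): since each weight takes one of three values, it fits in 2 bits, so four weights pack into one byte for roughly 8x compression over bfloat16. A minimal packing sketch, assuming the bfloat16 weights are already exactly {-1, 0, 1}:
```python
import torch

def pack_ternary(w: torch.Tensor) -> torch.Tensor:
    """Pack a {-1, 0, 1} weight tensor into 2-bit codes, 4 per byte.

    Hypothetical helper for illustration; a real kernel would fuse the
    matching unpack with the matmul.
    """
    codes = (w.to(torch.int8) + 1).to(torch.uint8)  # map {-1,0,1} -> {0,1,2}
    assert codes.numel() % 4 == 0, "pad the tensor to a multiple of 4 first"
    c = codes.flatten().view(-1, 4)
    return c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)
```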
-
The following test will fail with poor PCC:
```python
def test_broken_reshape(device):
    src_shape = (1, 56, 56, 64)
    target_shape = (1, 1, 56 * 56, 64)
    torch_input_tensor = torch.randn(src_shape, dt…
```
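For reference, the PCC check that fails here is just a Pearson correlation coefficient between the torch reference output and the device output; a self-contained sketch (the helper name is mine, not the test harness's):
```python
import torch

def pcc(expected: torch.Tensor, actual: torch.Tensor) -> float:
    # Pearson correlation coefficient between flattened tensors;
    # values near 1.0 mean the outputs match closely.
    x = expected.flatten().float()
    y = actual.flatten().float()
    return torch.corrcoef(torch.stack([x, y]))[0, 1].item()
```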
-
### Prerequisites
- [X] I have read the [documentation](https://hf.co/docs/autotrain).
- [X] I have checked other issues for similar problems.
### Backend
Local
### Interface Used
CLI
…
-
### Feature request
Too much boilerplate; a template that resolves loading, quantization, and device would cut it down (see the sketch below).
E.g.:
- device: auto -> torch.cuda.is_available() -> cuda or mps.
- dtype: float32 -> float32, no q…
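A sketch of the kind of resolution logic being requested; the function name and the CUDA-then-MPS fallback order are my assumptions:
```python
import torch

def resolve_device(device: str = "auto") -> str:
    # Resolve device="auto" as described: prefer CUDA, then MPS, else CPU.
    if device != "auto":
        return device
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```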
-
![image](https://github.com/user-attachments/assets/8a63a25f-74ef-4596-a1a4-c6fc2dc48e10)
My bm-smi version: sophon-driver, sophon-libsophon, and sophon-libsophon-dev are all version 0.5.1.
I also printed my Bmodel, and it looks fine.…
-
Hi all, I am trying to fine-tune models on extremely long contexts.
I've tested the training setup below, and I managed to fine-tune:
- llama3.1-1B with a max_sequence_length of 128 * 1024 tokens
…
-
### What happened?
When converting models using [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py) to GGUF format, a `TypeError` occurs if the `licens…
-
In principle, RETURNN supports arbitrary dtypes, as `Data` can just have any `dtype`. However, many layers do not really allow configuring that. Most layers just take the same dtype as the input, so i…
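A conceptual sketch of the default described above, in plain Python rather than RETURNN's actual layer code:
```python
from typing import Optional

class Data:
    """Stand-in for RETURNN's Data: the dtype is just a free attribute."""
    def __init__(self, dtype: str = "float32"):
        self.dtype = dtype

def layer_out_data(input_data: Data, out_dtype: Optional[str] = None) -> Data:
    # Most layers: the output dtype simply follows the input; only a few
    # would expose an explicit override like out_dtype here.
    return Data(dtype=out_dtype or input_data.dtype)
```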