-
Hi, thanks for your great work! What is the number of blocks in v1.3? Is it still the same as in v1.2? It seems strange not to double the block number as in STDiT.
-
[Outline] I would like to add a section under General Concepts about optimizing the speed and response times of LLMs. I plan to include the topics below; a quick quantization sketch follows the list:
- Quantization
- Flash attention
- Arch…
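
As a starting point for the quantization topic, here is a minimal sketch assuming the Hugging Face transformers + bitsandbytes stack; the model id and the specific settings are illustrative assumptions, not part of the outline:

```python
# Minimal sketch: 4-bit weight quantization with transformers + bitsandbytes.
# The model id and settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # dequantize to fp16 for matmuls
)

model_id = "mistralai/Mistral-7B-v0.1"     # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # spread layers across available GPUs
)
```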
-
I have already downloaded Flash-Attention 1.x (actually flash-attn 1.0.8) because I currently only have a GPU with the Turing architecture (TITAN RTX). But for my needs (running a demo of a multimodal LLM)…
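
For context, a quick way to confirm why only flash-attn 1.x applies here is to check the GPU's compute capability with PyTorch: FlashAttention 2.x requires sm_80 (Ampere) or newer, while Turing cards such as the TITAN RTX report sm_75. A minimal sketch:

```python
# Check the GPU's compute capability: flash-attn 2.x needs sm_80+ (Ampere),
# while Turing cards (e.g. TITAN RTX) report sm_75, hence flash-attn 1.x only.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")
if (major, minor) >= (8, 0):
    print("flash-attn 2.x should be usable")
elif (major, minor) >= (7, 5):
    print("Turing: limited to flash-attn 1.x")
else:
    print("FlashAttention kernels unsupported")
```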
-
* The terminal process "/bin/bash '-c', '/usr/local/cuda-12.4/bin/nvcc -g -G -diag-suppress=177 -lineinfo --std=c++17 -arch=sm_75 '-D CUTE_ARCH_LDSM_SM75_ACTIVATED' -o flash_attention_cutlass_standa…
-
Problem: The model generates repetitive, nonsensical outputs like "Breis" regardless of the input provided. This happens even with different generation settings (e.g., temperature, top_k, top_p).
fro…
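
For reference, a typical sampling setup looks like the sketch below (assuming an HF transformers causal LM; `model` and `tokenizer` are placeholders for whatever was loaded). If outputs stay nonsensical across all of these knobs, the cause usually sits upstream of decoding, e.g. a tokenizer/checkpoint mismatch:

```python
# Illustrative generation call with the sampling knobs named above;
# `model` and `tokenizer` are assumed to be an already-loaded HF causal LM.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,          # sampling must be on for the knobs below to apply
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,  # a common first lever against repetitive loops
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```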
-
Hi, it seems that unsloth currently does not support loading a base model trained with [OLMo](https://github.com/allenai/OLMo). Is it possible to write a custom script to load the model into unsloth? The mo…
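
For comparison, a plain-transformers load works as a baseline; a minimal sketch, assuming an HF-format OLMo checkpoint is available on the Hub (the model id below is an assumption). Note that unsloth's speedups come from architecture-specific patched kernels, so a custom loader alone would not enable its optimizations:

```python
# Baseline sketch: loading OLMo with plain transformers (not unsloth).
# "allenai/OLMo-1B-hf" is an assumed HF-format checkpoint id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```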
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
## Overview
The focus of this code review is the Dashboard and Landing pages.
Please pay attention to:
* JavaScript issues
* React components
## Review Branch
[r…
-
This is printed when I call `functional.scaled_dot_product_attention`:
> [W914 13:25:36.000000000 sdp_utils.cpp:555] Warning: 1Torch was not compiled with flash attention. (function operator ())
…
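
One way to see which SDPA backends the installed build actually supports is to restrict SDPA to a single backend; a minimal sketch, assuming PyTorch 2.3+ (where `torch.nn.attention.sdpa_kernel` is available) and a CUDA GPU:

```python
# Sketch: restrict SDPA to one backend to see what this build supports.
# Assumes PyTorch 2.3+ (torch.nn.attention.sdpa_kernel) and a CUDA GPU.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

try:
    # Raises at dispatch time if the flash kernel is not compiled in.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v)
    print("flash attention backend available")
except RuntimeError as e:
    print(f"flash attention backend unavailable: {e}")
```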
-
The **transformer** architecture (https://arxiv.org/pdf/1706.03762) has been instrumental in scaling sequence neural networks.
The transformer architecture is the fundamental building block of all LLMs…
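
To make the building-block claim concrete, below is a minimal sketch of a single transformer block in PyTorch, using the pre-norm layout common in modern LLMs rather than the paper's original post-norm layout; the dimensions are arbitrary, and a causal mask would be added for decoder-style use:

```python
# Minimal pre-norm transformer block: self-attention plus a position-wise MLP,
# each wrapped in a residual connection. Sizes are arbitrary.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        x = x + self.mlp(self.ln2(x))                      # position-wise MLP
        return x

x = torch.randn(2, 16, 512)   # (batch, seq, d_model)
print(Block()(x).shape)       # torch.Size([2, 16, 512])
```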