-
-
Excellent work! Is there any way to load the model across more than one GPU? Even the 7B model consumes more than 20 GB of memory, which is more than a single GPU of mine has.
-
### Motivation
When deploying inference with vLLM, the KV cache length limit can easily lead to the following error:
> ValueError: The model's max seq len (19008) is larger than the maximum number of tokens that can be stored in KV cache (3840). Try increas…
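
For illustration, the comparison behind this error can be sketched in plain Python. This is a minimal sketch, not vLLM's actual code: the function names are hypothetical, and the split of 3840 tokens into 240 blocks of 16 tokens is an assumption chosen only to reproduce the number in the message.

```python
def kv_cache_capacity(num_blocks: int, block_size: int) -> int:
    """Tokens that fit in the KV cache: number of blocks times tokens per block."""
    return num_blocks * block_size


def check_max_seq_len(max_seq_len: int, num_blocks: int, block_size: int) -> None:
    """Raise if a single sequence of max_seq_len tokens cannot fit in the cache."""
    capacity = kv_cache_capacity(num_blocks, block_size)
    if max_seq_len > capacity:
        raise ValueError(
            f"max seq len ({max_seq_len}) exceeds KV cache capacity ({capacity})"
        )


# Numbers taken from the error message; 240 x 16 is an assumed block layout.
capacity = kv_cache_capacity(num_blocks=240, block_size=16)
fits = 19008 <= capacity
```

Under these assumptions the check fails because 19008 tokens do not fit in a 3840-token cache, so either the cache has to grow or the maximum sequence length has to shrink.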
-
With #168 this is surprisingly close, besides all the things that GeoData does that xarray can't do.
But clearly missing are:
- [ ] Dask-like processing for larger-than-memory files. This should a…
-
### System information
Type | Version/Name
--- | ---
Distribution Name | Proxmox VE (Debian GNU/Linux 12 (bookworm))
Distribution Version | proxmox-ve 8.2.4
Kernel Version | Linux erp…
-
> One should never rely on the number of bytes actually allocated corresponding to the number requested.
The number of bytes allocated is guaranteed to be the same (or more? I guess it's rounded up…
-
An interesting and counterintuitive observation we should make is that trying to achieve the highest possible levels of compression for call_genotype is actually pointless. From @benjeffery's experime…
-
**Severity**: Medium
**Vulnerability Details**:
Even after fixing the dynamic size allocation, there is a bug where retData is still pre-allocated to a fixed size (2 * 32 bytes). This allocation s…
-
Hello @Snosixtyboo @ameuleman, my device is a 4090 with 24 GB.
First, when using the SIBR viewer to view my trained model (model size is 4 GB), I found that GPU memory usage is about 22 GB; if this is the case, if…
-
### What is the issue?
**Description:**
I encountered an issue where the **LLaMA 3.2 Vision 11B** model loads entirely into CPU RAM, without using GPU memory as expected. The issue occurs on m…