-
Hi @SparkJiao, I'm working on fine-tuning the DeepSeek Coder models (e.g. 1b and 6.7b) based on the model pipeline; as far as I know, they are based on the LLaMA architecture. This repo has been a great help to me…
-
### Describe the bug
```
let { array = [] } = $props()
let array2 = $derived(array.filter(i => !i.hidden))
```
In this short example, `filter` is marked in red with the following error:
'array' is of type 'unknown'.
…
-
While reading the code, I found that the KV cache is not compressed in the prefill stage; the hidden states are compressed instead. Why compress the hidden states rather than the KV cache?
-
Hi!
I'm currently working with the repo and I'm getting this error when trying to scrape a website.
This is the code that I used:
async with AsyncWebCrawler(verbose=False, always_by_pass_cache=Tru…
-
I'm examining a DD disk image and I'd like to view all files and folders on an NTFS partition, including all system and hidden ones.
```
using NtfsFileSystem ddimageNtfs = new NtfsFileSystem(volume.Open()…
-
Part of the program can be hidden off stage, which leads to confusion.
Here is an example. Students said, "My servo isn't working." We debugged their program, but the servo behavior was still extremely odd.…
-
Significant output differences when compiling and running the `facebook/bart-base` (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after applying FP16 and various precision…
-
**Is your feature request related to a problem? Please describe.**
Often, one only wants to display a few lines of code. Displaying these lines without context could confuse the reader. Disp…
-
Hey all, hope everyone is doing well! What follows may be a bit of a dumb question, but I just wanted to clarify how this works for my own algorithm development based on your excellent code. …
-
Have you compared SSRL with PPO? How do the two compare in performance?
I found the following in your code:
```python
def make_ppo_networks(cfg: DictConfig, saved_policies_dir: Path,
…