-
Hello,
According to our discussion [here](https://github.com/Lightning-AI/lit-llama/issues/330#issuecomment-1567376696), I think `devices` should be changed in the [pretraining code](https://githu…
-
Note, 4.1.2023: During this research effort I've been browsing, reviewing, visiting, revisiting, and studying a huge number of articles and concepts, linked by association while browsing, for feedin…
-
Hey, thank you for making this data set available to the community.
I'm wondering how you estimated the token counts in the table in the README and the blog post. In particular, do you have the corres…
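For context on what such an estimate usually involves: token counts for a large corpus are often extrapolated from a sample rather than computed exactly. A minimal sketch of that approach — the sample documents, corpus size, and stand-in tokenizer below are purely illustrative, not the dataset's actual numbers or tokenizer:

```python
# Rough token-count estimate: tokenize a sample, derive a bytes-per-token
# ratio, and extrapolate to the full corpus byte count.

def estimate_tokens(sample_docs, tokenize, total_bytes):
    sample_bytes = sum(len(d.encode("utf-8")) for d in sample_docs)
    sample_tokens = sum(len(tokenize(d)) for d in sample_docs)
    bytes_per_token = sample_bytes / sample_tokens
    return int(total_bytes / bytes_per_token)

# Stand-in tokenizer: whitespace split. A real estimate would use the
# dataset's actual tokenizer (e.g. the LLaMA tokenizer).
sample = ["the quick brown fox", "jumps over the lazy dog"]
print(estimate_tokens(sample, str.split, total_bytes=1_000_000))
```

The accuracy of the extrapolation depends entirely on how representative the sample is of the whole corpus.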
-
I'm training tinyllama with 8 A40s.
Everything goes very smoothly until I try to increase the micro batch size for a better computation-to-communication ratio.
I follow the official tutorial of lit …
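For readers unfamiliar with the setup: in lit-gpt-style pretraining scripts the global batch is typically assembled from micro batches via gradient accumulation, so raising the micro batch size reduces the accumulation steps (fewer, larger forward/backward passes per optimizer step) at the cost of per-device memory. A rough sketch of the arithmetic, with made-up numbers rather than the tutorial's actual config:

```python
# Relation between global batch size, micro batch size, and gradient
# accumulation steps in a typical multi-GPU setup (illustrative numbers).

def grad_accum_steps(global_batch_size, micro_batch_size, devices):
    per_step = micro_batch_size * devices
    if global_batch_size % per_step != 0:
        raise ValueError("global batch must divide evenly into micro batches")
    return global_batch_size // per_step

# e.g. 8 GPUs: doubling the micro batch halves the accumulation steps,
# improving the compute-to-communication ratio but raising peak memory.
print(grad_accum_steps(global_batch_size=512, micro_batch_size=4, devices=8))
print(grad_accum_steps(global_batch_size=512, micro_batch_size=8, devices=8))
```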
-
### Describe the feature
I found that both examples truncate text longer than max_length, so we have to segment long text into shorter pieces. For examples/language/llama2, the code is:
```
…
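# (Sketch only, not the repo's code.) Since both examples truncate at
# max_length, long documents can be pre-segmented into overlapping windows
# of token ids before being handed to the dataset. Names here (`segment`,
# the stride choice) are hypothetical, for illustration.

def segment(token_ids, max_length, stride):
    # Slide a window of max_length tokens with the given stride so adjacent
    # chunks share context instead of losing everything past max_length.
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
    return chunks

print(segment(list(range(10)), max_length=4, stride=3))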
-
## ❓ General Questions
Thank you so much for this amazing project -- it's a complete game-changer when it comes to running LLMs locally. I'd like to make this available to my non-technical colleague…
-
Hi everyone, I'm trying to run cc_net with this command: `python -m cc_net --dump 2023-06 --task_parallelism 20 --num_shards 5000 -l en --mine_num_processes 20 --hash_in_mem 1`. But the invalid argumen…
-
`paged_adamw_32bit` works perfectly, though. I have tried this on multiple models and multiple datasets, but all of them lead to the same observation.
Below are results with the redpajama_3B_base model and on the same…
-
It's too tough to get started.
May I ask whether there is any tutorial or example for this project?
For example, how can I get the PyTorch ET from a cluster, and how can I convert it to a Chakra ET?
How to visuali…
-