-
Hello, @b4rtaz!
I'm trying to run the model [nkpz/llama2-22b-chat-wizard-uncensored](https://huggingface.co/nkpz/llama2-22b-chat-wizard-uncensored) on a cluster composed of one Raspberry Pi 4B 8 GB and 7…
-
curious how it performs on smaller models
-
### What happened?
I am trying to run inference using the RPC example. When running llama-cli with the RPC feature against a single rpc-server on localhost, the inference throughput is only 1.9 tok/sec for lla…
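For context, a minimal reproduction of that setup, assuming llama.cpp is built with RPC support; the model path and prompt are placeholders:

```sh
# Build llama.cpp with the RPC backend enabled
cmake -B build -DGGML_RPC=ON
cmake --build build --config Release

# Start a single rpc-server on localhost (port is arbitrary)
./build/bin/rpc-server -p 50052

# In another shell, point llama-cli at that server
./build/bin/llama-cli -m tinyllama-1.1b-f16.gguf -p "Hello, my name is" \
    --rpc localhost:50052 -ngl 99
```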
-
See the TinyLlama pretraining script in lit-gpt and the pytorch-labs repo from the PyTorch talk.
-
Are there any differences in the `_make_masks` function across different LLM models? Don't they all compute the loss only for the response part? What causes the variations among them?
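For reference, the common pattern such masking helpers implement is to clone the input ids and set every prompt token to -100, so that cross-entropy loss covers only the response. A minimal illustrative sketch (not any particular repo's `_make_masks`):

```python
import torch

IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips this index by default

def make_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Mask the prompt so loss is computed only over the response tokens."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX  # prompt tokens contribute no loss
    return labels
```

The masking idea is the same everywhere; the variation between models usually comes from their chat templates, which wrap the prompt in different special tokens, so the code that finds where the prompt ends differs per model.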
-
[TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0). This is my custom dataset: [BibleGPT-LORA](https://huggingface.co/datasets/oliverbob/biblegpt). It's a s…
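A minimal sketch of loading that model and dataset; the `train` split name is an assumption, since the report does not show the dataset layout:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Custom dataset referenced above; the split name is assumed
dataset = load_dataset("oliverbob/biblegpt", split="train")
print(dataset[0])
```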
-
I try to load the model with transformers:
```python
small_model = AutoModelForCausalLM.from_pretrained(approx_model_name,
                                                   torch_dtype=torch.float16…
```
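A runnable version of that call, with the truncated arguments filled in as assumptions (`device_map="auto"` does not appear in the original snippet):

```python
import torch
from transformers import AutoModelForCausalLM

approx_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder id
small_model = AutoModelForCausalLM.from_pretrained(
    approx_model_name,
    torch_dtype=torch.float16,  # half precision, as in the snippet
    device_map="auto",          # assumption: auto-place the weights
)
```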
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
When I tried to call:
```python
llm = NanoLLM.from_pretrained(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    api='hf',
    api_token='mytoken',
    …
```
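For comparison, a hedged sketch of what the complete call might look like, following the NanoLLM examples from jetson-containers; the generation arguments are assumptions, not taken from the report:

```python
from nano_llm import NanoLLM

llm = NanoLLM.from_pretrained(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    api='hf',             # HuggingFace Transformers backend
    api_token='mytoken',  # placeholder token from the snippet
)

# Assumed usage; generate() streams tokens in the NanoLLM examples
response = llm.generate("Hello, who are you?", max_new_tokens=64)
for token in response:
    print(token, end='', flush=True)
```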
-
I tried to simplify TinyLlama with the code, but the simplified ONNX file is almost the same size as the non-simplified one. It would be appreciated if you could provide the ONNX sizes of the original Llama on…
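For reference, the usual onnx-simplifier flow is sketched below with placeholder file names. Since simplification folds constants and removes redundant ops but leaves the weights untouched, and weights dominate an LLM's ONNX file, a near-identical size is expected:

```python
import onnx
from onnxsim import simplify

model = onnx.load("tinyllama.onnx")  # placeholder path
model_simp, ok = simplify(model)
assert ok, "simplified model failed validation"
# Models over 2 GB need onnx.save_model(..., save_as_external_data=True)
onnx.save(model_simp, "tinyllama-simplified.onnx")
```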