-
https://stackoverflow.com/questions/64199384/tf-keras-model-predict-results-in-memory-leak
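The linked question concerns memory growing without bound when `model.predict` is called repeatedly in a loop. A commonly suggested mitigation is to call the model directly for small per-call inputs instead of `predict`; a minimal sketch, assuming a generic Keras model:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model, standing in for the one in the report.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

x = np.random.rand(1, 8).astype("float32")

# model.predict() sets up fresh prediction machinery on each call,
# which is the usual source of the growth reported when it is
# invoked inside a tight loop; __call__ avoids that overhead.
for _ in range(1000):
    y = model(x, training=False)  # instead of model.predict(x)
```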
-
I am using optimum-neuron's run_qa.py to fine-tune GPT-2, and judging by the output it appears to be doing data parallelism.
Could you confirm what kind of parallelism is used?
If I enter 8 as the batch size, it…
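For context on the batch-size part of the question: under pure data parallelism each worker holds a full model replica and its own slice of the data, so the batch size you pass is per device and the global batch scales with the number of workers. A back-of-the-envelope sketch, all numbers hypothetical:

```python
# Hypothetical figures; the real values come from the launch config.
per_device_batch_size = 8      # the value entered as "batch size"
num_data_parallel_workers = 2  # model replicas running in parallel
gradient_accumulation_steps = 1

global_batch_size = (
    per_device_batch_size
    * num_data_parallel_workers
    * gradient_accumulation_steps
)
print(global_batch_size)  # 16: each replica still sees 8 samples per step
```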
-
### System Info
- `transformers` version: 4.36.2
- Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.1…
-
Questions regarding parallelism:
1. If I'm not mistaken, both tensor serialization and deserialization operations should be parallelizable. Is this assumption correct? For example, I was thinking th…
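On the deserialization side, one concrete way this can be parallelized is to load independent checkpoint shards concurrently. A minimal sketch, assuming a sharded safetensors checkpoint (file names hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

from safetensors.torch import load_file

# Hypothetical shard files of a sharded checkpoint.
shard_paths = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

# The shards hold disjoint tensors, so the loads are independent and
# can be issued concurrently; actual speedup depends on storage
# bandwidth and how much of the work happens outside the GIL.
with ThreadPoolExecutor(max_workers=len(shard_paths)) as pool:
    shards = list(pool.map(load_file, shard_paths))

# Merge the per-shard dicts into a single state dict.
state_dict = {k: v for shard in shards for k, v in shard.items()}
```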
-
### Feature request
Enable TGI to load QLoRA fine-tuned models with the optimized architecture on SageMaker. Right now the optimized architecture is active only for certain models on the list. If the features…
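Until the optimized path covers more architectures, a common interim workaround is to merge the QLoRA adapter into the base model and serve the merged checkpoint, which TGI can load like any base model. A minimal sketch using PEFT (all paths hypothetical):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "base-model-id"        # hypothetical base model
adapter_dir = "./qlora-adapter"  # hypothetical adapter checkpoint

# Load the base model in a dtype suitable for serving, attach the
# adapter, then fold the LoRA deltas into the base weights.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

# Save a standalone checkpoint for TGI to load.
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-model")
```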
-
Started full-time thesis around April/May 2023.
Track DST, Q3/4 start. "Seminar course" still to do. Has superapp/MusicDAO experience. Discussed topics as diverse as the digital Euro and a Web3 search engine (un…
-
I am trying to use `meta-llama/Llama-2-13b-chat-hf`, which has a `max_position_embeddings` of 4096 tokens.
I found that the library fails in a non-deterministic way when the input length is between 1790 …
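One way to rule context length in or out while debugging this is to compare the tokenized input length against the model's configured window before the call. A minimal sketch (the prompt is a placeholder):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

prompt = "..."  # the input that triggers the failure
n_tokens = len(tokenizer(prompt)["input_ids"])

# max_position_embeddings is 4096 here, so inputs of ~1790 tokens are
# well inside the window; failures there would point elsewhere.
print(n_tokens, config.max_position_embeddings)
assert n_tokens <= config.max_position_embeddings
```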
-
LM head weights get untied during training even when they are supposed to be tied.
This happens when the overlap parameters are set to `true`.
cc: @deepakn94
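A quick check for whether the tie is still intact at any point in training is to verify that the LM head and the input embedding share storage. A minimal sketch, assuming a model that exposes the Hugging Face-style `get_input_embeddings`/`get_output_embeddings` accessors (adapt the attribute paths otherwise):

```python
import torch

def weights_are_tied(model: torch.nn.Module) -> bool:
    # Accessor names assume a Hugging Face-style model; for other
    # codebases, reach into the embedding and head modules directly.
    emb = model.get_input_embeddings().weight
    head = model.get_output_embeddings().weight
    # Tied weights share one underlying storage; equal values with
    # different pointers would mean the tie was replaced by a copy.
    return emb.data_ptr() == head.data_ptr() and torch.equal(emb, head)
```

Calling this before and after the first optimizer step, with the overlap options on and off, should narrow down where the tie is broken.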
-
I am trying to quantize a custom fine-tuned llama2 model using the following code:
```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, Ba…
```
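The truncated import is presumably `BaseQuantizeConfig`. For reference, a minimal end-to-end sketch following the usual auto_gptq pattern (paths and calibration text are placeholders, not the reporter's actual setup):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_dir = "./my-finetuned-llama2"           # hypothetical path
quantized_dir = "./my-finetuned-llama2-gptq"  # hypothetical output

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Load the fp16 model with the quantization config attached.
model = AutoGPTQForCausalLM.from_pretrained(model_dir, quantize_config)

# Tokenized calibration examples; real runs should use text that is
# representative of the fine-tuning domain.
examples = [tokenizer("A short calibration sample for GPTQ quantization.")]

model.quantize(examples)
model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)
```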
-
### What happened?
I am running Llama 3 8B Instruct, but the model's output doesn't make sense. I followed the general guidelines of the [main (cli)](https://github.com/ggerganov/llama.cpp/blob/master/e…
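With Llama 3 Instruct in llama.cpp, nonsensical output is very often a prompt-format issue: the Instruct variants expect the Llama 3 header/EOT special tokens rather than a raw prompt. A sketch of the expected template, written out in Python for clarity (message text illustrative):

```python
# Llama 3 Instruct chat format; the special tokens below belong to the
# model's tokenizer and must appear verbatim in the prompt.
def llama3_prompt(user_message: str,
                  system: str = "You are a helpful assistant.") -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("Why is the sky blue?"))
```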