-
I tried to replace the attention implementation here
https://github.com/huggingface/transformers/blob/238b13478df209ab534f2195a397dc64a3930883/src/transformers/models/llama/modeling_llama.py#L419
…
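Since the rest of this report is truncated, here is only a minimal, hedged sketch of the general pattern of swapping out an attention module in a torch model. The block and attention classes below are hypothetical stand-ins, not the actual `LlamaAttention` from the linked file:

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Hypothetical transformer-style block with a pluggable attention module."""
    def __init__(self, dim: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads=2, batch_first=True)
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x):
        attn_out, _ = self.self_attn(x, x, x)
        return self.mlp(attn_out)

class NoOpAttention(nn.Module):
    """Drop-in replacement: same call signature as the original, identity behaviour."""
    def forward(self, q, k, v):
        return q, None

model = nn.Sequential(*[TinyBlock(8) for _ in range(2)])

# Patch every block's attention module in place, mirroring how one might
# replace `self_attn` on each decoder layer of a Hugging Face model.
for block in model:
    block.self_attn = NoOpAttention()

out = model(torch.randn(1, 4, 8))
```

The key constraint is that the replacement preserves the call signature and return shape the surrounding block expects.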
-
### 🐛 Describe the bug
Hello, I am working on a project where I need to use multiple consecutive instances of DistributedDataParallel (DDP) within the same torch.distributed environment. In my scen…
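Since the scenario is cut off above, here is only a minimal single-process sketch (gloo backend, world size 1, toy models — all assumptions) of constructing two consecutive DDP wrappers inside the same process group:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Single-process group for illustration; a real run would use torchrun
    # with multiple ranks and the env:// init method.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29517",
                            rank=0, world_size=1)
    try:
        # First DDP instance.
        model_a = DDP(torch.nn.Linear(4, 4))
        out_a = model_a(torch.randn(2, 4))
        # Drop the first wrapper, then wrap a second model
        # in the same, still-initialized process group.
        del model_a
        model_b = DDP(torch.nn.Linear(4, 2))
        out_b = model_b(torch.randn(2, 4))
        return out_a.shape, out_b.shape
    finally:
        dist.destroy_process_group()

shapes = main()
```

The point of the sketch is that the process group is initialized once and outlives each `DDP` wrapper, so wrappers can be created and discarded consecutively without re-initializing `torch.distributed`.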
-
Hello. Did you also find that the basketball weights don't detect anything at all? Is that why "basketball" is not among the input choices for the model parameter?

-
Hi,
Great work! Could you share your trained model weights? Thanks!
-
Hi, I want to test the pre-trained model's performance, but I ran into some problems. I would appreciate it if you could help me!
The first problem is **ModuleNotFoundError: No module named 'data.single_datas…
-
A couple of separate issues:
1. PyTorch Lightning now calls `on_train_epoch_end()`, so `train_epoch_end()` isn't actually invoked.
2. Instantiating LitEma with LitEMA(self, ...) will lead to incor…
-
There is a model called SOLAR. This model follows the same architecture as LLaMA2 but has more layers, which makes it an outstanding performer, better than Mistral and even Mixtral on some points (open …
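For illustration only, here is a hedged toy sketch of the depth up-scaling idea associated with SOLAR — deepening a model by concatenating a prefix of its layers with copies of a suffix. The layer sizes and helper below are invented for the example and have nothing to do with the real LLaMA weights:

```python
import copy
import torch
import torch.nn as nn

def depth_upscale(layers: nn.ModuleList, keep: int) -> nn.ModuleList:
    """Toy depth up-scaling: keep the first `keep` layers and append
    deep copies of the last `keep` layers, yielding a deeper stack."""
    head = list(layers[:keep])
    tail = [copy.deepcopy(m) for m in layers[-keep:]]
    return nn.ModuleList(head + tail)

# A 4-layer toy stack becomes a 6-layer one (3 + 3, with overlap duplicated).
base = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
deeper = depth_upscale(base, keep=3)

x = torch.randn(2, 8)
for layer in deeper:
    x = layer(x)
```

The duplicated middle layers start from trained weights, which is why this kind of up-scaled model is typically continued-pretrained rather than trained from scratch.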
-
Right now, using ifbo, we get a lot of annoying errors related to loading the model within `ifbo/surrogate.py::__init__.py`
```python
/home/skantify/code/neps-cli/.venv/lib/python3.10/site-packag…
-
### Model description
no
### Open source status
- [X] The model implementation is available
- [X] The model weights are available
### Provide useful links for the implementation
_No response_
-
Hi, thank you very much for your solid work. Could you share your model weights file? The mistral-7b-v0.2 download is dead.