-
I keep getting NaN losses like this after a few hours of training:
| 1360/1000000 [7:25:23
-
It looks promising.
paper: https://arxiv.org/abs/2404.03715
Unfortunately, the code is unavailable.
### Abstract (from paper):
This paper studies post-training large language models (LLMs) using pr…
-
Hello, there are three questions I want to ask you.
1) What is the purpose of the Breastclip folder? Is it a part of the project? If so, at what stage is it used?
2) I saw a .py file in Breastcl…
-
Hello!
Inspired by #19 and #28, my collaborators and I have also outlined a course curriculum for our section, but we would like some input and feedback from the HF team before we fina…
-
-
-
The code
```
model = WhiteboxModel.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    cache_dir="~/cache/",
    device_map='auto',
    offload_folder="offload_folder",
)
```
Throws the …
-
Creating this issue to document my observations, readings, and development efforts towards building a solution for predicting the replaced mode in the absence of inferred labels.
-
1. [TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting](https://arxiv.org/abs/2403.09898)
https://github.com/Atik-Ahamed/TimeMachine
2. [An Analysis of Linear Time Series Forec…
-
Hello, thank you for the inspiring work.
While many projects discard the non-linear projection head module after training, Hypo appears to retain it after training.
It seems that the code is based…