-
While using the continued pretraining method with the Llama 3.2 1B model, I'm encountering an 'OutOfMemoryError: CUDA out of memory.' I've already set the batch size and other parameters to their lowest…
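For context, the usual levers beyond a smaller batch size are gradient accumulation, gradient checkpointing, and paged 8-bit optimizer states. The sketch below uses Hugging Face `TrainingArguments`; the output directory and step counts are placeholder assumptions, not settings from this report.
```
from transformers import TrainingArguments

# Illustrative memory-saving settings for a ~1B-parameter continued-pretraining run.
# All values are assumptions for the sketch, not the reporter's actual configuration.
training_args = TrainingArguments(
    output_dir = "outputs",              # placeholder path
    per_device_train_batch_size = 1,     # smallest micro-batch
    gradient_accumulation_steps = 16,    # preserves the effective batch size
    gradient_checkpointing = True,       # trades compute for activation memory
    bf16 = True,                         # use fp16=True on GPUs without bfloat16
    optim = "paged_adamw_8bit",          # 8-bit optimizer states (requires bitsandbytes)
    max_steps = 1000,                    # placeholder
    logging_steps = 10,
)
```
If training still runs out of memory at a micro-batch of 1, parameter-efficient approaches such as LoRA or loading the base model in 4-bit are the usual next step.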
-
Thank you for your outstanding work, but I have still run into many problems while reproducing the pre-training results.
I use the following command to pre-train the groundingdino_swint model:
bash …
-
Hi An,
Thanks for open-sourcing the ETPNav code. ETPNav is fascinating work! I am opening this issue to ask where to find the feature-processing code for pretraining. Or do you plan to open this …
-
Hello! I'm very interested in your great work! I have two questions about pretraining.
Does the generalization ability of UMT come from CLIP? With this in mind, regardless of what kind of pre-traini…
-
### Start Date
_No response_
### Implementation PR
_No response_
### Reference Issues
_No response_
### Summary
I would like to do post-pretraining on my own domain data. Could you share your pretraining code…
-
```
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    # ... remaining arguments truncated in the original issue
)
```
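The call above is cut off after the `tokenizer` argument. As a hedged sketch of how such an unsloth-style `SFTTrainer` setup is usually completed (the `dataset` variable, sequence length, and step counts are illustrative assumptions, not the poster's values):
```
# Hedged completion sketch; dataset, max_seq_length, and step counts are assumptions.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,                         # assumed: dataset with a "text" column
    dataset_text_field = "text",
    max_seq_length = 2048,                           # assumed value
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 100,                             # assumed value
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        output_dir = "outputs",                      # placeholder path
    ),
)
```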
-
You mentioned in a previous issue that we can load a pretrained model and convert its conv2d layers to partialconv. How would you do this, given that the model structure is fixed in pretrained models? My model is:
```
class …  # (model definition truncated in the original issue)
```
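One common way to handle a fixed architecture is to load the pretrained model as-is and then recursively swap each `nn.Conv2d` for a partial-convolution layer that reuses its weights. The sketch below is a hedged illustration: it assumes a `PartialConv2d` class with an `nn.Conv2d`-compatible constructor (as in NVIDIA's partialconv code), and the import path and helper name are placeholders, not part of this repository.
```
import torch.nn as nn
# Assumption: PartialConv2d takes the same constructor arguments as nn.Conv2d.
# The import path below is a placeholder for wherever that class lives in your setup.
from partialconv2d import PartialConv2d

def convert_conv2d_to_partialconv(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Conv2d with a PartialConv2d, copying pretrained weights."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            new_conv = PartialConv2d(
                child.in_channels, child.out_channels,
                kernel_size=child.kernel_size, stride=child.stride,
                padding=child.padding, dilation=child.dilation,
                groups=child.groups, bias=child.bias is not None,
            )
            # Reuse the pretrained parameters so behaviour matches the original conv.
            new_conv.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new_conv.bias.data.copy_(child.bias.data)
            setattr(module, name, new_conv)
        else:
            convert_conv2d_to_partialconv(child)
    return module
```
Calling this after the checkpoint has been loaded keeps `load_state_dict` unchanged while the converted layers start from the original convolution weights.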
-
## Background
The infini-gram paper claims to provide an efficient method for computing pretraining term frequencies.
https://arxiv.org/pdf/2401.173…
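As a rough illustration of the interface involved, the sketch below queries n-gram counts over an indexed pretraining corpus through the public infini-gram API; the endpoint URL, index name, and field names follow my recollection of the infini-gram documentation and should be treated as assumptions to verify, not as part of this task.
```
import requests

# Assumption: the public infini-gram API accepts a JSON payload with
# "index", "query_type", and "query" fields; verify against the current docs.
def term_count(term: str, index: str = "v4_rpj_llama_s4") -> int:
    resp = requests.post(
        "https://api.infini-gram.io/",
        json={"index": index, "query_type": "count", "query": term},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["count"]

# Example: frequency of a term in the indexed pretraining corpus.
# print(term_count("gradient descent"))
```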
-
Hi,
My team and I are trying to reproduce the results of your paper, but cannot. Would it be possible to get access to the pretraining code? That would help us a lot. Thank you.
-
- Only depend on standard ImageNet files
- Facilitate swapping in a different dataset
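
A minimal sketch of what these two goals could look like together, assuming a torchvision-style image-classification pipeline; the `build_dataset` helper and directory layout are illustrative assumptions, not code from this repository:
```
import os
from torchvision import datasets, transforms

def build_dataset(root: str, split: str = "train"):
    """Illustrative loader: any dataset laid out like standard ImageNet
    (root/<split>/<class_name>/*.JPEG) can be swapped in unchanged."""
    if split == "train":
        tfm = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])
    else:
        tfm = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ])
    return datasets.ImageFolder(os.path.join(root, split), transform=tfm)
```
Anything arranged in the standard ImageNet folder format drops in without code changes; a different dataset would only require replacing this one helper.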
Related issues:
- #233
- #126
- #100
- #72