-
Hi,
I am looking for ImageNet-pretrained weights of the YOLOX backbone. I am specifically interested in the largest model, YOLOX-x. In a couple of other issues I've seen that the nano version can be train…
-
Hi, thanks for your work! Do you plan to release the pretraining code, along with the training dataset?
-
Hi Siqi,
Thanks for releasing the great code.
I cannot find the pretraining code for the in-batch negative examples in this repository. Could you point me to the implementation?
And it seem…
-
Hi,
Thank you very much for the great work, and for making your code publicly available.
I am trying to run the code to reproduce the results; however, the pre-training datasets are missing from …
-
Hey,
I am trying to train Funnel Transformer with the following hparams. The CPU usage for my TPUv3-8 has not gone above 4% in the 90 hours the code has been running, and it seems to be very slow, to…
-
In Tables 3 & 4, is the same dataset used during pre-training and fine-tuning? Or does the fine-tuning happen only on the ImageNet-1k dataset?
-
**Describe the bug**
Running the *BERT* pretraining, I encountered two issues:
1. The error "TransformerEngine only supports softmax compute in FP32". I needed to add `--attention-softmax-in-fp32` to the model ar…
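For reference, a minimal sketch of applying that flag, assuming Megatron-LM's `pretrain_bert.py` entry point; the model-size and data arguments shown here are placeholders, not the reporter's actual configuration:

```shell
# Hedged example: append --attention-softmax-in-fp32 to the BERT pretraining
# command so TransformerEngine computes softmax in FP32 as it requires.
# All other flags/paths below are assumed placeholders.
python pretrain_bert.py \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --data-path /path/to/my-bert_text_sentence \
    --attention-softmax-in-fp32
```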
-
I think there could be value in creating a separate dataset for pretraining. It would cover the same chemical space as the standard SPICE dataset, but have many more conformations and be computed at …
-
Recently, I have been conducting applied research on Target Speaker Extraction, but I have encountered many difficulties. I came across your paper titled 'Generative Speech Foundation Model Pretrainin…
-
Unsloth is not supported with CUDA 12.4. Are there any alternate methods to use Unsloth with CUDA 12.4? Also, are there any other frameworks supported with CUDA 12.4 for continual pretraining of llm…