FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

How many epochs were trained? #942

Closed. daegonYu closed this issue 1 week ago.

daegonYu commented 2 weeks ago

The paper says that pre-training runs for 25,000 steps and that fine-tuning uses about 6,000 steps for warm-up only. Could you tell me how many epochs were trained for pre-training and for fine-tuning, including the warm-up? I have quoted the relevant part of the paper below.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

B Implementation Details
B.1 Experimental Hyperparameters

For the pre-training with the massive unsupervised data, the max length of query and passage is set to 512 and 8192, respectively. The learning rate is 5 × 10⁻⁵, the warmup ratio is 0.1 and the weight decay is 0.01. This training process takes 25,000 steps. For training data with different sequence length ranges (e.g., 0-500, 500-1000, etc.), we use different batch sizes. The details are presented in Table 9. The second stage is conducted on 96 A800 (80GB) GPUs. In the fine-tuning stage, we sample 7 negatives for each query. Refer to Table 9 for the batch size. In the initial phase, we employed approximately 6,000 steps to perform warm-up on dense embedding, sparse embedding and multi-vectors. Subsequently, we conducted unified training with self-knowledge distillation. These experiments were carried out on 24 A800 (80GB) GPUs.
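For reference, here is a rough sketch of how these pre-training hyperparameters would map onto a standard Hugging Face `TrainingArguments` object. This is only an assumed mapping for illustration, not the authors' actual training setup; in particular, the real batch size varies by sequence-length bucket (Table 9), and the output path and batch size below are placeholders.

```python
# Illustrative only: the hyperparameters quoted above expressed as a standard
# Hugging Face TrainingArguments config. Not the authors' actual code.
from transformers import TrainingArguments

pretrain_args = TrainingArguments(
    output_dir="./m3_pretrain",      # placeholder path
    learning_rate=5e-5,              # from the paper
    warmup_ratio=0.1,                # from the paper
    weight_decay=0.01,               # from the paper
    max_steps=25_000,                # the 25,000 pre-training steps (~1 epoch, per the reply below)
    per_device_train_batch_size=32,  # placeholder; the actual batch size depends on the
                                     # sequence-length bucket (see Table 9 in the paper)
)
```
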
hanhainebula commented 2 weeks ago

Hello, @daegonYu. We performed pretraining for 1 epoch. For the finetuning process, we performed 2 epochs for warm-up (1 epoch for dense embedding, 1 epoch for sparse embedding and multi-vectors), and 1 epoch for unified-finetuning.

daegonYu commented 1 week ago

You said the warm-up consisted of 1 epoch for dense embedding and 1 epoch for sparse embedding and multi-vectors. Do you provide code to train dense embedding, sparse embedding, and multi-vectors separately?

hanhainebula commented 1 week ago

We provide the corresponding arguments to achieve this:

https://github.com/FlagOpen/FlagEmbedding/blob/a6069c10740e1198806ca638516e27caa7cde4ad/FlagEmbedding/BGE_M3/arguments.py#L96

https://github.com/FlagOpen/FlagEmbedding/blob/a6069c10740e1198806ca638516e27caa7cde4ad/FlagEmbedding/BGE_M3/arguments.py#L98

For more details, you can refer to our code.
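
A minimal sketch of how the three phases discussed above (dense warm-up, sparse/multi-vector warm-up, unified training with self-knowledge distillation) could be driven from Python. The flag names `unified_finetuning` and `use_self_distill`, the entry point `FlagEmbedding.BGE_M3.run`, the phase-to-flag mapping, and all paths are assumptions for illustration; verify them against the linked `arguments.py` and the training examples in the repository before use.

```python
# Hypothetical sketch: running the warm-up and unified phases by toggling the
# arguments linked above. Flag names, entry point, and paths are assumptions.
import shlex
import subprocess

BASE_CMD = (
    "torchrun --nproc_per_node 8 -m FlagEmbedding.BGE_M3.run "  # assumed entry point
    "--model_name_or_path BAAI/bge-m3 --train_data ./finetune_data "  # placeholder data path
    "--num_train_epochs 1"  # 1 epoch per phase, per the maintainer's reply
)

PHASES = {
    # 1 epoch dense-only warm-up (assumed: sparse/multi-vector heads disabled)
    "dense_warmup": "--unified_finetuning False --use_self_distill False",
    # 1 epoch warm-up of sparse embedding and multi-vectors (assumed flag setting)
    "sparse_colbert_warmup": "--unified_finetuning True --use_self_distill False",
    # final unified training with self-knowledge distillation
    "unified": "--unified_finetuning True --use_self_distill True",
}

for name, extra in PHASES.items():
    cmd = shlex.split(f"{BASE_CMD} --output_dir ./m3_{name} {extra}")
    print("Running phase:", name)
    subprocess.run(cmd, check=True)
```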

daegonYu commented 1 week ago

Oh thank you!