Closed Devoe-97 closed 12 months ago
Hello, note that the SF-XL README says:
- small: this is a small curated subset of processed which allows you to quickly get started and is only 4.8 GB heavy. Obviously, results won't be as good as when using the processed version, but should be good enough. There is a train, val and test set. The train set is only from 1 group, obtained with L=12. To train on this dataset, you should do
$ python train.py --groups_num 1
So training with other configurations, like changing the values of `M`, `alpha`, `N`, `L`, and `groups_num`, will produce unpredictable results.
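To make the README's mention of groups, `M`, `alpha`, `N`, and `L` concrete, here is a small illustrative sketch. This is not the repository's code; the residue-based grouping below is only my reading of the CosPlace paper's scheme (position discretized into `M`-meter cells, heading into `alpha`-degree bins, and classes assigned to groups by their indices modulo `N` and `L`):

```python
# Illustrative sketch (NOT the repository's implementation) of a
# CosPlace-style class/group assignment from a photo's UTM position
# and compass heading.
#   M     = class cell side length in meters
#   alpha = orientation bin width in degrees
#   N, L  = translation / orientation shifts separating the groups
def class_and_group(east: float, north: float, heading: float,
                    M: int = 10, alpha: int = 30, N: int = 5, L: int = 2):
    # Discretize position and heading into integer class indices.
    ce, cn, ch = int(east // M), int(north // M), int(heading // alpha)
    class_id = (ce, cn, ch)
    # Classes whose indices share the same residues land in the same group,
    # so nearby (potentially overlapping) classes end up in different groups.
    group_id = (ce % N, cn % N, ch % L)
    return class_id, group_id

print(class_and_group(east=123.0, north=456.0, heading=95.0))
# → ((12, 45, 3), (2, 0, 1))
```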
Thanks, I'll try again.
Would it be correct to use the following parameters when training with the PROCESSED data?
`M=10, alpha=30, N=5, L=2, groups_num=4`
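If the grouping follows the CosPlace scheme of `N`×`N` spatial shifts times `L` orientation shifts (an assumption on my part, not confirmed by this thread), a quick sanity check is that `groups_num` should not exceed the total number of groups:

```python
# Hedged sketch: assumes the classes are partitioned into N*N*L
# non-overlapping groups (N shifts per spatial axis, L orientation
# shifts), with groups_num selecting how many groups training cycles
# through. This formula is an assumption, not the repository's code.
def total_groups(N: int, L: int) -> int:
    return N * N * L

N, L = 5, 2
print(total_groups(N, L))  # → 50
# Both groups_num=4 and the corrected groups_num=8 fit within 50 groups.
assert 8 <= total_groups(N, L)
```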
Correction: groups_num=8
Thank you for your excellent work! I am very impressed with your SF-XL dataset! However, the dataset is very large, and a poor network connection makes it difficult for me to download the full dataset. (This problem may only affect scholars in China.) Therefore, I used the SMALL set for training and validation, and my configuration is as follows:
M: 10 alpha: 30 N: 5 L: 2 groups_num: 4
I encountered some confusion: I trained with two batch sizes (with `iterations_per_epoch` set to 10,000) and found that the downward trend in model performance appeared earlier: at epoch 20 for bs=64 and at epoch 8 for bs=128. Have you explored the effect of different batch sizes on performance? I would greatly appreciate it if you could help me with the above issues.
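One back-of-the-envelope way to reason about why the drop appears at an earlier epoch with bs=128: if `iterations_per_epoch` counts optimizer steps (an assumption, not verified against the training loop), a larger batch consumes more training samples per epoch, so the total number of samples seen before degradation is similar in both runs:

```python
# Sketch: samples consumed per epoch when iterations_per_epoch is fixed.
# Assumes every iteration draws a full batch (an assumption about the
# training loop, not taken from the repository).
iterations_per_epoch = 10_000

for bs, degrade_epoch in [(64, 20), (128, 8)]:
    samples_per_epoch = iterations_per_epoch * bs
    seen_before_drop = samples_per_epoch * degrade_epoch
    print(f"bs={bs}: {samples_per_epoch:,} samples/epoch, "
          f"{seen_before_drop:,} samples seen before the drop")
# bs=64  → 640,000 samples/epoch, 12,800,000 before the drop
# bs=128 → 1,280,000 samples/epoch, 10,240,000 before the drop
```

The two totals are within ~25% of each other, which is consistent with the degradation being driven by total samples seen on the small subset rather than by batch size per se, though only the maintainer's experiments could confirm that.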