Closed by hnuhyuwa 1 year ago
Hi, please refer to the explanation here.
PTv2m2 disables PEM and Grouped Linear to reduce memory cost, which lets the model run on 4 × 24 GB GPUs, and we reproduced better performance with it.
The original PTv2 reported in the paper is PTv2m1; its config is also available in the model zoo.
I hope PTv2 can be accessible to most of us. Consequently, based on my latest experiment results, I retuned the model to reduce the cost.
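As a rough illustration of the difference between the two variants described above, here is a minimal sketch in plain Python. Only `pe_multiplier` is a name that appears in this thread; `grouped_linear` and the helper `changed_keys` are hypothetical names chosen for illustration, and the dicts are not the actual config files from the model zoo.

```python
# Hypothetical config fragments, not the real repository configs.
# Only `pe_multiplier` is named in this thread; other names are assumptions.
ptv2m1 = dict(
    pe_multiplier=True,    # PEM enabled (implied for the paper version)
    grouped_linear=True,   # hypothetical flag standing in for Grouped Linear
)
ptv2m2 = dict(
    pe_multiplier=False,   # PEM disabled to reduce memory cost
    grouped_linear=False,  # Grouped Linear disabled (hypothetical flag name)
)


def changed_keys(old, new):
    """Return the sorted keys whose values differ between two config dicts."""
    return sorted(k for k in old if old[k] != new.get(k))


print(changed_keys(ptv2m1, ptv2m2))
```

Comparing the two dicts this way is just a quick check of which options differ; the authoritative settings are the configs in the model zoo.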
Thank you for your reply.
Hi, thank you for your code. I noticed that 'pe_multiplier' is set to 'False'. Do you apply this multiplier during training?