The default out_dim is 65536. When I trained with out_dim=65536 on a custom dataset, the loss did not decrease, or decreased only very slowly. With a smaller out_dim, the loss seemed to converge much faster. Is it okay, or even recommended, to use a smaller out_dim, for example changing it from 65536 to 1024, 2048, etc.?
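One caveat may matter when comparing loss curves across different out_dim values. The post does not say so, but suppose the loss is a cross-entropy over out_dim output dimensions, as in DINO-style projection heads. Then its starting value scales with out_dim. When both distributions are uniform over K dimensions, the cross-entropy is exactly ln K. That is about 11.09 for K=65536 and about 6.93 for K=1024, so raw loss values from runs with different out_dim are not on the same scale. The sketch below computes this baseline directly from the definition of cross-entropy. The function names are hypothetical and not part of any library.

```python
import math


def cross_entropy(p: list[float], q: list[float]) -> float:
    """H(p, q) = -sum_k p_k * log(q_k) for discrete distributions p and q."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)


def uniform_baseline(out_dim: int) -> float:
    """Cross-entropy when both target and prediction are uniform over out_dim dims.

    Assumption: the training loss is a cross-entropy over out_dim outputs.
    Analytically this equals log(out_dim).
    """
    uniform = [1.0 / out_dim] * out_dim
    return cross_entropy(uniform, uniform)


if __name__ == "__main__":
    for k in (1024, 2048, 65536):
        print(f"out_dim={k:>6}: uniform cross-entropy = {uniform_baseline(k):.3f}")
```

A slower-looking curve for out_dim=65536 can partly reflect this larger starting scale. To compare runs more fairly, you could plot loss divided by ln(out_dim) or track a downstream metric such as k-NN accuracy.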