Great work, thank you! I am encountering the following issue: When I follow your retnet_machine_translation.ipynb to train retnet on Ubuntu with CUDA, I achieve the same quality as you reported. However, when I train it on a MacBook Pro with an M3 Max chip, the model trains without errors, but the quality of the results is significantly worse. The only alteration I made for training on Apple Silicon was setting torch.set_default_device("mps") to use Metal Performance Shaders (MPS) as a backend, instead of torch.set_default_device("cuda"), as specified in the notebook, and setting up the dependencies outside the notebook; I made no other modifications. Do you have any idea why there is a discrepancy in the quality numbers?
Thank you for your comment. I had not considered this, and I don't have a Mac to test it on. From what I found online, I believe the discrepancy is caused by the overhead of MPSGraph. For more in-depth background on the issue, see this and this link.
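To narrow this down, it may help to check whether the MPS backend returns numerically different results from the CPU for the same weights and inputs. Below is a minimal sketch of such a check, not something I have run on Apple Silicon. The helper name `max_abs_diff_vs_cpu` and the small stand-in network are hypothetical and not part of the notebook; it assumes a fresh session in which `torch.set_default_device` has not been called, so that the reference tensors live on the CPU.

```python
import copy

import torch


def max_abs_diff_vs_cpu(module, example, device):
    """Largest element-wise difference between a CPU run and a run on `device`.

    `module` and `example` are expected to live on the CPU. Note that the
    module is switched to eval mode in place so dropout cannot add noise.
    """
    module = module.eval()
    with torch.no_grad():
        reference = module(example)                    # CPU reference output
        moved = copy.deepcopy(module).to(device)       # same weights, other backend
        candidate = moved(example.to(device)).cpu()    # bring result back to compare
    return (reference - candidate).abs().max().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Fall back to the CPU when MPS is not available on this machine.
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    # Hypothetical stand-in block; swap in a layer from the actual model.
    block = torch.nn.Sequential(
        torch.nn.Linear(64, 64),
        torch.nn.GELU(),
        torch.nn.LayerNorm(64),
    )
    x = torch.randn(8, 16, 64)
    print(device, max_abs_diff_vs_cpu(block, x, device))
```

Small float32 rounding differences between backends are expected. A much larger difference on a particular layer would suggest that an operator behaves differently on MPS, which would be worth reporting with the PyTorch version you are using.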