Hi,
I'm trying to train ALBERT on news article crawls that have been filtered aggressively (many sentences may have been skipped), and the final corpus has no separation between articles. In this case, do you think removing the SOP loss would be a better idea? If so, how would I go about doing this with your code?