Based on the results you showed in gits, it looks like losses from training and valid is diverging instead of converging as number of epoch increasing. So, what's the problem here? I saw other text stigmatization model, they build their own dictionary and input something like one-hot encoding vector for each word. Does this process make difference? Thank you!
I'm glad if you want to talk more by WeChat (ID: XHansonX)
Based on the results you showed in gits, it looks like losses from training and valid is diverging instead of converging as number of epoch increasing. So, what's the problem here? I saw other text stigmatization model, they build their own dictionary and input something like one-hot encoding vector for each word. Does this process make difference? Thank you! I'm glad if you want to talk more by WeChat (ID: XHansonX)