Open modestcigit opened 11 months ago
Hi modestcigit, it seems this issue has been open for a while without a response. If you are still interested in getting this model to work on a trn1 instance, I would suggest two things: 1) we release the Neuron SDK approximately monthly, so install the latest version and check whether you can still reproduce the issue; 2) if the issue persists in the latest release, try the --enable-saturate-infinity compiler flag when compiling your model.
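For anyone trying suggestion 2, here is a minimal sketch of one way to pass the flag from a Python training script. It assumes the compiler flags are picked up from the NEURON_CC_FLAGS environment variable, as in the usual torch-neuronx training flow; the helper name add_neuron_cc_flag is hypothetical and not part of the Neuron SDK.

```python
import os

FLAG = "--enable-saturate-infinity"


def add_neuron_cc_flag(flag, env=None):
    """Append `flag` to NEURON_CC_FLAGS unless it is already present.

    Hypothetical helper: assumes the Neuron compiler reads its flags
    from the NEURON_CC_FLAGS environment variable.
    """
    env = os.environ if env is None else env
    current = env.get("NEURON_CC_FLAGS", "").split()
    if flag not in current:
        current.append(flag)
    env["NEURON_CC_FLAGS"] = " ".join(current)
    return env["NEURON_CC_FLAGS"]


# Run this before the first graph compilation is triggered,
# for example at the very top of the training script.
add_neuron_cc_flag(FLAG)
```

Existing flags are preserved and the flag is not added twice, so the call is safe to repeat. Previously cached compilation results may have to be cleared or recompiled before the new flag takes effect.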
I got run_clm.py to compile on trn1.32xlarge, and the actual training also runs. However, it reports NaN for both loss and perplexity. Has this been observed? The directions I followed are from here
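As a debugging aid, the sketch below shows why the two NaN values are likely the same symptom: perplexity is the exponential of the mean evaluation loss, so a NaN loss always yields a NaN perplexity. It also finds the first training step at which the logged loss became NaN. Both function names are hypothetical, and the loss values in the usage lines are made-up illustrations, not output from this run.

```python
import math


def perplexity_from_loss(eval_loss):
    """Perplexity is exp(mean cross-entropy loss); NaN propagates through exp."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        # A very large finite loss overflows to infinite perplexity.
        return float("inf")


def first_nan_step(losses):
    """Return the index of the first NaN loss, or None if there is none."""
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            return step
    return None


# Illustrative values only: the loss turns NaN at step 2.
logged = [2.31, 1.97, float("nan"), float("nan")]
print(first_nan_step(logged))
print(perplexity_from_loss(logged[-1]))
```

Knowing the step at which the loss first becomes NaN helps tell apart a problem that appears in the very first forward pass from one that builds up during training.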