Migrate to pytorch 0.4 - Githubissues

xiadingZ commented 6 years ago

pytorch 0.4 has released, please migrate to 0.4

srush commented 6 years ago

Yup, Hi-pri.

[ ] - Need to fix detach issue
[ ] - Need to fix softmax
[ ] - Need to remove all the Variable calls
[ ] - Try out the new multi-gpu
[ ] - get rid of the cuda calls
[ ] - Try out checkpoint

nikhilweee commented 6 years ago

Until then, I think it's a good idea to say that we still use pytorch 0.3.1 in the README? BTW, Can I start working on this?

livenletdie commented 6 years ago

Is there an approximate ETA on this? I updated to PyTorch 0.4 for a different project and now OpenNMT is breaking for me :(. I can create another virtualenv for older pytorch version if this porting is going to a while.

da03 commented 6 years ago

I made a working version, but have not fully tested it yet, in https://github.com/da03/OpenNMT-py/tree/pytorch0.4

livenletdie commented 6 years ago

Awesome thanks. Let me give it a try.

David-Levinthal commented 6 years ago

This is a duplicate of 403 https://github.com/OpenNMT/OpenNMT-py/issues/403 there are instructions there on patching the distro to run on pytorch 0.4 I have run this with Cuda 9.1 cudnn 7.1.2

nikhilweee commented 6 years ago

@David-Levinthal I think #403 is just a workaround to make the current distro work with pytorch 0.4. It only has the minimal changes. This however, is a more complete solution which involves many more changes

vince62s commented 6 years ago

The #403 works fine for the first test in test/rebuild_models.sh but the second fails Traceback (most recent call last): File "train.py", line 493, in main() File "train.py", line 485, in main train_model(model, fields, optim, data_type, model_opt) File "train.py", line 257, in train_model train_stats = trainer.train(train_iter, epoch, report_func) File "/home/moses/pytorchwork/OpenNMT-py/onmt/Trainer.py", line 185, in train report_stats) File "train.py", line 103, in report_func report_stats.output(epoch, batch + 1, num_batches, start_time) File "/home/moses/pytorchwork/OpenNMT-py/onmt/Trainer.py", line 72, in output self.n_src_words / (t + 1e-5), RuntimeError: invalid argument 3: divide by zero at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:88

David-Levinthal commented 6 years ago

Nikhilweee, excuse my confusion but the link "changes" in your comment points to this report. Are you saying I should install https://github.com/da03/OpenNMT-py/tree/pytorch0.4?

nikhilweee commented 6 years ago

@David-Levinthal By changes I meant srush's checklist https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-384132016, which happens to be the second comment on this issue. About installing @da03's fork, I think he is a better person to comment.

da03 commented 6 years ago

@vince62s maybe try to cast that to a python number first? Something like self.n_src_words.data.cpu().numpy() / (t + 1e-5)

David-Levinthal commented 6 years ago

AHHH...a bit of background I am trying to evaluate/establish a multi framework HW evaluation benchmark suite that can be used by HW vendors/startups and cloud providers (for purchase decisions) https://github.com/David-Levinthal/machine-learning/ see the csv file so I am a bit desperate on getting something working at ~ top of tree as HW startups do not have the resources to support anything else The workaround in 403 ran and produced "nice" output and appeared to converge but seems to fail to translate.

any prognosis on when a Pytorch 0.4 version might become available..or a version people would like me to test? one added comment I am trying to use the same data sets for TF/NMT, MxNet/Sockeye and Pytorch/OpenNMT but using the preprocess .py step for the files given to OpenNMT after following the preparation outlined by TF/NMT (they were the first ones with an available source base) that appears to work for sockeye

so the first two sentences in the src-val file Die Premierminister Indi@@ ens und Japans tra@@ fen sich in Tok@@ io . Indi@@ ens neuer Premierminister Nar@@ end@@ ra Mo@@ di trifft bei seinem ersten wichtigen Auslands@@ besu@@ ch seit seinem Wahl@@ sie@@ g im Mai seinen japanischen Amts@@ kol@@ legen Shin@@ zo A@@ be in Tok@@ o , um wirtschaftliche und sicherheits@@ politische Beziehungen zu bes@@ prechen .

which look like the following in the tgt-val file India and Japan prime ministers meet in Tokyo India 's new prime minister , Nar@@ end@@ ra Mo@@ di , is meeting his Japanese counter@@ part , Shin@@ zo A@@ be , in Tokyo to discuss economic and security ties , on his first major foreign visit since winning May 's election .

is this formatting usable?

d

On Tue, May 8, 2018 at 11:14 AM, Nikhil Verma notifications@github.com wrote:

@David-Levinthal https://github.com/David-Levinthal By changes I meant srush's checklist #705 (comment) https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-384132016, which happens to be the second comment on this issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-387493534, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuTzEiXLfU1pyzZxkLqo6meJPoP64-ks5tweBqgaJpZM4Tio05 .

vince62s commented 6 years ago

https://github.com/Ubiqus/OpenNMT-py @pltrdy and I fixed many things to make it pytorch 0.4 compatible + huge refactoring. RNN: LSTM GRU SRU work fine CNN: works fine Transformer: works fine

We have not tested Image and Audio datasets, if someone could do it, would be great.

TODO Fix some doc things following the refactoring implement the multi-gpu double check Variable() removals

David-Levinthal commented 6 years ago

fantastic! is this in the main tree? did you try translation inference, see issue 725? d

On Wed, May 16, 2018 at 11:23 AM, Vincent Nguyen notifications@github.com wrote:

https://github.com/Ubiqus/OpenNMT-py @pltrdy https://github.com/pltrdy and I fixed many things to make it pytorch 0.4 compatible + huge refactoring. RNN: LSTM GRU SRU work fine CNN: works fine Transformer: works fine

We have not tested Image and Audio datasets, if someone could do it, would be great.

TODO Fix some doc things following the refactoring implement the multi-gpu double check Variable() removals

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-389618219, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuT9_CFIc4j1k1uVunHlIwfkodCe2vks5tzG66gaJpZM4Tio05 .

vince62s commented 6 years ago

not sure by what you mean by "main tree". it on https://github.com/Ubiqus/OpenNMT-py master. Translation ran fine for us but give it a try.

David-Levinthal commented 6 years ago

I meant https://github.com/OpenNMT/OpenNMT-py will the Ubiqus fork be merged with the one I am listing? d

On Wed, May 16, 2018 at 11:59 PM, Vincent Nguyen notifications@github.com wrote:

not sure by what you mean by "main tree". it on https://github.com/Ubiqus/ OpenNMT-py master. Translation ran fine for us but give it a try.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-389765200, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuT63HJkzc5m-RRWtqz140ZI8TyT-zks5tzR_jgaJpZM4Tio05 .

livenletdie commented 6 years ago

I was trying to use https://github.com/Ubiqus/OpenNMT-py but it seems like it has diverged a fair bit from the baseline repo. In particular, no more epochs but training_steps instead. Is that something the base repo is also moving towards?

vince62s commented 6 years ago

Most toolkits use steps now (onmt-tf, marian, t2t, ...) it's more meaningful especially when it comes to big corpus.

dreamk73 commented 6 years ago

How far along is this update? I tried both the master version and da03's version and still get an error about using detach_() instead of detach(). When I change it in the code I get to the following error:

Traceback (most recent call last): File "train.py", line 418, in main() File "train.py", line 410, in main train_model(model, fields, optim, data_type, model_opt) File "train.py", line 231, in train_model train_stats = trainer.train(train_iter, epoch, report_func) File "/data/esther/Projects/lts_experiments/scripts/OpenNMT-pytorch0.4/onmt/Trainer.py", line 180, in train report_stats) File "train.py", line 95, in report_func report_stats.output(epoch, batch + 1, num_batches, start_time) File "/data/esther/Projects/lts_experiments/scripts/OpenNMT-pytorch0.4/onmt/Trainer.py", line 67, in output self.ppl(), File "/data/esther/Projects/lts_experiments/scripts/OpenNMT-pytorch0.4/onmt/Trainer.py", line 48, in ppl return math.exp(min(self.loss / self.n_words, 100)) RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

simlaharma commented 6 years ago

I have the same problem that @dreamk73 posted. Waiting for the update!

vince62s commented 6 years ago

guys, if you want a working repo use our fork for the time being. sorry for the delay.

dreamk73 commented 6 years ago

I downloaded the Ubiqus fork and have verified that I can run the code now using pytorch 0.4.

srush commented 6 years ago

The main branch now supports 0.4, we are working to get most of the other Ubiqus changes in as well.

David-Levinthal commented 6 years ago

Thank you Working actively to incorporate OpenNMT into MLPerf d

On Sat, Jun 23, 2018 at 7:37 PM, srush notifications@github.com wrote:

The main branch now supports 0.4, we are working to get most of the other Ubiqus changes in as well.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-399724365, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuT0deR1BtI-7YqgZh2JBEkRd3TUNaks5t_vtqgaJpZM4Tio05 .

srush commented 6 years ago

Very neat, sorry I have been out of touch. Trying to put some time in the next couple weeks to get all the updates checked and in. In particular the main aim is multi-gpu.

David-Levinthal commented 6 years ago

can 771 get looked at?

this blocks the use of openmt for the pytorch entry for the transformer translation test.

On Sun, Jun 24, 2018 at 8:30 AM, srush notifications@github.com wrote:

Very neat, sorry I have been out of touch. Trying to put some time in the next couple weeks to get all the updates checked and in. In particular the main aim is multi-gpu.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-399765030, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuT2vxR2sXn-gMS-pr8wKVoqOZb_UWks5t_7CRgaJpZM4Tio05 .

srush commented 6 years ago

Yes, will prioritize.

https://github.com/OpenNMT/OpenNMT-py/issues/771

David-Levinthal commented 6 years ago

Thank you

On Mon, Jun 25, 2018 at 2:39 PM, srush notifications@github.com wrote:

Yes, will prioritize.

771 https://github.com/OpenNMT/OpenNMT-py/issues/771

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenNMT/OpenNMT-py/issues/705#issuecomment-400104657, or mute the thread https://github.com/notifications/unsubscribe-auth/AIUuT6whSQHY-zK_RIfcLUqW8sgFRyOuks5uAVisgaJpZM4Tio05 .

OpenNMT / OpenNMT-py

Migrate to pytorch 0.4 #705

771 https://github.com/OpenNMT/OpenNMT-py/issues/771