Open wulidongdong opened 3 years ago
Xin Wu,
Yes, the current implementation hard-codes a small vocabulary into the RNN size (the vocab can't be larger than the GNN size). I'm working to fix that and have been testing an embedding layer. I'll try to have something testable by Friday.
Regards, Steve
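For readers following the thread, here is a minimal sketch of why a vocabulary larger than the node-state size fails when tokens are written into the state as one-hot values. It is not the actual ggnn_encoder.py code: `STATE_DIM`, `one_hot_state`, and the explicit bounds check are hypothetical names chosen for illustration, and the size 64 is taken from the error message quoted below.

```python
# Hypothetical illustration only; not the code in onmt/encoders/ggnn_encoder.py.
STATE_DIM = 64  # assumed node-state size, matching "size 64" in the reported error


def one_hot_state(token_id, state_dim=STATE_DIM):
    """Return a node state holding a single 1.0 at position token_id.

    With this scheme every token id must be smaller than state_dim,
    so the vocabulary can hold at most state_dim entries.
    """
    if not 0 <= token_id < state_dim:
        raise IndexError(
            f"index {token_id} is out of bounds for a state of size {state_dim}"
        )
    state = [0.0] * state_dim
    state[token_id] = 1.0
    return state


# Token id 63 is the last id that fits in a 64-wide state.
last_ok = one_hot_state(63)

# Token id 64 (the 65th vocab entry) no longer fits.
try:
    one_hot_state(64)
except IndexError as err:
    message = str(err)
```

Under this assumption, a vocab file with more entries than the state size will eventually produce a token id that cannot be written into the state, which is consistent with the report of index 64 against size 64.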
On Sun, Jan 17, 2021 at 5:40 AM wulidongdong notifications@github.com wrote:
Hi Steve,
I found that if I use the build_vocab script in the current OpenNMT-py version (2.0.0), the output vocab file is not compatible with the ggnn encoder. It raises the following error:
Traceback (most recent call last):
  File "/home/cike/.local/bin/onmt_train", line 10, in
    sys.exit(main())
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 169, in main
    train(opt)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 154, in train
    train_process(opt, device_id=0)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/train_single.py", line 107, in main
    valid_steps=opt.valid_steps)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 244, in train
    report_stats)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 368, in _gradient_accumulation
    with_align=self.with_align)
  File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/models/model.py", line 45, in forward
    enc_state, memory_bank, lengths = self.encoder(src, lengths)
  File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/encoders/ggnn_encoder.py", line 182, in forward
    prop_state[i][j][token] = 1
IndexError: index 64 is out of bounds for axis 0 with size 64
But it works fine when I use the srcvocab.txt which is provided in this repo. Do you have any idea how to solve this problem? Thanks for your time.
Xin Wu
Thank you Steve, that would be great and helpful! I am wondering whether I can use the old OpenNMT preprocess script to generate vocab files. If so, which version should I use?
Xin Wu,
I have my embedding code passing tests but I'm working through the checkers now for a clean pull request. The new pull request will allow for larger vocabularies and handle the old and new vocab formats, but the vocab file must include
To learn a bit more about setup, you can look at my example GitHub file here: https://github.com/SteveKommrusch/OpenNMT-py-ggnn-example/blob/master/src/setupgraph2seq.sh . That file includes a Perl line that processes a raw vocab file to add the extra tokens.
Regards, Steve
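As a rough sketch of the kind of vocab post-processing described above (adding extra tokens to a raw vocab file), something like the following could be used in place of the Perl line. This is an assumption-laden illustration, not a port of setupgraph2seq.sh: the function name `add_extra_tokens` is hypothetical, the one-token-per-line format with an optional trailing count is assumed, placing the extra tokens at the top is an assumption, and the tokens themselves are left as a parameter because the thread does not list them.

```python
def add_extra_tokens(vocab_lines, extra_tokens):
    """Prepend extra_tokens to a vocab listing, skipping any already present.

    Each entry in vocab_lines is assumed to start with the token itself,
    optionally followed by whitespace and a count. Blank lines are dropped.
    Which tokens are required is NOT specified here; the caller supplies them.
    """
    entries = [line for line in vocab_lines if line.strip()]
    existing = {line.split()[0] for line in entries}
    missing = [tok for tok in extra_tokens if tok not in existing]
    return missing + entries


# Placeholder token names, purely for illustration.
raw_vocab = ["a\t10", "b\t3"]
adjusted = add_extra_tokens(raw_vocab, ["<placeholder_token>", "a"])
```

Tokens that are already in the file are left where they are, so running the function twice on the same file does not duplicate entries.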
Xin Wu,
I have created a pull request to OpenNMT here: https://github.com/OpenNMT/OpenNMT-py/pull/1998
The changes to ggnn_encoder.py allow for an embedding layer, which allows an arbitrarily large vocab to be used. Also, I updated my example code (which relies on the new GGNN code) here: https://github.com/SteveKommrusch/OpenNMT-py-ggnn-example#graph-input-processing-end-to-end-example . Along with the new ggnn_encoder.py you can now use the current onmt_build_vocab to create a file which can be easily adjusted for GGNN usage. That end-to-end example also has a script that can help format textual trees like (a + (b * c) ) into tree structures used by the GGNN.
Let me know if I can help more. If you can't download from my pull request, I could send you the ggnn_encoder.py directly.
Regards, Steve
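To illustrate why an embedding layer removes the vocabulary limit, here is a minimal sketch under stated assumptions. It does not reproduce the code in the pull request: `make_embedding_table`, `embedded_state`, the random initialization, and the zero-padding of the state are all hypothetical choices made for the example.

```python
import random


def make_embedding_table(vocab_size, emb_dim, seed=0):
    """Build a vocab_size x emb_dim table of small random values.

    In a real model these values would be learned parameters; random
    numbers are used here only so the sketch is self-contained.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
        for _ in range(vocab_size)
    ]


def embedded_state(token_id, table, state_dim):
    """Look up the token's embedding and zero-pad it to state_dim entries."""
    vector = table[token_id]
    if len(vector) > state_dim:
        raise ValueError("embedding is wider than the node state")
    return vector + [0.0] * (state_dim - len(vector))


# A 1000-entry vocabulary now fits a 64-wide state, because the state
# stores a 16-value embedding instead of a one-hot position.
table = make_embedding_table(vocab_size=1000, emb_dim=16)
state = embedded_state(640, table, state_dim=64)
```

The key point of the sketch is that the token id only indexes rows of the table, so the vocabulary size is decoupled from the state width.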
The pull request has been accepted so GGNN now supports an embedding layer in the main OpenNMT-py branch.