Closed: PonteIneptique closed this issue 5 years ago.
It seems the GPU is already full. Could you paste the stack trace you get when running on CPU (to see where exactly it fails)? My a priori guess would be either the model getting copied (especially if there is some multiprocessing going on), or that the batch size / sentence length is too big.
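A quick way to tell those two guesses apart is to watch allocated GPU memory between files: a slow monotonic climb suggests something is being retained (copies, references, slow collection), while a sudden jump points at one oversized batch. A minimal diagnostic sketch, assuming PyTorch is available (the `gpu_allocated_mib` helper is invented here for illustration, it is not part of pie's API):

```python
def gpu_allocated_mib():
    """Currently allocated GPU memory in MiB, or None when no GPU is usable."""
    try:
        import torch  # already a dependency of pie
        if torch.cuda.is_available():
            return torch.cuda.memory_allocated() / 2 ** 20
    except ImportError:
        pass
    return None

# Hypothetical usage inside the per-file loop:
# for filepath in text_files:
#     lemmatizer.output(filepath)
#     print(filepath, gpu_allocated_mib())
```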
On Fri, Feb 22, 2019 at 10:47 AM, Thibault Clérice wrote:
Technically not directly related to pie, but I could use a hand.
I put everything that is required here: https://github.com/PonteIneptique/pie-leak
When I run `python cli.py corpus lemmatize pie`, I quickly (~100 of 120 files in) run into memory issues. This also happened on CPU (where I have 32 GB of RAM). The issue does not seem to happen with `pie tag`, which means it is definitely an issue with my code. Can I ask you to have a look at it?
```
Traceback (most recent call last):
  File "cli.py", line 173, in <module>
    cli()
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "cli.py", line 103, in lemmatize
    helpers.lemmatizers.run_pie_web(text_files=text_files, target_path="data/curated/corpus/generic/", threads=1)
  File "/home/thibault/dev/these/helpers/lemmatizers/__init__.py", line 22, in run_pie_web
    lemmatizer.output(filepath)
  File "/home/thibault/dev/these/helpers/lemmatizers/pie_impl.py", line 72, in output
    for token in self.from_file(file_path):
  File "/home/thibault/dev/these/helpers/lemmatizers/base.py", line 29, in from_file
    yield from self.from_string(f.read(), path)
  File "/home/thibault/dev/these/helpers/lemmatizers/pie_impl.py", line 33, in from_string
    tagged, tasks = self.tagger.tag(sents, lengths)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/tagger.py", line 73, in tag
    preds = model.predict(inp, *tasks, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/models/model.py", line 342, in predict
    hyps, _ = decoder.predict_max(cemb_outs, clen, context=context)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/models/decoder.py", line 411, in predict_max
    outs, hidden = self.rnn(emb, hidden)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 1.07 GiB (GPU 0; 7.93 GiB total capacity; 6.10 GiB already allocated; 78.31 MiB free; 872.34 MiB cached)
```
Order in which it fails: order.txt https://github.com/emanjavacas/pie/files/2893311/order.txt (it fails starting at data/curated/corpus/generic/urn:cts:latinLit:stoa0171.stoa009.opp-lat1/1.txt, the file after the last one listed).
Another note: the issue does not seem to happen if the texts are read in a different order, which would point to some garbage collection not being fast enough? I am a bit lost...
-- Enrique Manjavacas
The CPU error was the same, except that it referred to the CPU. But the thing is, it does not happen with tag.py, which is weird, no?
Check the batch size and the sentence length of your input.
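One way to act on that advice is to hard-cap the token count of each sentence before it reaches `tagger.tag`, splitting anything longer into fixed-size windows so no single "sentence" blows up the RNN's sequence dimension. A sketch under stated assumptions: the cap value and the `split_long` helper are made up for illustration, and pie may well handle this differently internally:

```python
MAX_TOKENS = 50  # hypothetical cap; tune to your GPU

def split_long(sent, max_tokens=MAX_TOKENS):
    """Yield fixed-size windows of a token list so every chunk
    fed to the tagger stays at or below max_tokens tokens."""
    for i in range(0, len(sent), max_tokens):
        yield sent[i:i + max_tokens]

# A 120-token run-on sentence becomes windows of 50 + 50 + 20;
# short sentences pass through unchanged.
sents = [["uita"] * 120, ["ars", "longa"]]
chunked = [chunk for sent in sents for chunk in split_long(sent)]
```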
Yup, definitely a sentence that is too long. I'll have a closer look, but thanks for the hint. I'll leave the issue open for now, if you don't mind.
It was indeed a mixture of batch size and sentence length.
Thanks!
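For anyone landing on this thread later: since batch size was half of the problem, note that peak memory also depends on how sentences are grouped into batches. Sorting by length before batching keeps each padded batch only as wide as its longest member. This is a generic trick, not something pie requires (and pie may already do it internally); the helper below is a sketch:

```python
def length_bucketed_batches(sents, batch_size):
    """Sort sentences by token count, then batch: each padded batch is
    only as wide as its longest member, which tames peak memory."""
    ordered = sorted(sents, key=len)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

# Mixed-length input: the 9-token outlier ends up alone in its batch
# instead of forcing the whole batch to pad out to length 9.
sents = [["a"] * 9, ["b"] * 2, ["c"] * 5]
batches = list(length_bucketed_batches(sents, batch_size=2))
```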