emanjavacas / pie

A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.
MIT License

CUDA Memory Issue #20

Closed PonteIneptique closed 5 years ago

PonteIneptique commented 5 years ago

Technically this is not directly related to pie, but I could use a hand.

I put everything that is required here: https://github.com/PonteIneptique/pie-leak

When I run python cli.py corpus lemmatize pie, I quickly (after ~100 of 120 files) run into a memory issue. This also happened on CPU (where I have 32GB of RAM). The issue does not seem to happen with pie tag, which means it is definitely an issue with my code. Can I ask you to have a look at it?

Traceback (most recent call last):
  File "cli.py", line 173, in <module>
    cli()
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "cli.py", line 103, in lemmatize
    helpers.lemmatizers.run_pie_web(text_files=text_files, target_path="data/curated/corpus/generic/", threads=1)
  File "/home/thibault/dev/these/helpers/lemmatizers/__init__.py", line 22, in run_pie_web
    lemmatizer.output(filepath)
  File "/home/thibault/dev/these/helpers/lemmatizers/pie_impl.py", line 72, in output
    for token in self.from_file(file_path):
  File "/home/thibault/dev/these/helpers/lemmatizers/base.py", line 29, in from_file
    yield from self.from_string(f.read(), path)
  File "/home/thibault/dev/these/helpers/lemmatizers/pie_impl.py", line 33, in from_string
    tagged, tasks = self.tagger.tag(sents, lengths)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/tagger.py", line 73, in tag
    preds = model.predict(inp, *tasks, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/models/model.py", line 342, in predict
    hyps, _ = decoder.predict_max(cemb_outs, clen, context=context)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/pie/models/decoder.py", line 411, in predict_max
    outs, hidden = self.rnn(emb, hidden)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thibault/dev/these/these_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 1.07 GiB (GPU 0; 7.93 GiB total capacity; 6.10 GiB already allocated; 78.31 MiB free; 872.34 MiB cached)

Order in which it fails: order.txt (https://github.com/emanjavacas/pie/files/2893311/order.txt); it fails when starting data/curated/corpus/generic/urn:cts:latinLit:stoa0171.stoa009.opp-lat1/1.txt, right after the last file in that list.

Another note: the issue does not seem to happen if the texts are read in another order, which would point to garbage collection not being fast enough? I am a bit lost...
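
In case it clarifies what I mean: below is a rough sketch of the kind of per-batch cleanup I would expect to be needed, wrapping the tagger.tag call from the traceback (tag_and_release is my name for it, not part of pie). I have not confirmed that this fixes anything.

    import gc
    import torch

    def tag_and_release(tagger, sents, lengths):
        # tagger.tag(sents, lengths) is the call from the traceback above.
        # The wrapper avoids keeping gradient buffers, then drops lingering
        # references and asks PyTorch to release its cached CUDA blocks.
        with torch.no_grad():
            tagged, tasks = tagger.tag(sents, lengths)
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return tagged, tasks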

emanjavacas commented 5 years ago

It seems the GPU is already full. Could you paste the stack trace you get when running on CPU, to see where exactly it fails? My a priori guess would be either the model getting copied (especially if there is some multiprocessing going on) or the batch size/sentence length being too big.
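
To check the latter, you could log the batch geometry right before the tag call, something like this (sents and lengths as in your traceback; this is just a quick diagnostic, not pie API):

    def describe_batch(sents, lengths):
        # Memory use of RNN decoding grows with the batch size and with the
        # longest sequence in the batch, so both numbers are worth logging.
        print("sentences: {}, longest: {} tokens, total: {} tokens".format(
            len(sents), max(lengths), sum(lengths)))

If the numbers spike on the file where it crashes, that is your culprit.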


PonteIneptique commented 5 years ago

The CPU error was the same, albeit about CPU memory. But the thing is, it does not happen with tag.py, which is weird, no?


emanjavacas commented 5 years ago

Check the batch size and the sentence length of your input.
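
For instance, capping the sentence length before tagging could look like this (a sketch only; chunk_sentence and the 100-token cap are illustrative, not pie API):

    def chunk_sentence(tokens, max_len=100):
        # Split an over-long token list into pieces of at most max_len tokens,
        # so the decoder never sees an unbounded sequence (e.g. a whole
        # unpunctuated file that got parsed as a single "sentence").
        for start in range(0, len(tokens), max_len):
            yield tokens[start:start + max_len]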


PonteIneptique commented 5 years ago

Yup, definitely a sentence that is too long. I'll have a better look, but thanks for the hint. I'll leave the issue open for the moment, if you don't mind.

PonteIneptique commented 5 years ago

'Twas indeed a mixture of batch size and sentence length.
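
For anyone landing here later, the fix had this shape (a sketch; the caps of 64 tokens and 32 sentences are illustrative values, not canonical ones):

    MAX_SENT_LEN = 64   # hard cap on tokens per sentence
    BATCH_SIZE = 32     # sentences per call to tagger.tag

    def batched(sents, size=BATCH_SIZE):
        # Yield fixed-size groups of (already length-capped) sentences so a
        # single oversized batch can never exhaust GPU memory.
        for i in range(0, len(sents), size):
            yield sents[i:i + size]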

PonteIneptique commented 5 years ago

Thanks!