facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

[M1 Pro] I get a warning about memory leaks and am not sure how to proceed. #162

Open SeniorMars opened 10 months ago

SeniorMars commented 10 months ago

First here is my machine:

Model Name: MacBook Pro
  Model Identifier: MacBookPro18,3
  Model Number: Z15G001WDLL/A
  Chip: Apple M1 Pro
  Total Number of Cores:    10 (8 performance and 2 efficiency)
  Memory:   16 GB
  System Firmware Version:  10151.1.1
  OS Loader Version:    10151.1.1

I tested nougat with a sample PDF I found at https://facebookresearch.github.io/nougat/ and the recommended command.

  λ in ~/Doc/g/no_test took 21s ❯❯ nougat 311780065_536165785007570_1635422204823795538_n.pdf -o out
/Users/charlie/.asdf/installs/python/3.10.10/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3527.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|                                                                                                                     | 0/7 [00:00<?, ?it/s]-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!
-> Cannot close object, library is destroyed. This may cause a memory leak!

I am not sure how to proceed, or whether this is actually an error.

So far I don't get any output. I thought it might be because my computer is slow (I don't think that's the case), but I let it run for a while and nothing happened.

aiainui commented 10 months ago

Me too.

LoganWalls commented 10 months ago

I'm also running into this issue (and I'm seeing monster RAM usage to go along with it).

The message seems to originate from pypdfium2, so this is probably related to rasterize_paper(), but I haven't figured out much more than that.

luiz0992 commented 10 months ago

Same issue here. Not getting any output after the warning.

sidharthrajaram commented 9 months ago

I get this warning as well, except it gets printed after producing output. However, the API endpoint serving the model crashes immediately after the warning.

mara004 commented 9 months ago

See https://github.com/facebookresearch/nougat/issues/110#issuecomment-1766218261

nougat currently uses a deprecated method of pypdfium2, which regrettably was a design mistake of mine. It should use single-page rendering (page.render()) with a linear loop or native parallelization, instead of pdf.render(). I believe changing this would get rid of these problems. There's a PR already; it would just need to be merged: https://github.com/facebookresearch/nougat/pull/173

Another idea would be to proceed with the deprecation on the pypdfium2 side and make pdf.render() do linear rendering.
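
For reference, here is a minimal sketch of that single-page approach, assuming pypdfium2 v4; the file name, scale value, and PIL conversion are illustrative, not nougat's actual code:

    import pypdfium2 as pdfium

    pdf = pdfium.PdfDocument("paper.pdf")
    images = []
    for i in range(len(pdf)):
        page = pdf[i]                    # load and render one page at a time
        bitmap = page.render(scale=2.0)  # single-page render, not pdf.render()
        images.append(bitmap.to_pil())   # hand a PIL image to the model
        page.close()                     # release the page before the next one
    pdf.close()

Rendering pages one by one like this keeps only a single page's bitmap alive at a time, which is the point of the deprecation.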

Mohamed-E-Fayed commented 9 months ago

Hi,

After some investigation and making the code render pages as recommended, I noticed that running the model using MPS is much slower than using CPU. Additionally, it consumes a significant amount of RAM, sometimes exceeding 100 GB, while the CPU version stays around 10 GB for the Nougat paper itself.

Note: it is slow even on the first batch (with batch size = 1), before any swapping occurs.

I'm running it on an M2 Max (64 GB), macOS 13.6.2, Python 3.11.5, transformers 4.35.2.
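
For anyone who wants to reproduce the comparison, a hedged sketch of pinning the device explicitly via the transformers port of Nougat (the environment above lists transformers 4.35.2; whether that port was actually used here is an assumption):

    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained("facebook/nougat-base")
    model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

    # Pin the device instead of letting the code autodetect MPS.
    device = "cpu"  # swap in "mps" to reproduce the high-memory behavior
    model = model.to(device).eval()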

ehartford commented 6 months ago

Me too, on an M3 Max (128 GB); it keeps running out of memory and crashing.

ehartford commented 6 months ago

> After some investigation and making the code render pages as recommended, I noticed that running the model using MPS is much slower than using CPU. […]

Can you please tell me how to use it with CPU?

mara004 commented 5 months ago

> the issue hasn't been solved

The original issue (memory leak warnings) should already be resolved if you are using pypdfium2 >= 4.25.0 (see changelog for details). However, it's possible something else in nougat may be consuming, or leaking, too much memory.
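
A quick way to verify which version is installed (a minimal check using the standard library, nothing nougat-specific):

    from importlib.metadata import version

    # The memory-leak warnings were addressed upstream in pypdfium2 4.25.0,
    # so anything older should be upgraded.
    print(version("pypdfium2"))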

Mohamed-E-Fayed commented 5 months ago

That was one reason. Another reason for it is the usage of 'mps' as a device instead of 'cpu'. It may be something to do with HuggingFace transformers or PyTorch. It may be that some kernel operations are not implemented or not compatible.

However, the issue of consuming more than 100 GB when using 'mps', while staying around 10 GB when using 'cpu', is still not resolved.

Thank you.


mara004 commented 5 months ago

I guess you should maybe file a separate issue about that, because the original issue here was mainly about the warnings from pypdfium2.