CenterForOpenScience / pydocx

An extendable docx file format parser and converter

Encountering a memory leak when calling PyDocX.to_html? #260

Status: open · sunglowrise opened this issue 3 years ago

sunglowrise commented 3 years ago

Hi! I'm encountering a memory leak when calling PyDocX.to_html. Am I not using it the right way?

Test example:

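# python3.6
# PyDocX == 0.9.10
import tracemalloc

from pydocx import PyDocX


def test_to_html():
    tracemalloc.start()
    snapshot1 = tracemalloc.take_snapshot()

    # Convert the same document 10 times; a leak shows up as growth between the snapshots
    for _ in range(10):
        with open("/tmp/test.docx", "rb") as f:
            html = PyDocX.to_html(f)
            print(html)

    snapshot2 = tracemalloc.take_snapshot()
    # List allocation differences grouped by source line, largest first
    top_stats = snapshot2.compare_to(snapshot1, "lineno")
    for stat in top_stats:
        print(stat)

AlexandreRozier commented 2 years ago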

We seem to have the same issue when repeatedly calling .to_html: memory consumption keeps rising in increments of roughly 30 MB.
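A rough way to confirm that the growth really is per call is to record how much traced memory is still held after each conversion. The sketch below is a minimal, hypothetical helper (`per_call_growth` is not part of PyDocX) built only on the standard-library `tracemalloc` and `gc` modules; it assumes the conversion can be wrapped in a zero-argument callable. Keep in mind that tracemalloc only sees memory allocated through Python's allocators, so memory a C library obtains directly from `malloc` will not show up here.

```python
import gc
import tracemalloc
from typing import Callable, List


def per_call_growth(convert: Callable[[], object], iterations: int = 10) -> List[int]:
    """Call `convert` repeatedly and return how many bytes of traced memory
    remain allocated after each call, relative to the start of tracing."""
    gc.collect()  # drop garbage left over from earlier work before measuring
    tracemalloc.start()
    try:
        baseline, _peak = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(iterations):
            result = convert()
            del result  # discard the output so only retained memory is counted
            gc.collect()  # free reference cycles created by the call
            current, _peak = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
        return growth
    finally:
        tracemalloc.stop()  # always stop tracing, even if convert() raises
```

For example, `per_call_growth(lambda: PyDocX.to_html("/tmp/test.docx"))` would be expected to return roughly flat numbers if nothing is retained between calls, and a steadily rising list if each call leaks.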

xxxpppfff commented 1 year ago

I seem to have the same issue, and it caused the container to crash on restart.

henrymcl commented 7 months ago

I'm passing the file path to to_html (rather than an open file object) and didn't see a memory leak. Take this with a grain of salt, though, because I don't know how to properly analyze the snapshots.
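Whether the input type actually matters could be checked by measuring both calling styles side by side. This is a minimal sketch under the assumption that `PyDocX.to_html` accepts either a path or a binary file object, as the two snippets in this thread do; `net_allocation` and `compare_inputs` are hypothetical helper names, and only standard-library modules are used.

```python
import gc
import tracemalloc
from typing import Callable, Dict


def net_allocation(convert: Callable[[], object], iterations: int = 10) -> int:
    """Bytes still traced by tracemalloc after `iterations` calls to `convert`."""
    gc.collect()
    tracemalloc.start()
    try:
        for _ in range(iterations):
            convert()  # the returned HTML is discarded immediately
        gc.collect()  # only memory that is genuinely retained should remain
        current, _peak = tracemalloc.get_traced_memory()
        return current
    finally:
        tracemalloc.stop()


def compare_inputs(path: str, to_html: Callable[[object], object],
                   iterations: int = 10) -> Dict[str, int]:
    """Measure retained memory when `to_html` gets a path vs. an open binary stream."""

    def from_path() -> object:
        return to_html(path)

    def from_stream() -> object:
        with open(path, "rb") as f:  # file is closed before the next call
            return to_html(f)

    return {
        "path": net_allocation(from_path, iterations),
        "stream": net_allocation(from_stream, iterations),
    }
```

Calling `compare_inputs("/tmp/test.docx", PyDocX.to_html)` returns the bytes still traced after ten calls for each style; a much larger `"stream"` value would support the idea that passing an open file is what triggers the growth.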

Also, you may want to call tracemalloc.stop() once you're done measuring.
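For reading the snapshots, a common pattern is to collect garbage before each snapshot, filter out tracemalloc's own allocations, and keep only the source lines whose allocations grew. The sketch below wraps that pattern in a hypothetical helper, `top_leaks` (not part of PyDocX or the standard library), and calls `tracemalloc.stop()` in a `finally` block so tracing ends even if the conversion raises.

```python
import gc
import tracemalloc
from typing import Callable, List


def top_leaks(convert: Callable[[], object], iterations: int = 10,
              limit: int = 10) -> List[tracemalloc.StatisticDiff]:
    """Run `convert` repeatedly and return the source lines whose allocations
    grew the most between a "before" and an "after" snapshot."""
    gc.collect()
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            convert()
        gc.collect()  # free garbage so only retained objects remain
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()  # stop tracing even if convert() raises

    # Ignore memory allocated by tracemalloc itself (e.g. the first snapshot)
    ignore = [tracemalloc.Filter(False, tracemalloc.__file__)]
    diffs = after.filter_traces(ignore).compare_to(
        before.filter_traces(ignore), "lineno")
    # compare_to sorts by absolute size difference; keep only growth
    growing = [d for d in diffs if d.size_diff > 0]
    return growing[:limit]
```

Usage would look like `for stat in top_leaks(lambda: PyDocX.to_html("/tmp/test.docx")): print(stat)`. Each printed entry shows a file, line number and size difference; entries inside the pydocx package that keep growing as `iterations` increases are the ones worth reporting here.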