cudadog / pydot

Automatically exported from code.google.com/p/pydot
MIT License

to_string causes OutOfMemory for large graphs #48

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I have a graph from the python-graph library that I am rendering to a dot file. 
python-graph uses this library to interact with graphviz. The graph has 
approximately 1,200 nodes and 266,000 edges, and rendering it runs out of 
memory on a 4 GB machine.

After some profiling, I found that the program spent most of its time inside 
pydot, and that memory usage started increasing dramatically in the to_string 
method.

Looking at the code, I found that pydot keeps all of the lines of the file in 
memory and then joins them into a single string. With such a large file, it 
was holding an absolutely huge list of strings in memory at the same time.
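To illustrate the general effect described above (this is not a measurement of pydot itself), here is a minimal sketch that compares peak traced memory when lines are collected in a list and joined versus written to a stream as they are produced. The names edge_lines, render_with_list, render_to_stream, CountingSink, and peak_bytes are all hypothetical helpers invented for this example.

```python
import tracemalloc


def edge_lines(n):
    """Yield n made-up dot edge statements, one at a time."""
    for i in range(n):
        yield "n%d -> n%d;\n" % (i, i + 1)


def render_with_list(n):
    """Collect every line in a list, then join (the pattern described above)."""
    parts = []
    for line in edge_lines(n):
        parts.append(line)
    return "".join(parts)


class CountingSink(object):
    """Hypothetical write-only stream that counts characters and keeps nothing."""

    def __init__(self):
        self.chars = 0

    def write(self, text):
        self.chars += len(text)


def render_to_stream(n, stream):
    """Write each line as soon as it is produced; nothing is accumulated here."""
    for line in edge_lines(n):
        stream.write(line)


def peak_bytes(func, *args):
    """Return the peak memory traced by tracemalloc while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 20000
    print("list + join peak bytes:", peak_bytes(render_with_list, n))
    print("streaming peak bytes:  ", peak_bytes(render_to_stream, n, CountingSink()))
```

The streaming variant only helps if the destination does not itself buffer everything, which is why the sink here discards what it receives.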

I think a solution to this is to optionally write the results to a file. This 
can be accomplished by adding a "to_stream" method to every class that has a 
"to_string" method. Roughly, each "to_string" function would change as 
follows.

Rename "to_string(self)" to "to_stream(self, graph)", where "graph" is a 
writable file-like object. Then change every call like:

graph.append('text to write')

to

graph.write('text to write')
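As a rough sketch of what a converted method could look like, assuming two toy classes that are not pydot's real Edge or Dot classes (ToyEdge and ToyGraph are hypothetical, and the parameter is named "stream" here instead of "graph" for clarity):

```python
import io


class ToyEdge(object):
    """Hypothetical stand-in for an edge; not pydot's Edge class."""

    def __init__(self, src, dst):
        self.src = src
        self.dst = dst

    def to_stream(self, stream):
        # Each fragment is written immediately instead of being appended
        # to a list that is joined later.
        stream.write(self.src)
        stream.write(" -> ")
        stream.write(self.dst)
        stream.write(";\n")


class ToyGraph(object):
    """Hypothetical stand-in for a graph container."""

    def __init__(self, name):
        self.name = name
        self.edges = []

    def to_stream(self, stream):
        stream.write("digraph %s {\n" % self.name)
        for edge in self.edges:
            # Children write to the same stream, so no intermediate
            # strings are built per child.
            edge.to_stream(stream)
        stream.write("}\n")


if __name__ == "__main__":
    graph = ToyGraph("G")
    graph.edges.append(ToyEdge("a", "b"))
    out = io.StringIO()
    graph.to_stream(out)
    print(out.getvalue())
```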

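From there, you can implement "to_string" by using the standard library 
StringIO class.

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

def to_string(self):
    # Create the buffer before entering the try block so that the
    # finally clause never refers to an unbound name.
    buffer = StringIO()
    try:
        self.to_stream(buffer)
        return buffer.getvalue()
    finally:
        buffer.close()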

I don't know if the close matters for StringIO. It frees the memory buffer 
(which you've already copied after using getvalue), but that buffer is probably 
freed by the destructor of StringIO.

You keep the old functionality and gain some additional functionality. Anybody 
who wants to use "to_stream" will have to manage whatever stream they want to 
send it to.
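For example, a caller who wants the output on disk would open and close the file themselves. This is only a sketch under stated assumptions: EdgeListGraph and write_dot_file are hypothetical names, and the to_stream method is the one proposed above, not something pydot is known to provide.

```python
import os
import tempfile


class EdgeListGraph(object):
    """Hypothetical graph with the proposed to_stream method; not pydot's API."""

    def __init__(self, edges):
        self.edges = edges  # list of (source, destination) name pairs

    def to_stream(self, stream):
        stream.write("digraph G {\n")
        for src, dst in self.edges:
            stream.write("%s -> %s;\n" % (src, dst))
        stream.write("}\n")


def write_dot_file(graph, path):
    """Open the file, let the graph write into it, and close it afterwards."""
    # The caller owns the stream's lifetime; the graph only calls write().
    with open(path, "w") as handle:
        graph.to_stream(handle)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.dot")
    write_dot_file(EdgeListGraph([("a", "b"), ("b", "c")]), target)
    print("wrote", target)
```

Because the graph only needs an object with a write method, the same call works for a file, a socket wrapper, or an in-memory buffer.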

I can write this up and submit a patch later, but I won't have time until late 
tonight/tomorrow.

Original issue reported on code.google.com by jonathan...@gmail.com on 12 Apr 2011 at 8:12

GoogleCodeExporter commented 9 years ago
One more comment: I was wrong about where it was running out of memory. I 
still think adding to_stream would be beneficial, but it actually ran out of 
memory while constructing the pydot.Dot structure for the incredibly large 
graph.

Original comment by jonathan...@gmail.com on 13 Apr 2011 at 2:44

GoogleCodeExporter commented 9 years ago
The graph you are trying to generate is fairly large. While I would definitely 
accept any patches that improve the memory footprint of pydot, it's not 
something I can currently spend much time researching. I'll mark it as WontFix.

Original comment by ero.carr...@gmail.com on 20 Apr 2011 at 5:33