Closed: knaaptime closed this issue 1 month ago
I can't imagine why precompute would be any different. I don't think that code has been touched in ages. Only thing I can think of to set twoway=False and see how much of a difference that makes?
🤷‍♂️ that's what I figured, and I couldn't see any reason things would be different now. But I can confirm this behavior using the code above in a new conda environment with pandana from the udst channel.
unfortunately not seeing any change with twoway=False
I'm seeing the same issue. I can run an aggregate over the same distance, and while it does take a long time (30 mins), it does complete without killing the kernel.
Sorry for the circular references, but #104 isn't to blame for this, because I can reproduce using the pre-compiled versions from pip/anaconda.
To add on to this, I am also running into this issue with precompute.
I did a careful analysis of pandana's memory consumption in the precompute
step. The conclusions so far are that the memory usage is in line with the data structures that we are storing in memory.
I did my tests with a network with 685K nodes (and around 1M edges). Memory consumption (interpreting this graph as directed) is around 7 to 8 GB in the precompute phase.
On the other hand, doing some math on the data structures, we can see that the precompute method basically creates a collection of `std::vector`s, where each value is another `std::vector` of pairs specifying the target node (as an `unsigned int`) and a `float` representing the distance. That's the `dms` member of the `Accessibility` class.
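As a quick sanity check on the per-element size, we can pack the equivalent C layout from Python. This is purely an illustration of the 4 + 4 byte `(unsigned int, float)` pair, not pandana code:

```python
import struct

# Each element of the precomputed structure is conceptually a
# (target_node, distance) pair: an unsigned 32-bit int plus a
# 32-bit float. '=' requests standard sizes with no padding.
pair = struct.pack("=If", 12345, 250.0)
print(len(pair))  # 8 bytes per element
```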
In this example, around 1,179 nodes are reachable on average, given that the data structure holds 808 million elements. Each element is a `(uint, float)` pair, which on the tested architecture means 4 + 4 bytes.
In conclusion, the size of the created data structure is in theory 808M * 8 bytes = 6.4 GB.
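The arithmetic behind that estimate can be reproduced directly, using the node and reachability counts quoted above:

```python
nodes = 685_000            # nodes in the test network
avg_reachable = 1_179      # average reachable nodes per source
bytes_per_element = 4 + 4  # one (uint, float) pair

elements = nodes * avg_reachable
total_gb = elements * bytes_per_element / 1e9
print(f"{elements / 1e6:.0f}M elements -> {total_gb:.2f} GB")
# -> 808M elements -> 6.46 GB
```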
That is very close to the observed 7 GB, and given alignment overhead and the space occupied by the `std::vector`s themselves, it's a very reasonable memory consumption.
So the conclusion is that I'm not seeing any memory explosion in the example that I'm following (which is a very big one). It is just using a reasonable amount of memory given the input size.
I guess from a user perspective this seems like a regression in the library, but thinking about it, it might just be non-linear growth in memory consumption at large bandwidths (simply because many more nodes are reachable within larger distances).
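That non-linear growth is easy to see with a back-of-envelope model: on a roughly planar street network, the number of nodes reachable within a bandwidth grows with the covered area, i.e. roughly with the square of the distance, so precompute memory does too. A purely illustrative sketch (the node density is a made-up figure, calibrated so the 1000 m row lands near the ~6.5 GB estimate above):

```python
import math

def precompute_bytes(bandwidth_m, node_density_per_km2=375.0,
                     n_nodes=685_000, bytes_per_element=8):
    """Rough memory model: reachable nodes per source scale with the
    area of a disc of radius bandwidth_m (illustrative only)."""
    area_km2 = math.pi * (bandwidth_m / 1000.0) ** 2
    reachable_per_node = node_density_per_km2 * area_km2
    return n_nodes * reachable_per_node * bytes_per_element

# Memory roughly quadruples each time the bandwidth doubles.
for bw in (1000, 2000, 4000, 8000):
    print(f"{bw} m -> {precompute_bytes(bw) / 1e9:.1f} GB")
```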
Thanks for taking the time to look at this. If I get a chance to experiment with this more I will report back.
agreed, thank you for digging into this
In the past, I've been able to create a pandana network and precompute moderately-sized queries on a laptop (e.g. the linked example precomputes 8000m on an OSM network covering the MD-DC-VA MSA).
Using the current version, `net.precompute()` is consuming tons of memory, often eating up everything on the system. For example, with a network slightly larger than Denver County, the following will eat up all the memory on a Linux box with 64 GB of RAM and crash the process. The same also happens on my MacBook. On the same pandana network, calling `precompute(5000)` consumes 40 GB of RAM. If I don't precompute, I'm able to perform the accessibility queries with hardly any resource consumption (albeit much more slowly, of course).
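For reference, a minimal sketch of the two code paths being compared. It assumes pandana's public API (`precompute`, `set`, `aggregate`); `net` is an already-built `pandana.Network`, and the variable name is arbitrary:

```python
def aggregate_with_optional_precompute(net, distance=5000.0, precompute=False):
    """Count nodes reachable within `distance` of every node.

    With precompute=True, net.precompute(distance) builds the full
    range-query cache up front -- the memory-hungry step discussed in
    this thread. With precompute=False, each aggregate() call does the
    shortest-path work lazily: slower per query, but light on memory.
    """
    if precompute:
        net.precompute(distance)
    net.set(net.node_ids, name="unit")            # attach one unit per node
    return net.aggregate(distance, type="count", name="unit")
```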
Any idea what could be happening?
Environment
json: 2.0.9
numpy: 1.16.2
pandana: 0.4.1
osmnet: 0.1.5
pandas: 0.24.2
compiler: GCC 7.3.0
system: Linux
release: 4.18.0-16-generic
machine: x86_64
processor: x86_64
CPU cores: 12
interpreter: 64bit