Dynamo becoming unusable, execution time grows exponentially

frankfralick commented 10 years ago

Upon starting Revit and Dynamo for the first time, the definition executes in a reasonable amount of time, but only the first two times it is executed. The second execution is slightly faster, as I would expect. The third execution is 60 times slower. 60.

Here are screenshots of two separate sessions of Revit. Each run is apples to apples, the the same input is changed each time, and in a way that produces roughly the same number of objects:

dynamorevit 2014-07-11 time 02

Also, during the third execution, the physical memory usage goes way up. Total memory usage was at 10gb up until hitting run the third time, and then goes to 11.7gb: dynamorevit 2014-07-11 memory

I will try to make an example that reproduces this, but it will take time. If you are open to seeing the actual project file and definition I'm using that is ok with me, it is just a large project file. This seems like a memory leak.

dynamorevit 2014-07-11 memory 03

I don't think I will be able to run a memory allocation profile successfully in this situation. I thought that maybe it was something to do with IronPython, and it may be. The only random, feeble dart I threw at it was to create a runtime for the script engine that can be shutdown after execution, just to try to make sure things are getting let go of. This didn't do anything.

The only python node that is getting executed on subsequent runs after the first one works with points, polycurves, and creates surfaces. The return is an array of surfaces.

Any help will be greatly appreciated.

ke-yu commented 10 years ago

@frankfralick , could you reproduce it in standalone Dynamo?

lukechurch commented 10 years ago

Something odd is going on here. It looks like it might be a problem in the geometry library, but it could be that the engine isn't sending it call arguments correctly. We've been tracking some bugs around object management in Python recently.

If you build from source in debug - in stand alone mode - you'll find a command in the Debug menu called "Force Rexecute" you could try running that, and seeing if the next run behaves itself. If you do do this experiment, please do let us know what the result is.

Could you send us the definition? Feel free to mail it to me directly rather than post it in public.

ikeough commented 10 years ago

Frank, can you also disable background preview and repeat the steps above. I'd like to try to isolate the problem...see if the leak is in the vm.

frankfralick commented 10 years ago

@lukechurch @ikeough @ke-yu I have some more information to provide on this. I have not had time to try to reproduce this in the stand alone version, but I plan to because this is causing pain. @ikeough turning off 3d background preview has no effect.

I fetched your code again Tuesday for reference. This is what I'm observing, with more detail and conjecture:

As previously mentioned, the the first two runs are ok around 8 seconds now, and the third is in the 10 minute range. A fourth execution, the one time I tried, took a little longer than the third.

I took the time to set markers that could show up in concurrency visualizer and this provided a lot more information (or red herrings, we'll see).

This is the first two runs. These images may be hard to see unless you zoom in: dynamorevit 2014-07-17 first two

The purple segments are I/O, green segments are execution. Area 1 and 2 are the first two Python nodes in the definition. The purple is the loading of assemblies/adding references followed by execution. They are small scripts and don't do a lot. Area 3 in the image is the third, larger script loading its assemblies. This script imports a lot more things than the other two, this makes sense. Area 4 is the non I/O execution of this bigger Python node. If you look at the call stacks for the green execution segment, you see a lot of ASM and LibG stuff. This makes sense. Area 5 is UI work done after execution finished up until I pressed the Run button again. You can see after hitting run the second time (see the RunAsync() marker), because I only made a change to something going into the third Python node, it immediately loads assemblies and references for this script, and executes like the first run, and slightly faster. This all makes sense.

This is the third run: dynamorevit 2014-07-17 third

If we zoom in this is what we have:

dynamorevit 2014-07-17 third zoom

I can't say for sure but it certainly appears that the script execution is being stopped and restarted every 2 seconds. Mixed in there is the garbage collection of worker threads on the same interval. If we zoom way way out, thread party:

dynamorevit 2014-07-16 big

Pink segments are synchronization. Concurrency visualizer says that 83% of time was spent on synchronization and 2% on execution. The cascade of new worker threads seems to line up with the beginning of the 2 second re-execution cycle (i say re-execution, but I have no idea). It appears that there is some sort of race that prevents this script from making forward progress.

The good news is that the portion of the script that does get to run before being restarted is allocating memory that doesn't get collected from earlier generation collections. Eventually there are too many long lived objects and the garbage collector stops everything and does a full collection. This somehow unconstipates the whole situation, and execution is able to complete. I'm not sure that this is how it is able to complete, but the gen2 collection happens just prior to the script completing. As the fourth run takes just as long, I don't think that the full garbage collection releases python objects (hunch), but is somehow from a timing standpoint allowing things to continue since it suspends execution on all the threads.

I believe that the python node leaks, and if so is probably just an issue that is in concert with another one in this particular case. Who knows.

Tomorrow I will briefly try to reproduce this in the stand alone version, but I have a feeling I will have a hard time doing so. If I can't easily I'm going to send the whole Revit project and definition. If you have any other ideas in the meantime let me know.

frankfralick commented 10 years ago

@ke-yu @lukechurch @ikeough I attempted to reproduce this in Dynamo standalone by writing points that were originally obtained through element selection to a text file. So the geometry being created was apples to apples. The definition runs really fast in standalone. The first time it was 5 seconds, after that each run was about 1 second. I will email you a link to the files.

ikeough commented 10 years ago

The first run is slow because we are starting up the geometry library. This is annoying and we should fix it, but not deadly. So, if this works in stand-alone, it sounds like Revit is the pain.

ikeough commented 10 years ago

@lukechurch Tracking internally as MAGN-4079.

DynamoDS / Dynamo

Dynamo becoming unusable, execution time grows exponentially #1938