bvn-architecture / RevitBatchProcessor

Fully automated batch processing of Revit files with your own Python or Dynamo task scripts!
GNU General Public License v3.0

Feature request: parallel execution #36

Open infeeeee opened 5 years ago

infeeeee commented 5 years ago

Thank you for this project, this is the best for modifying multiple Revit files!

Revit mostly uses only one CPU core for startup and for Dynamo, so (I think) it would really speed up execution if the program could start several Revit sessions concurrently: process one file in one session and the next file in a new Revit session, without waiting for the previous session to finish.

DanRumery commented 5 years ago

Hi @infeeeee

I agree a parallelized version of RBP would be really useful! I have considered it before, but it would require a significant redesign of how it currently tracks and reports progress and manages the queue of files to process.

I'll let you know if I get anywhere with this.

One workaround (with some programming required) would be to have a script that reads a Revit file list, and feeds each file (in its own file list) to separate instances of RBP running simultaneously.
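A minimal sketch of that workaround, assuming plain-text file lists with one Revit path per line. The helper names (`split_round_robin`, `write_file_lists`, `launch_all`) are hypothetical, and the command line for each RBP instance is supplied by the caller rather than assumed here.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def split_round_robin(paths: Sequence[str], n_lists: int) -> List[List[str]]:
    """Deal the paths into at most n_lists lists, like dealing cards."""
    if n_lists < 1:
        raise ValueError("n_lists must be at least 1")
    lists: List[List[str]] = [[] for _ in range(n_lists)]
    for i, path in enumerate(paths):
        lists[i % n_lists].append(path)
    # Drop lists that stayed empty (fewer paths than lists).
    return [lst for lst in lists if lst]


def write_file_lists(lists: Sequence[Sequence[str]], out_dir: Path) -> List[Path]:
    """Write each list to its own text file, one path per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, lst in enumerate(lists, start=1):
        target = out_dir / f"file_list_{i:02d}.txt"
        target.write_text("\n".join(lst) + "\n", encoding="utf-8")
        written.append(target)
    return written


def launch_all(file_lists: Sequence[Path],
               make_command: Callable[[Path], List[str]]) -> List[int]:
    """Start one process per file list, then wait for all of them.

    make_command is a caller-supplied (hypothetical) function that turns a
    file list path into the command line for one RBP instance.
    """
    procs = [subprocess.Popen(make_command(fl)) for fl in file_lists]
    return [p.wait() for p in procs]
```

Here `make_command` might return something like `["BatchRvt.exe", "--task_script", "MyTask.py", "--file_list", str(file_list)]`; treat the executable and option names as an assumption to verify against the RBP README before relying on them.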

I would have concerns about resource usage when running in parallel: CPU, but especially memory usage and potentially disk space too (if many large local files are created at the same time, etc.).

infeeeee commented 5 years ago

Of course it should be optional; I'm sure it won't speed things up in some use cases.

I ran a little test on this: a small Dynamo script on 60 files. It's just 10 nodes, renaming loaded families. I split the file list into 10 file lists, each containing 6 Revit files. The PC has a 16-thread AMD CPU, 32 GB of RAM, and an NVMe SSD.

Next time I will split into multiple file lists; I didn't think of this before. I can live with this workaround until you implement this. Thank you again for this nice software!

DanRumery commented 5 years ago

@infeeeee

Nice experiment!

One strategy for list-splitting I've used in the past is based on file size (because file size is often somewhat correlated with Revit processing time).

One algorithm is to sort the file paths from largest file size to smallest in a queue, then one by one pick a file path from the head of the queue and assign it to a file list, keeping a running total of the file size assigned to each list. At each step, you assign the file path to whichever file list has the smallest total at that time. This continues until the queue is empty (all file paths have been assigned to a file list). There's almost certainly a name for this binning algorithm. (Google 'bin packing problem'.)
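The steps above can be sketched as follows. This greedy rule is commonly known as the longest-processing-time-first (LPT) heuristic for multiway number partitioning; it gives a reasonably balanced split but is not guaranteed to be optimal. The function name `split_by_size` and the `size_of` parameter are hypothetical, and file size is used as a stand-in for processing time, as suggested above.

```python
import heapq
import os
from typing import Callable, List, Sequence


def split_by_size(paths: Sequence[str],
                  n_lists: int,
                  size_of: Callable[[str], int] = os.path.getsize) -> List[List[str]]:
    """Assign each path to the file list with the smallest total size so far."""
    if n_lists < 1:
        raise ValueError("n_lists must be at least 1")

    # Look up each size once, then order the queue from largest to smallest.
    sized = sorted(((size_of(p), p) for p in paths),
                   key=lambda pair: pair[0], reverse=True)

    # Min-heap of (total size assigned, list index): the lightest list is on top.
    totals = [(0, i) for i in range(n_lists)]
    heapq.heapify(totals)
    lists: List[List[str]] = [[] for _ in range(n_lists)]

    for size, path in sized:
        total, idx = heapq.heappop(totals)
        lists[idx].append(path)
        heapq.heappush(totals, (total + size, idx))
    return lists
```

With sizes 8, 7, 6, 5 and 4 split into two lists, this yields totals of 17 and 13, while the best possible split is 15 and 15, which illustrates that the heuristic is approximate.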

DanRumery commented 5 years ago

I've also thought about an idea to have a "server" mode for RBP, where it continuously monitors a queue of file paths to process, perhaps different task scripts for each file, so you could feed it files on demand and it would process them as they become available. That combined with a parallelized version would be pretty powerful!
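One way such a "server" mode could be organised is a shared job queue drained by a fixed number of workers. This is only a sketch of the idea, not how RBP works: `Job`, `run_workers`, `shutdown`, and the caller-supplied `process` function (which would start a Revit session for one file) are all hypothetical names.

```python
import queue
import threading
from typing import Callable, List, NamedTuple


class Job(NamedTuple):
    revit_file: str
    task_script: str  # each file may carry its own task script


def run_workers(jobs: queue.Queue,
                process: Callable[[Job], None],
                n_workers: int) -> List[threading.Thread]:
    """Start n_workers threads that process jobs as they arrive."""
    def worker() -> None:
        while True:
            job = jobs.get()  # blocks until a job (or sentinel) is available
            try:
                if job is None:  # sentinel: stop this worker
                    return
                process(job)
            except Exception as exc:  # keep the worker alive if one job fails
                print(f"job {job} failed: {exc}")
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads


def shutdown(jobs: queue.Queue, threads: List[threading.Thread]) -> None:
    """Send one sentinel per worker and wait for all workers to exit."""
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
```

New files can be fed in at any time with `jobs.put(Job(path, script))`, and `n_workers` caps how many are processed at once, which also bounds memory use.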

RyanSchw commented 5 years ago

~Is all of the .exe program based in Python?~ I think I know of a way to do this in C# with WCF (namely NamedPipeServerStream with each thread being NamedPipeClientStream), easily done using .NET framework.

Edit: While it's possible to connect as a WCF client in Python using a Python SOAP client like Zeep, I think we must use the C# framework to host the WCF "server". That being said if we use multiprocessing, I think we can use one of these

I'm not sure if this is a CPU-bound or I/O-bound problem (my guess is CPU-bound), as I can max out my CPU opening 6 instances of Revit. That being said, I could see how one could argue an I/O-bound approach, as a large chunk of time in the batch processes is opening the Revit files to be used (I could also max my CPU opening 4 files concurrently, so I'm not sure how to label that).

Depending on where the root problem is (and where multiple Revit instances can be used efficiently), that would change the approach for multiprocessing vs threading.

Edit2: I see that anonymous pipes are already being used; perhaps it's possible to use that as a starting point for WCF?