dsorokin / aivika

A multi-method simulation library
79 stars 4 forks

Performance question #1

Closed drautzburg closed 9 years ago

drautzburg commented 10 years ago

I hope it is okay to ask a question here, but I could not find an Aivika forum. Feel free to direct me elsewhere.

I ran the TimeOutInt example (with spcStopTime = 100000.0), returned the number of messages along with the actual result, and measured the time. I got 254254 messages in 1.350 seconds. This is approximately 5.3 seconds per million messages.

Some time ago I had started to write a DES in Smalltalk. There I simulated an infinitely fast item "source" connected to a "throttle", which in turn is connected to an infinitely fast "sink". I counted the events, took the timeToRun, and got 500,000 events in 0.468 seconds. This is approximately 0.94 seconds per million events.

This came as a surprise, because I was expecting the compiled haskell version to be much faster than the Smalltalk version, but the Smalltalk version appears to be faster by a factor of 5.

What do you think? Am I comparing apples with oranges (i.e. messages with events)? One thing I noticed is that Aivika seems to speed up as I increase spcStopTime: with 1000000.0 I get 3.8 seconds per million messages. This may be attributable to program startup time. Can you think of anything else?

dsorokin commented 10 years ago

Hi Martin,

Yes, it is absolutely ok to ask such questions here. Also I would be glad to reply if you emailed me directly at david.sorokin@gmail.com.

The startup time can indeed be significant, as your first test shows. As far as I understand, you launched your test in an already running Smalltalk environment such as Pharo or VisualWorks. In the case of Haskell, the operating system has to do some work before actually running the simulation, which is noticeable when the total run time is only 1 or 2 seconds. So a real test must be long enough to exclude that time. By the way, I got a very similar 3.58 seconds per million messages on my MacBook.

Regarding the theory, this model uses a heavy-weight Process monad. The model is actually based on the corresponding model from the SimPy documentation; SimPy is a process-oriented simulation library. Following it, my example also illustrates how we can cancel and interrupt a process. That has its cost, without any doubt.

Actually, the Process monad was slightly faster (by 5-10%) before I added cancellation and the ability to handle IO exceptions within processes. Here I decided to prefer ease of modeling at the cost of a relatively small decrease in simulation speed.

How can we improve the speed? We can rewrite the model using the Event monad, i.e. based on the event-oriented paradigm. This monad is significantly faster than the Process monad.
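To make the contrast concrete, here is a minimal, self-contained sketch of the event-oriented idea: plain actions keyed by simulation time on a time-ordered queue, with no process suspension at all. This is a toy illustration only, not Aivika's actual Event monad, whose API is much richer.

```haskell
import qualified Data.Map.Strict as Map

type Time = Double

-- A toy event queue: each pending action may schedule further events,
-- so it receives the queue and returns an updated one.
newtype Queue = Queue (Map.Map Time [Queue -> Queue])

emptyQ :: Queue
emptyQ = Queue Map.empty

-- Schedule an action at the given simulation time.
enqueue :: Time -> (Queue -> Queue) -> Queue -> Queue
enqueue t act (Queue m) = Queue (Map.insertWith (++) t [act] m)

-- Process events in time order to completion; return how many fired.
run :: Queue -> Int
run (Queue m) = case Map.minViewWithKey m of
  Nothing -> 0
  Just ((_, acts), rest) ->
    length acts + run (foldr ($) (Queue rest) acts)
```

The point of the sketch is that firing an event is just applying a function; there is no continuation to capture and no cancellation machinery, which is where the process-oriented style pays its price.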

Moreover, I suppose that such a rewritten model would be closer to the model that you wrote in Smalltalk earlier.

By the way, I like Smalltalk. I may have forgotten many things from the language, but I could probably read the model if you provided it.

Thanks, David

P.S. I can reply mostly in the evenings and on weekends, as the rest of the time I program in C++ or just do other pleasant things that have nothing in common with programming at all :)

drautzburg commented 10 years ago

Yes, the test was run in an already running Pharo environment. I don't know a good way to extract just the relevant pieces of my code from the image, so I had better just explain it, or you'd have to wade through lots of junk.

The whole simulation is about moving items through a network of processes. I have processes, which have in-ports and out-ports. Connections are made by connecting an out-port with an in-port 1:1. Ports hold state, which is basically "holds (Item)" or "doesn't hold an item", and a port can "see" the state of its connected port (both ways). Whenever a port changes state, the process on the other end of the connection gets notified. Hence a single event, "check your ports", is sufficient.
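The port idea described above can be rendered as a small Haskell sketch. All names here (Port, connect, peekPeer) are illustrative inventions, not taken from the original Smalltalk code; the sketch only shows the 1:1 connection where each end can observe the other end's state.

```haskell
import Data.IORef

type Item = Int

-- A port either holds an item or not, and knows its connected peer.
data Port = Port
  { portState :: IORef (Maybe Item)  -- "holds (Item)" or "doesn't hold an item"
  , peer      :: IORef (Maybe Port)  -- the connected port, if any
  }

newPort :: IO Port
newPort = Port <$> newIORef Nothing <*> newIORef Nothing

-- Connect an out-port to an in-port, 1:1, in both directions.
connect :: Port -> Port -> IO ()
connect a b = do
  writeIORef (peer a) (Just b)
  writeIORef (peer b) (Just a)

-- A port can "see" the state of its connected port.
peekPeer :: Port -> IO (Maybe Item)
peekPeer p = do
  mq <- readIORef (peer p)
  case mq of
    Nothing -> return Nothing
    Just q  -> readIORef (portState q)
```

A process reacting to "check your ports" would simply call peekPeer on each of its ports and act on what it sees.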

The processes themselves can hold as much state as they please. In particular, the "throttle" process remembers the last time it let an item pass through.

I suppose this is a lot more specialized than what Aivika is doing.

BTW, for my purposes a performance of 10 seconds per million events would be sufficient, so Aivika is certainly fast enough. But my primary goal is to understand this kind of functional simulation; I have a feeling that I will need it in JavaScript. Too bad I am constantly hampered by my lack of Haskell skills. Even the Cont monad gives me headaches. Sigh. I'll read your blog post next.

dsorokin commented 10 years ago

Martin,

Your task does indeed seem to be more specialized.

I think that such a model could also be defined with the help of the Circuit arrow. It implements the idea of a finite automaton, inspired by the Yampa library, an Arrow tutorial (mentioned in the Circuit module documentation), and some articles about Arrows. The Circuit computation is based on the Event monad.
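The finite-automaton shape referred to here is the classic one from the Arrow tutorials: a step function that maps an input to an output together with the next version of the circuit. This is only the general pattern, not Aivika's exact Circuit type, which layers the Event monad underneath.

```haskell
-- A circuit as a finite automaton: each step consumes one input,
-- produces one output, and returns the circuit's next state.
newtype Circuit a b = Circuit { step :: a -> (b, Circuit a b) }

-- Feed a list of inputs through a circuit, collecting the outputs.
runCircuit :: Circuit a b -> [a] -> [b]
runCircuit _ []     = []
runCircuit c (x:xs) =
  let (y, c') = step c x
  in  y : runCircuit c' xs

-- Example: a running-sum circuit, i.e. a one-state automaton.
accum :: Int -> Circuit Int Int
accum s = Circuit $ \x -> let s' = s + x in (s', accum s')
```

Because the state lives inside the closure, composing such circuits with the Arrow combinators gives exactly the kind of port-to-port wiring described in the Smalltalk model.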

Regarding the processes, I will think about how I could optimize them.

For now I am closing the issue. Please write me a letter if you want to continue the discussion; I will be glad to receive it.

Thanks, David

dsorokin commented 10 years ago

I looked at the Process monad implementation. I have thought about it in the past, and now I suppose that the slowest part is the creation of the processes itself, namely the newProcessId function. I haven't measured it or profiled this example yet, but it is probably so.

If that is true, then we can apply the idea of pooling to this task (with some handling to restore canceled processes back to life, which is absolutely possible). We would have a pool of Process computations associated with unique ProcessId identifiers. This pool could grow on demand without bound, as such dormant processes are rather cheap.
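The pooling idea can be sketched in a few lines: recycle released identifiers via a free list, and only allocate a fresh one when the free list is empty. This is a toy sketch with invented names (Pool, acquire, release), not Aivika's ProcessId machinery, which would also have to handle restoring canceled processes.

```haskell
import Data.IORef

-- A minimal identifier pool: a free list of recycled ids
-- plus a counter for fresh allocations.
data Pool = Pool
  { freeIds :: IORef [Int]  -- identifiers returned for reuse
  , nextId  :: IORef Int    -- fresh counter when the free list is empty
  }

newPool :: IO Pool
newPool = Pool <$> newIORef [] <*> newIORef 0

-- Take an identifier, preferring a recycled one.
acquire :: Pool -> IO Int
acquire p = do
  free <- readIORef (freeIds p)
  case free of
    (i:is) -> do writeIORef (freeIds p) is
                 return i
    []     -> do n <- readIORef (nextId p)
                 writeIORef (nextId p) (n + 1)
                 return n

-- Return an identifier to the pool for later reuse.
release :: Pool -> Int -> IO ()
release p i = modifyIORef (freeIds p) (i:)
```

The payoff is that the expensive allocation path runs only as often as the peak number of live processes, not once per process ever created.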

Thank you again for pointing me to this potentially weak spot. I will think about what I can do to improve the performance here.

dsorokin commented 10 years ago

Actually, we would then need a pool of ProcessId identifiers. It needs some thinking, testing and profiling.

dsorokin commented 10 years ago

My tests and profiling show that the slowest part of the mentioned model is the generation of random numbers followed by their lifting into the Process computation. The simulation specs in this example use a wrapper around the standard newStdGen generator from module System.Random.

After I replaced the random number (ackTime) with its constant mean value (= 1 / ackRate), which can be done for the exponential distribution although the estimate is rather rough, the execution time decreased by a factor of 4.5 (spcStopTime = 3000000.0). But, as I wrote, this is a very rough approximation, and the simulation result I received is different: 0.9% instead of the desired 0.6%.
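The replacement relies on a standard property of the exponential distribution: a variate with rate r can be sampled by inversion as -ln u / r for a uniform u in (0,1], and its mean is exactly 1/r. A small sketch (function names are mine, not Aivika's) shows both the sampling step being removed and the constant that replaced it:

```haskell
import System.Random

-- Sample an exponential variate by inversion: if u is uniform on
-- (0,1], then -ln u / rate is exponentially distributed with
-- mean 1 / rate.
expVariate :: RandomGen g => Double -> g -> (Double, g)
expVariate rate g =
  let (u, g') = randomR (1e-12, 1.0) g  -- avoid log 0 at the boundary
  in  (negate (log u) / rate, g')

-- The constant mean that the rough approximation substitutes
-- for each sample.
expMean :: Double -> Double
expMean rate = 1 / rate
```

Dropping the sampling removes both the uniform draw and the log per event, which is consistent with the observed 4.5x speed-up, but it also removes all variance, which is why the simulated percentage shifted.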

A higher percentage literally means that the processes were interrupted more often and canceled less often. Interruption seems to be cheaper than cancellation of a process.

So the approximation applied is not equivalent, but it provides some information for further investigation.

dsorokin commented 9 years ago

A custom random number generator can now be specified with the help of the CustomGenerator01 data constructor. Therefore, I am closing the issue. The generator seems to be the slowest part of this test.
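I won't reproduce the exact CustomGenerator01 signature here, but the idea is to supply your own source of uniforms in [0, 1) in place of the standard generator. As an illustration of what a faster hand-rolled generator might look like, here is a minimal xorshift64 step mapped to a Double (a well-known technique; names and scaling are my own, and a real replacement would need to match the library's hook):

```haskell
import Data.Bits (shiftL, shiftR, xor)
import Data.Word (Word64)

-- One step of the xorshift64 generator (state must be non-zero).
xorshift64 :: Word64 -> Word64
xorshift64 x0 =
  let x1 = x0 `xor` (x0 `shiftL` 13)
      x2 = x1 `xor` (x1 `shiftR` 7)
  in  x2 `xor` (x2 `shiftL` 17)

-- Advance the state and map the top 53 bits to a Double in [0, 1).
uniform01 :: Word64 -> (Double, Word64)
uniform01 s =
  let s' = xorshift64 s
  in  (fromIntegral (s' `shiftR` 11) / 9007199254740992, s')  -- 2^53
```

A generator like this does a handful of bit operations per number, with no IO and no splitting, which is the kind of saving that matters when sampling dominates the simulation profile.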

By the way, I'm working on version 2.0 of my Haskell library. It introduces monad transformers; this is branch aivika-2. Many things already work, and the performance in the IO case is very close.