BayesianLogic / blog

The BLOG programming language
http://bayesianlogic.github.io/
BSD 4-Clause "Original" or "Old" License

logarithmic time performance #337

Open davidenitti opened 9 years ago

davidenitti commented 9 years ago

Dear all, I am comparing BLOG with my own language, and I noticed logarithmic-looking behavior in BLOG's running time. For example, with 1000 samples it takes 0.6 s; with 10000 samples it takes 1.5 s (instead of 0.6x10 = 6); with 100000 samples it takes 5.7 s (instead of 1.5x10 = 15). (Parsing time is excluded.) I would expect the time to be linear in the number of samples. Do you use parallelization? If so, is there a way to avoid it and use only one thread? Is there another reason for this behavior?
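
For reference, the per-sample cost implied by those timings actually falls as the run gets longer, which points at a fixed warm-up overhead rather than a genuinely logarithmic algorithm. A minimal sketch in plain Java (the class name is illustrative; the numbers are just the ones reported above):

```java
public class PerSampleCost {
    public static void main(String[] args) {
        // Reported (samples, seconds) pairs from the comment above.
        long[] samples = {1_000, 10_000, 100_000};
        double[] seconds = {0.6, 1.5, 5.7};
        for (int i = 0; i < samples.length; i++) {
            double msPerSample = seconds[i] * 1000.0 / samples[i];
            System.out.printf("%,d samples: %.3f ms/sample%n", samples[i], msPerSample);
        }
        // Prints roughly 0.600, 0.150, and 0.057 ms/sample: the marginal cost
        // shrinks with run length, consistent with a fixed start-up cost that
        // dominates short runs.
    }
}
```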

jxwuyi commented 9 years ago

Hi,

What model are you using?

davidenitti notifications@github.com wrote on Thursday, June 18, 2015:

Dear all, I am comparing BLOG with my own language, and I noticed logarithmic-looking behavior in BLOG's running time. For example, with 1000 samples it takes 0.6 s; with 10000 samples it takes 1.5 s (instead of 0.6x10 = 6); with 100000 samples it takes 5.7 s (instead of 1.5x10 = 15). (Parsing time is excluded.) I would expect the time to be linear in the number of samples. It also seems that there is no parallelization (initialization excluded). Am I right? Can you give an explanation of this?


davidenitti commented 9 years ago

(Attached: id-uncert-det.blog.) The time as a function of the number of samples looks logarithmic below 20000 samples and then becomes linear, which is really strange. This also happens for other models, e.g. burglary.blog. I tried to force BLOG to use a single CPU; it gets a bit slower, but the logarithmic behavior remains.

cberzan commented 9 years ago

You're right that the runtime should be linear in the number of samples. I suspect the non-linear runtime you're seeing with small numbers of samples is due to data structure / JIT overhead. When you run with more samples, this overhead becomes less pronounced, and you see linear runtime.
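
For anyone who wants to see this effect in isolation, here is a minimal sketch in plain Java (not BLOG code; `sample` is a made-up stand-in for a sampler's per-sample work): on a cold JVM the smallest run has the highest per-sample cost because it executes mostly interpreted code, and after an explicit warm-up pass the timings become roughly linear in the number of samples.

```java
public class WarmupDemo {
    // Made-up stand-in for the per-sample work done by a sampler.
    static double sample(java.util.Random rng) {
        double x = 0.0;
        for (int i = 0; i < 100; i++) {
            x += Math.log(rng.nextDouble() + 1e-9);
        }
        return x;
    }

    static double timeRun(int numSamples) {
        java.util.Random rng = new java.util.Random(42);
        double sink = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < numSamples; i++) {
            sink += sample(rng);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Double.MAX_VALUE) System.out.println(sink); // keep the loop live
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        // Cold JVM: the earliest (smallest) runs pay interpretation and JIT
        // compilation costs, so their per-sample time looks inflated.
        for (int n : new int[]{1_000, 10_000, 100_000}) {
            System.out.printf("cold %,d samples: %.3f s%n", n, timeRun(n));
        }
        // After a warm-up pass the per-sample cost stabilizes and total time
        // scales roughly linearly in the number of samples.
        timeRun(200_000); // warm-up
        for (int n : new int[]{1_000, 10_000, 100_000}) {
            System.out.printf("warm %,d samples: %.3f s%n", n, timeRun(n));
        }
    }
}
```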

We don't do explicit parallelism / multithreading in either BLOG or DBLOG. You may see multiple cores being used, but that's because of parallelism in the JVM, not in our code.
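
One way to check this on your own machine is to dump the live JVM threads during a run. The sketch below is plain Java (nothing BLOG-specific; the class name is illustrative) and will typically show GC worker and JIT compiler threads alongside a single application thread doing the actual work.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class ListJvmThreads {
    public static void main(String[] args) {
        // Dump all live threads. On a typical run you will see GC and JIT
        // compiler threads next to the "main" thread, even when the
        // application code itself is single-threaded.
        ThreadInfo[] threads =
            ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo t : threads) {
            System.out.println(t.getThreadName());
        }
    }
}
```

If you want that background parallelism out of the way for a single-core comparison, HotSpot flags such as `-XX:ParallelGCThreads=1` and OS-level pinning (e.g. `taskset -c 0` on Linux) are the usual knobs, though how much they change BLOG's numbers is something to verify empirically.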