THUKEG / saedb

the SAE platform
http://thukeg.github.com/saedb/

Synchronous Engine is not correct. #64

Closed thinxer closed 5 years ago

thinxer commented 11 years ago

Results from the synchronous engine should be the same whether it runs single-threaded or multithreaded, since it reads node data only from the previous iteration. However, this is not the case for the PageRank test, which indicates that the synchronous engine is not correct. Please investigate this problem.
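To make the expected invariant concrete, here is a minimal sketch of a synchronous PageRank that reads only from the previous iteration's snapshot. It is not saedb's engine: the toy graph, the damping factor, and the names `pagerank_sync`, `in_neighbors`, and `update` are all assumptions made for illustration. Under this scheme the number of worker threads should not change the result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph (not from saedb): vertex -> list of out-neighbors.
GRAPH = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
DAMPING = 0.85  # assumed damping factor


def in_neighbors(graph):
    """Invert the adjacency lists so each vertex knows who points at it."""
    incoming = {v: [] for v in graph}
    for src in sorted(graph):
        for dst in graph[src]:
            incoming[dst].append(src)
    return incoming


def pagerank_sync(graph, iterations, workers):
    """Synchronous PageRank: reads hit `prev`, writes go to a fresh `curr`."""
    incoming = in_neighbors(graph)
    n = len(graph)
    prev = {v: 1.0 / n for v in graph}

    def update(v):
        # Gather only from the previous iteration's snapshot.
        total = sum(prev[u] / len(graph[u]) for u in incoming[v])
        return v, (1.0 - DAMPING) / n + DAMPING * total

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # dict(...) consumes every result, so it acts as the barrier
            # between iterations.
            curr = dict(pool.map(update, sorted(graph)))
            # Publish the new snapshot only after all vertices have finished.
            prev = curr
    return prev
```

Each vertex sums its in-neighbors in a fixed order and never sees a value written during the current iteration, so `pagerank_sync(GRAPH, 20, 1)` and `pagerank_sync(GRAPH, 20, 4)` should compare equal.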

wweic commented 11 years ago

@thinxer I can't reproduce the failure on Mac. It's weird. I ran it about 50 times manually and every run passed. Have you merged the latest upstream?

thinxer commented 11 years ago

Have you merged the pagerank test with your multithreaded code before testing?

wweic commented 11 years ago

@thinxer Yes. Weird.

wweic commented 11 years ago

@thinxer Did you try testing on the multithread branch too?

thinxer commented 11 years ago

I just merged your thread_pool branch with upstream/master, and the pagerank_test wouldn't pass.

I reproduced this error today.

wweic commented 11 years ago

Weird.

Could you try my branch directly? Just check out my thread-pool branch and run the test.

Wei Chen ipondering.me

thinxer commented 11 years ago

Well, you don't have pagerank_test on your branch...

wweic commented 11 years ago

Oops.

My master branch is up to date, single-threaded, and has pr_test.

I ran: repeat 100 ./pagerank_test. No errors.

thinxer commented 11 years ago

It's correct because it's single-threaded.

Only with multi-threading can the problem be found, since the execution order of vertex programs is not stable.
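One way an unstable execution order can change results is when a vertex program reads values already written during the same iteration. The sketch below uses a hypothetical neighbor-averaging program on a ring, not PageRank and not saedb code, to contrast in-place updates with snapshot reads. The function names are made up for illustration, and this is not a claim about where the actual bug lies.

```python
def step_in_place(values, order):
    """One sweep where each vertex averages its ring neighbors, writing in place."""
    vals = list(values)
    n = len(vals)
    for v in order:
        # Reads may see neighbors already updated earlier in this same sweep.
        vals[v] = (vals[(v - 1) % n] + vals[(v + 1) % n]) / 2.0
    return vals


def step_snapshot(values, order):
    """Same sweep, but every read comes from the untouched previous snapshot."""
    n = len(values)
    out = list(values)
    for v in order:
        out[v] = (values[(v - 1) % n] + values[(v + 1) % n]) / 2.0
    return out


start = [1.0, 2.0, 4.0, 8.0]
forward, backward = [0, 1, 2, 3], [3, 2, 1, 0]

print(step_in_place(start, forward))   # [5.0, 4.5, 6.25, 5.625]
print(step_in_place(start, backward))  # [2.0625, 1.625, 2.25, 2.5]
print(step_snapshot(start, forward))   # [5.0, 2.5, 5.0, 2.5]
print(step_snapshot(start, backward))  # [5.0, 2.5, 5.0, 2.5]
```

The in-place sweep depends on the order, while the snapshot sweep does not. A correct synchronous engine should behave like the second.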

wweic commented 11 years ago

Yes, I'll fix that.

I suspect the cause is the implementation of thread_pool's join. All applies are executed after the gathers and before the scatters, so even a change to one vertex's data should be isolated from the other vertices.
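To illustrate what the join has to guarantee for the phases to stay isolated, here is a minimal sketch of a pool whose join waits for tasks that are still running, not just for an empty queue. `TinyPool` and `run_iteration` are hypothetical names; this is not saedb's thread_pool, only an assumption about the barrier behavior the engine relies on.

```python
import queue
import threading


class TinyPool:
    """Hypothetical minimal pool for illustration; not saedb's thread_pool."""

    def __init__(self, workers):
        self._tasks = queue.Queue()
        self._pending = 0
        self._cond = threading.Condition()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, fn, *args):
        with self._cond:
            self._pending += 1  # counted from submission, not from dequeue
        self._tasks.put((fn, args))

    def _run(self):
        while True:
            fn, args = self._tasks.get()
            try:
                fn(*args)
            finally:
                with self._cond:
                    # Decrement only after the task body has finished.
                    self._pending -= 1
                    if self._pending == 0:
                        self._cond.notify_all()

    def join(self):
        """Block until every submitted task has completed, not merely been dequeued."""
        with self._cond:
            while self._pending:
                self._cond.wait()


def run_iteration(pool, vertices, gather, apply, scatter):
    """Run gather, apply, and scatter as three phases separated by joins."""
    for phase in (gather, apply, scatter):
        for v in vertices:
            pool.submit(phase, v)
        # Barrier: no task of the next phase is submitted before this returns.
        pool.join()
```

If join returned as soon as the queue was empty, a worker could still be inside a gather while the first apply started, which would break the isolation between phases.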

wweic commented 11 years ago

This branch tries to fix it: https://github.com/pondering/saedb/tree/fix-syn-engine .

So far I have found that exeGather may have a problem, but I suspect the real cause is the OS's handling of memory-mapped files.

thinxer commented 11 years ago

It's working on Linux.

wweic commented 11 years ago

What do you mean by "working"?

I commented out all the parallel code except executeInits so that I can inspect the suspicious parts one by one. I can't find the bug in the executeInits function, yet when I run ./pagerank_test 5000 times there is still 1 failure.

Do you have any idea about this issue?
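For a failure this rare, a small harness that repeats the test and reports which runs failed can help. The sketch below assumes the test binary signals failure through a non-zero exit status, which the thread does not confirm. `count_failures` is a hypothetical helper name.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # Assumption: a failed test is reported through the exit status.
        if result.returncode != 0:
            failures += 1
            print(f"run {i} failed with exit status {result.returncode}",
                  file=sys.stderr)
    return failures
```

For example, `count_failures(["./pagerank_test"], 5000)` would repeat the test 5000 times and return the number of failing runs.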

thinxer commented 11 years ago

You mean that even the single-threaded engine has a hard-to-reproduce bug? (1 in 5000)

wweic commented 11 years ago

I'm not sure.

A single-threaded program should not have the problem, and I haven't seen a failure so far.

But when I make only executeInits multithreaded, failures appear.
