dash-project / dash-apps

Example applications and tools for DASH
12 stars 3 forks source link

dash::Shared not updated as expected #5

Closed BenjaProg closed 7 years ago

BenjaProg commented 7 years ago

dash::Shared is not updated as expected in the current version of thresh 3b4a4642ded51b01fe0491611ef78783a7b98882 . The problem seems to be here line110. In this line all units set the variable threshold to value max (as default). Then in line130 only unit 0 sets threshold to the calculated value. Every unit gets it's own local copy of the variable in line140.

And there the problem becomes visible: No matter how much i used the lines135-137(barriers and flush) the value of the really found threshold and the value of the dash::Shared variable threshold have a high chance to differ.

Sometimes threshold is then set to max and sometimes to the calculated value as desired. see cout line 131.

If I move the first threshold.set in line110 so that only unit 0 will execute it, the problem hasn't occured on my VM. But i can not really falsify or verify it.

remark: in my case line110 can be deleted entirely because no default is needed (the variable always gets a value). But do i have a guarantee that all units will get the right value in line140 if only unit 0 sets the variable?

I attached a txt where you can see which parameters I used to test and which input i gave the program. Also my ouput is in there. thresh call_input_output.txt

Actions needed: I think it's either a missing documentation of how to use dash::Shared or a bug.

fmoessbauer commented 7 years ago

That cannot work, as multiple units set this value. This is a prime example for a race condition. That's not a problem of dash::shared.

To fix this, simply let just one unit do the call:

if(0 == myid){
 threshold.set(max);
}
BenjaProg commented 7 years ago

Then I have a guarantee that it will work, do I?

Would documentate this in doxygen as well for the users.

fmoessbauer commented 7 years ago

dash::shared is simply a shared variable which can be written and read by any unit. Synchronizing concurrent accesses is in the responsibility of the user.

Hence, if you want to access the value set by unit 0, you still need a barrier:

if(0 == myid){
 threshold.set(max);
}
threshold.barrier()
auto localvalue = threshold.get()
BenjaProg commented 7 years ago

Responsibility of the user: yes. But if someone is just as blind as me and can't see the classic race condition, shouldn't he at least get a hint if he takes a glimpse at the documentation? A simple example of how to use (like yours in the post before) or a remark for possible race conditions would suffice, don't you think so?

fmoessbauer commented 7 years ago

Yes, of course. Documentation is always useful. Just add it including your example. If we can help new users, we will always try to do that.

BenjaProg commented 7 years ago

Okay thank you very much for your help!!