NordicPlayground / nRF51-ble-bcast-mesh

Other
324 stars 121 forks source link

bug in version overflow system (lollipop) #28

Closed AlexDM0 closed 9 years ago

AlexDM0 commented 9 years ago

Hi Trond,

When the version becomes higher than the lollipop limit, the mesh keeps repeating the update event.

the limit is set to 200, now lets assume the version is 205 and no NEW message is inputted.

    uint16_t separation = (version >= ch_md->version_number)?
        (version - ch_md->version_number) : 
        (-(ch_md->version_number - MESH_VALUE_LOLLIPOP_LIMIT) + (version - MESH_VALUE_LOLLIPOP_LIMIT) - MESH_VALUE_LOLLIPOP_LIMIT)

This gives the version == ch_md-> version_number and thus seperation = 0;

 if ((ch_md->version_number < MESH_VALUE_LOLLIPOP_LIMIT && version > ch_md->version_number) || 
        (ch_md->version_number >= MESH_VALUE_LOLLIPOP_LIMIT && separation < (UINT16_MAX - MESH_VALUE_LOLLIPOP_LIMIT)/2) || 
        uninitialized)

here the second statement

(ch_md->version_number >= MESH_VALUE_LOLLIPOP_LIMIT && separation < (UINT16_MAX - MESH_VALUE_LOLLIPOP_LIMIT)/2)

is true, thus repeating.

I have searched the code and the MESH_VALUE_LOLLIPOP_LIMIT is not used anywhere else. When the version overflows, this should be higher than the MESH_VALUE_LOLLIPOP_LIMIT right?

To fix, I believe an overflow check should be here https://github.com/NordicSemiconductor/nRF51-ble-bcast-mesh/blob/master/nRF51/rbc_mesh/src/mesh_srv.c#L739 and to add the MESH_VALUE_LOLLIPOP_LIMIT

To solve this issue I propose:

    bool new_version = false;
    uint16_t halfway = (UINT16_MAX - MESH_VALUE_LOLLIPOP_LIMIT)/2;
    // if own ch_md->version_number (own version or:OV) is larger than the version (packet version or: PV), we accept the packet as new if the PV > lollipop and OV is BEFORE the halfway point.
    if (ch_md->version_number > version &&
            version > MESH_VALUE_LOLLIPOP_LIMIT &&
            ch_md->version_number < halfway)
    {
        new_version = true;
    }
    // if own ch_md->version_number (own version or:OV) is smaller than the version (packet version or: PV), we accept the packet as new if the OV < lollipop or if the OV > lollipop and PV is BEFORE the halfway point.
    // if the PV is over the halfway point, the own version larger than lolli is the newest
    else if (ch_md->version_number < version &&
            ((ch_md->version_number < MESH_VALUE_LOLLIPOP_LIMIT) ||
            (ch_md->version_number >= MESH_VALUE_LOLLIPOP_LIMIT && version < halfway)))
    {
        new_version = true;
    }

    if (new_version ||  uninitialized) 
{
//..... continue handling the new packet

You can condense it, but this is probably clearer to read. EDIT: and seperately, to add the lollipop value on overflow.

I'd like to hear your thoughts on this. Did I miss something obvious?

Regards,

Alex

EDIT2: I see I mixed up the meaning of version and ch_md->version_number. I updated my code.

trond-snekvik commented 9 years ago

You're right, but I'm not sure if I follow your solution. The problem only occurs when the local version equals the version of the updated packet, as far as I can see. Wouldn't just flipping the if/else if pair (with their blocks) at line 611 and 639 do the trick? That way, we would check for equality (and decide whether the data is conflicting or the same), and only if the versions are different we'll go and check the big if for the lollipop.

For your solution, I think there's a missed case when the LOLLIPOP_LIMIT is 200, halfway is 400 max is 600 (to keep it simple), the new version is 405, and the old is 390. I don't think the value would get past halfway this way. Please correct me if I'm wrong.

Sorry about the delay by the way. The nRF51 side of the project has been in a slight hiatus as I have been finishing up my thesis, but it will be getting more attention in the following months:)

AlexDM0 commented 9 years ago

Hi Trond,

Yes, you're right! Reviewing it now I think you're right about using seperation, although I don't see why you'd need anything else than:

OV = own version (ch_md->version_number)
PV = packet version (version)

if (OV >= PV) seperation = OV - PV
if (PV < OV) seperation = PV - OV

if ov < pv && (ov < lolli || (ov > lolli && sep > hw)) --> new
if ov > pv && (pv > lolli && sep > hw) --> new
if uninitialized --> new

The seperation will only matter if both are over the lollipop start so the distance is always over the circle of the lollipop. I think: https://github.com/NordicSemiconductor/nRF51-ble-bcast-mesh/blob/master/nRF51/rbc_mesh/src/mesh_srv.c#L607 is wrong at the moment.

example: LOLLIPOP_LIMIT = 200, max = 600 OV = 595, PV = 205 seperation = (-(595-200) + (205-200) - 200) = -395 + 5 - 200 = -590 but is uint so is 0 (which seems nasty) or does it flip the int? which would fail the check below

the if statement then sees OV > lolli && if 0 < halfway and ok it passes and accepts the new value. Hurray!

other example, OV = 301, PV = 300 seperation = (-(301 - 200) + (300 - 200) - 200) = -101 + 100 - 200 = -201 but is 0 due to uint the if statement then sees OV > lolli && if 0 < halfway and ok it passes and accepts the new value. Huh? that shoudnt happen.

Either I'm reading it wrong, but your solution would help for the case if there are no new messages. It would not help the case where the node sends a new message but hears the old one?

Am I overlooking something here?

olskyy commented 9 years ago

Hi there,

I have also noticed an issue with the new version, here are my fixes:

a) separation:

was: uint16_t separation = (version >= ch_md->version_number)? ... mod: uint16_t separation = (version > ch_md->version_number)? ...

b) comparison and decision:

was: if ((ch_md->version_number < MESH_VALUE_LOLLIPOP_LIMIT && version >= ch_md->version_number) ||

mod: if ((ch_md->version_number < MESH_VALUE_LOLLIPOP_LIMIT && version > ch_md->version_number) ||

and it seems to be ok. Cheers Oleh

trond-snekvik commented 9 years ago

Hi guys, I've been looking some more into this issue, and have found and verified a solution. I've gone with the reversal of the if-statements, and also added another guard for an unmissed case where the new value is below the limit and the old is above UINT16_MAX - LIMIT/2. I'll see if I can push the fix today.

@AlexDM0: I think your assumption about -201 in uint being 0 is wrong. Overflow in uint is defined behavior, and -201 is UINT16_MAX - 201 = 65334, so the int flips.

AlexDM0 commented 9 years ago

Hi Trond,

Should you also implement a check if the int has flipped when the limit is reached? So if 65536 is reached it now goes to 0, not to LOLLIPOP_LIMIT.

I think this should be done here: https://github.com/NordicSemiconductor/nRF51-ble-bcast-mesh/blob/master/nRF51/rbc_mesh/src/mesh_srv.c#L739 ?

Glad to hear you're looking into it!

Regards,

Alex

trond-snekvik commented 9 years ago

you're right, added :smile: