Brewskey / spark-protocol

Node.JS module for hosting direct encrypted CoAP socket connections
GNU Lesser General Public License v3.0
9 stars 16 forks source link

Fixes OTA Updates #130

Closed haeferer closed 7 years ago

haeferer commented 7 years ago

First, sorry for the changed files (i dont know why my prettify changes other elements, i simply run "npm run prettify"). Please ignore this PR if this is a problem! Anyway, Flasher.js contains the bugfix.

jlkalberer commented 7 years ago

So you're just waiting for a maximum of 10 seconds?

haeferer commented 7 years ago

Original code waits 3 second (should) my bad wifi tooks about 5 Sec to answer. 10 sec looks good to me

haeferer commented 7 years ago

I've look in the logs, the problems seems more complex. I takes only 500ms to push the messages on the network (550 messages!) so this means the wait time also includes the transmission time ! Maybe we should slow down sending (waiting every 10 messages for 10ms to slow down send process (also for large number of particles)

jlkalberer commented 7 years ago

Yeah, your fix doesn't actually address the issue. I think there is a bug in the CRC or tokens + COAP. I'll spend more time tonight investigating.

haeferer commented 7 years ago

I will try to implemented such an bandwidth restriction.

CRC-Issue: is there a way to decide if the bug is based on CRC or Missing Package (wrong ordered) on photon side. (Missing Package Chunk was 501 on my side).

CRC-Issue: if the CRC is wrong for this specific chunk, why does it work on reSend? Maybe its more a buffering Issue on particle Side, the WIFI CHip receives 500 Packages and needs to process the packages to the CPU (over SPI)

haeferer commented 7 years ago

@jlkalberer : After your fix the firmwareupdate to 0.6.2 worked on your side? On my side the error occured, cause the Photon needs 5 Seconds to send the Missing Chunk Message after all Chunkmessages left the nodeJs Process? If this is still a CRC Error, why does the firmwareupdate works after your update?

jlkalberer commented 7 years ago

No, there's still bugs after my updates. I'll let you know after I spend more time digging into this.

haeferer commented 7 years ago

Hi, i've looked into the existing Flasher.js code. The CRC checksum is only calculated over the Binary content from the ByteStream for a specific chuck. The CRC does not inlude (which would be saver) the ChunkIndex, Token or previous CRC. So during a ReSend the CRC should (?) be exakt the same. One explanation could be missing or reordered Chunks (maybe lost on the interface between the chips on Photon). I will check other photons if the bug is still present

jlkalberer commented 7 years ago

@haeferer - I reverted to an older version and the flashing is working with Fast OTA turned on/off. I'm going to compare the buffers sent out between the two and I should get an idea of what's going on.

haeferer commented 7 years ago

I've tested another particle, Update was ok. The next one was strange...

The Update for the systempart (the big one) sends all packages and start waiting for Missed Chunks, but nothing happens...

14:13:51.217Z  INFO Flasher.js: Read Next Chunk (messageToken=160)
14:13:51.218Z  INFO Flasher.js: Read Next Chunk (messageToken=161)
14:13:51.219Z  INFO Flasher.js: Read Next Chunk (messageToken=162)
14:13:51.219Z  INFO Flasher.js: Read Next Chunk (messageToken=163)
14:13:51.220Z  INFO Flasher.js: Read Next Chunk (messageToken=164)
14:13:51.221Z  INFO Flasher.js: Wait for AllChunksDone
14:13:51.221Z  INFO Flasher.js: Wait for MissedChunks
14:13:54.439Z  INFO DeviceServer.js: Connected Devices (devices=6, sockets=6)
14:14:04.440Z  INFO DeviceServer.js: Connected Devices (devices=6, sockets=6)
14:14:06.266Z  INFO Flasher.js: finished waiting
14:14:06.266Z  INFO Flasher.js: Done Wait for MissedChunks
14:14:06.267Z  INFO Device.js: releasing flash ownership  (deviceID=47003b001051353338363333)
14:14:06.267Z  INFO Flasher.js: Cleanup Filestream
14:14:06.268Z  INFO Device.js: flash device finished! - sending api event (deviceID=47003b001051353338363333)

But the Device Reboots and starts over (without a package)

After some time (without interaction) - the system has tried 3 or 4 times this happens

14:14:31.789Z  INFO Flasher.js: Read Next Chunk (messageToken=156)
14:14:31.789Z  INFO Flasher.js: Read Next Chunk (messageToken=157)
14:14:31.790Z  INFO Flasher.js: Read Next Chunk (messageToken=158)
14:14:31.791Z  INFO Flasher.js: Read Next Chunk (messageToken=159)
14:14:31.791Z  INFO Flasher.js: Read Next Chunk (messageToken=160)
14:14:31.792Z  INFO Flasher.js: Read Next Chunk (messageToken=161)
14:14:31.792Z  INFO Flasher.js: Read Next Chunk (messageToken=162)
14:14:31.793Z  INFO Flasher.js: Read Next Chunk (messageToken=163)
14:14:31.794Z  INFO Flasher.js: Read Next Chunk (messageToken=164)
14:14:31.794Z  INFO Flasher.js: Wait for AllChunksDone
14:14:31.795Z  INFO Flasher.js: Wait for MissedChunks
14:14:32.558Z  WARN Flasher.js: flasher - chunk missed - recovering, Extended  (chunkIndex=418, messageId=53124, missedChunks=0)
    logInfo: {
      "cache_key": "_20",
      "deviceID": "47003b001051353338363333"
    }
14:14:32.560Z  WARN Flasher.js: Missed Chunks found... (missedChunks=1)
    mset: [
      253
    ]
14:14:32.796Z  INFO Flasher.js: Done Wait for MissedChunks
14:14:32.796Z  INFO Flasher.js: Handle Missed Chunks (missedChunks=1)
14:14:32.797Z  INFO Flasher.js: Resend Chunks (chunkSize=512)
14:14:32.797Z  INFO Flasher.js: Chunk sended (messageToken=166)
14:14:34.441Z  INFO DeviceServer.js: Connected Devices (devices=6, sockets=6)
14:14:44.441Z  INFO DeviceServer.js: Connected Devices (devices=6, sockets=6)
14:14:45.780Z  INFO DeviceServer.js: Connection (connectionID=21, deviceID=47003b001051353338363333, remoteIPAddress=::ffff:37.201.7.130)

So the SERVER gets a Missing Chunk UDP Package (after only a few seconds) answered the missing chunk and everything is fine.... (so the problem is a missing "Missing Chunk" package)