eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/
Other
729 stars 367 forks source link

observation getting removed after reaching retransmission limit #451

Closed charybr closed 7 years ago

charybr commented 7 years ago

Scenario:

  1. Retransmission count set to 8. (MAX_RETRANSMIT=8).
  2. Get observe is active (Token=9953ef769546f8bb) and data is being received in blockwise transfers.
  3. At 11:49:38, for one blockwise transfer (Token=dddaf89f20959df1), getting block-1 started failing.
  4. Retries started happening in exponential backoff manner between 11:49:38 to 12:08:43 for blockwise transfer with token dddaf89f20959df1.
  5. Between 11:49 to 12:08, there were 6 new data notification from same server (Get-obs Token=9953ef769546f8bb). Blockwise transfer happened successfully.
  6. At 12:08:43, for blockwise transfer with token dddaf89f20959df1 reached retransmission limit and observation with token 9953ef769546f8bb was removed.
  7. MaxAge was 1800, get-obs with same token was re-registered and data started coming after a gap of 30 mins.

Questions:

  1. Because of one blockwise failure, observe got deleted (though other blockwise transfers were successful). Since get-obs got removed, there was no data for 30mins till re-register. Is this fine?
  2. Should we disable transparent blockwise i.e set MAX_RESOURCE_BODY_SIZE to 0 ? So that only one blockwise is active per-getobs.
boaks commented 7 years ago

Are both sides, client and server, californium? Which branch/commit? Is this bug repeatable? Do you have log? wireshark?

I'm still wondering, why issues are create without that information!

What makes me really wondering in you case is: if the communication works (5. 6 new data notification ...), why is the get in 3. still not responded?

charybr commented 7 years ago

Thanks! for responding.

Client side is californium. Server side it is coap stack on tinyos. We are using Californium 2.0.0-snapshot.

Californium log file attached. californium.0.0.log.zip

Max retransmit exceeded for blockwise transfer with token dddaf89f20959df1 between 11:49 to 12:08. During this time there were 7 new data notification for which blockwise transfer was successful.

  1. MID= 4175, Token=aaa15a5093a430c1
  2. MID= 4179, Token=39242e3bab4aeb69
  3. MID= 4185, Token=46b1d8acaf352f27
  4. MID= 4191, Token=f847e911711af4a0
  5. MID= 4197, Token=dc78447df42b9702
  6. MID= 4203, Token=345163e46fc3a5c7
  7. MID= 4209, Token=1e283ad9848f258c

Get in 3, were responded from server side (able to see them on there logs). Looks like those udp packets didnt reach client.

boaks commented 7 years ago

If possible, could you retest your issue with the bugfix? (You could fetch the bugfix by adding https://github.com/bsinno/californium to your remotes and then checkout the "fix_451" branch.)

charybr commented 7 years ago

Thanks very much! will test and get back.

charybr commented 7 years ago

Tested bugfix of 451, in both cases of retransmit limit exceeded and block2 timeout observe is not removed.

Please let us know when this fix is merged to 2.0.0 branch. Thanks!

Logs from testing: We tried with MAX_RETRANSMIT set to 5. Rcvd 1 data Oct 26, 2017 12:09:05 PM org.eclipse.californium.core.network.stack.BlockwiseLayer handleBlock2Response FINE: assembled response: CON-2.05 MID= 9863, Token=01a87704900facb1, OptionSet={"ETag":0x02, "Observe":7, "Content-Format":"application/cbor", "Max-Age":1800},

Blockwise transfer started at 12:10:57. Requesting block-1 succeeded: Oct 26, 2017 12:10:57 PM org.eclipse.californium.core.network.stack.BlockwiseLayer handleBlock2Response FINER: requesting next Block2 [num=1]: CON-GET MID= -1, Token=null, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=1)"}, no payload

Requesting block-4 started: Oct 26, 2017 12:10:59 PM org.eclipse.californium.core.network.stack.BlockwiseLayer handleBlock2Response FINER: requesting next Block2 [num=4]: CON-GET MID= -1, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload

After ACK_TIMEOUT of 2s (time since blockwise started 4s) Oct 26, 2017 12:11:01 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINER: Timeout: retransmit message, failed: 1, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:11:01 PM org.eclipse.californium.core.network.stack.ReliabilityLayer sendRequest FINER: Send request, failed transmissions: 1

After 5s, next attempt (time since blockwise started 9s) Oct 26, 2017 12:11:06 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINER: Timeout: retransmit message, failed: 2, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:11:06 PM org.eclipse.californium.core.network.stack.ReliabilityLayer sendRequest FINER: Send request, failed transmissions: 2

After 10s, (time since blockwise started 19s) Oct 26, 2017 12:11:16 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINER: Timeout: retransmit message, failed: 3, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:11:16 PM org.eclipse.californium.core.network.stack.ReliabilityLayer sendRequest FINER: Send request, failed transmissions: 3

After 19s, (time since blockwise started 38s) Oct 26, 2017 12:11:35 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINER: Timeout: retransmit message, failed: 4, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:11:35 PM org.eclipse.californium.core.network.stack.ReliabilityLayer sendRequest FINER: Send request, failed transmissions: 4

After 39s, (time since blockwise started 77s) Oct 26, 2017 12:12:14 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINER: Timeout: retransmit message, failed: 5, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:12:14 PM org.eclipse.californium.core.network.stack.ReliabilityLayer sendRequest FINER: Send request, failed transmissions: 5

After 78s, (time since blockwise started 155s) Oct 26, 2017 12:13:32 PM org.eclipse.californium.core.network.stack.ReliabilityLayer$RetransmissionTask run FINE: Timeout: retransmission limit reached, exchange failed, message: CON-GET MID= 1905, Token=9fff51e28ea20f46, OptionSet={"ETag":0x03, "Uri-Path":"weighsens", "Block2":"(szx=1/32, m=false, num=4)"}, no payload Oct 26, 2017 12:13:32 PM org.eclipse.californium.core.network.InMemoryMessageExchangeStore remove FINE: removing exchange for MID KeyMID[1905, fec00000000000000000000000000002:5683] Oct 26, 2017 12:13:32 PM org.eclipse.californium.core.network.InMemoryMessageExchangeStore remove FINE: removing exchange for token KeyToken[9fff51e28ea20f46, fec00000000000000000000000000002:5683] Oct 26, 2017 12:13:32 PM org.eclipse.californium.core.network.UdpMatcher$ExchangeObserverImpl completed FINER: Exchange [KeyToken[01a87704900facb1, fec00000000000000000000000000002:5683], origin: LOCAL] completed Oct 26, 2017 12:13:32 PM org.eclipse.californium.core.network.stack.BlockwiseLayer clearBlock2Status FINE: removing block2 tracker [KeyUri[coap://[fec0:0:0:0:0:0:0:2]/weighsens[03], fec00000000000000000000000000002:5683]], block2 transfers still in progress: 0

This block2 transfer timed out corresponds to blockwise started at 12:09:05. Oct 26, 2017 12:14:05 PM org.eclipse.californium.core.network.stack.BlockwiseLayer$5 run FINE: block2 transfer timed out: KeyUri[coap://[fec0:0:0:0:0:0:0:2]/weighsens[02], fec00000000000000000000000000002:5683] Oct 26, 2017 12:14:05 PM org.eclipse.californium.core.network.UdpMatcher$ExchangeObserverImpl completed FINER: Exchange [KeyToken[01a87704900facb1, fec00000000000000000000000000002:5683], origin: LOCAL] completed Oct 26, 2017 12:14:10 PM org.eclipse.californium.core.network.deduplication.SweepDeduplicator$SweepAlgorithm run FINEST: Start Mark-And-Sweep with 1 entries

This block2 transfer timed out corresponds to blockwise started at 12:10:57. Oct 26, 2017 12:15:59 PM org.eclipse.californium.core.network.stack.BlockwiseLayer$5 run FINE: block2 transfer timed out: KeyUri[coap://[fec0:0:0:0:0:0:0:2]/weighsens[03], fec00000000000000000000000000002:5683] Oct 26, 2017 12:15:59 PM org.eclipse.californium.core.network.UdpMatcher$ExchangeObserverImpl completed FINER: Exchange [KeyToken[01a87704900facb1, fec00000000000000000000000000002:5683], origin: LOCAL] completed

boaks commented 7 years ago

The PR is merged.

charybr commented 7 years ago

Thanks very much!