RIOT-OS / Release-Specs

Specification for RIOT releases and corresponding test configurations

Release 2019.04 - RC1 #113

Closed danpetry closed 5 years ago

danpetry commented 5 years ago

This issue is for discussion related to testing and bugfixing of Release Candidate 1 of the 2019.04 release.

To track testing, please use the tracking spreadsheet, which lists the tests and has a column for your initials if you're testing something and a column for the pass/fail status of your test. This should hopefully be a better way of capturing information than a checklist and inline conversation; let me know if it's not.

https://docs.google.com/spreadsheets/d/18k5rijHnoFEC3AZA5Ur3_nRjwban-5na6R2bkDu2Jac/edit?usp=sharing

There is a "test failures and bugfixes" tab in the spreadsheet, which captures the progress of bugfixes coming out of the tests. Using it is optional for you, as I'll keep it updated, but hopefully it will also be useful.

There are also folders for you to drop test artefacts in, if there are any:

https://drive.google.com/drive/folders/14EwQPQM7zN0Go5SJHBtIOBF6qN8L7vig?usp=sharing

miri64 commented 5 years ago

Why is 1.1 marked as failed? As far as I can see the linked Murdock output only fails on the tests for tests/pkg_c25519 but not on the compiling of any of those applications.

danpetry commented 5 years ago

Ok thank you for the clarification!

jia200x commented 5 years ago

Test 11.3 (LoRaWAN abp) fails. It freezes:

main(): This is RIOT! (Version: 2019.07-devel-HEAD)
All up, running the shell now
> loramac set nwkskey B74B805FAFC3E10B81E7A9015E67B43C
loramac set nwkskey B74B805FAFC3E10B81E7A9015E67B43C
> loramac set appskey 59A6C0AA0870E53CCAD8E4EB7647D10C
loramac set appskey 59A6C0AA0870E53CCAD8E4EB7647D10C
> loramac set rx2_dr 3
loramac set rx2_dr 3
> loramac join abp
loramac join abp
Join procedure succeeded!
> loramac tx hola
loramac tx hola
help

/* Crickets... */

EDIT: I will check what's going on there

miri64 commented 5 years ago

Meta-comment: While I think that the spreadsheet could help with the syncing problem we faced in the past, it is a bit impractical in the regard that it doesn't provide links to the tasks, as the checklist did.

miri64 commented 5 years ago

I'm investigating why 4.3 is failing

miri64 commented 5 years ago

@danpetry regarding 4.7-8: did you compile the arduino-zero application with USEMODULE=xbee?

danpetry commented 5 years ago

@miri64 I used the automated scripts, haven't checked further yet

miri64 commented 5 years ago

> I'm investigating why 4.3 is failing

https://github.com/RIOT-OS/RIOT/pull/9523 introduced the regression. I'm trying to find out why later.

danpetry commented 5 years ago

> Meta-comment: While I think that the spreadsheet could help with the syncing problem we faced in the past, it is a bit impractical in the regard that it doesn't provide links to the tasks, as the checklist did.

Putting hyperlinks in now. Is this adequate? Can revert back to the checklist if spreadsheet is not helping overall

miri64 commented 5 years ago

> Putting hyperlinks in now. Is this adequate? Can revert back to the checklist if spreadsheet is not helping overall

At least for the problem I pointed out it does. I'll open an issue to discuss the structure / tool for organizing the testing further. This way we get the discussion out of the way of the actual testing discussion.

miri64 commented 5 years ago

Regarding 4.3

> RIOT-OS/RIOT#9523 introduced the regression. I'm trying to find out why later.

The problem is that with https://github.com/RIOT-OS/RIOT/pull/9523 ping is now asynchronous. So instead of sending packets in an interval [RTT of previous packet] + [given interval] we now burst out the packets every 100ms pretty much precisely. However, the round-trip time of a 1KB packet in 6LoWPAN is ~140ms, so we are already filling up the packet buffer with a second packet while we still wait for the reply from the last (maybe even a third, I did not analyze it that deeply). Because of that the packet buffer fills up, resulting in not enough space for the reply. If I change the test parameters to 500 packets every 200ms, it works. So I'd say given the use case the test specification is broken, we just did not realize since the implementation used to be wrong as well (obviously 100ms + RTT are not 100ms ;-)).
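
The timing argument above can be checked with a toy model. This is only a sketch, not RIOT code: it assumes a constant RTT of 140 ms and no packet loss, and the function names are hypothetical helpers made up for illustration.

```python
def send_times_sync(count, interval_ms, rtt_ms):
    """Old behaviour as described above: wait for the reply, then wait the interval."""
    return [i * (rtt_ms + interval_ms) for i in range(count)]


def send_times_async(count, interval_ms):
    """New behaviour: send on a fixed timer, regardless of outstanding replies."""
    return [i * interval_ms for i in range(count)]


def max_outstanding(send_times, rtt_ms):
    """Largest number of requests sent but not yet answered at any send instant."""
    worst = 0
    for t in send_times:
        in_flight = sum(1 for s in send_times if s <= t < s + rtt_ms)
        worst = max(worst, in_flight)
    return worst


RTT_MS = 140  # assumed constant round-trip time of a 1KB packet (from the comment above)

print(max_outstanding(send_times_sync(10, 100, RTT_MS), RTT_MS))  # 1
print(max_outstanding(send_times_async(10, 100), RTT_MS))         # 2
print(max_outstanding(send_times_async(10, 200), RTT_MS))         # 1
```

Under these assumptions a 100 ms interval always overlaps a new request with the previous one, while a 200 ms interval keeps at most one request in flight, which matches the observation that the 200 ms variant works.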

jia200x commented 5 years ago

OK, the LoRaWAN ABP test passes. However, the devaddr was wrong and the MAC layer tried to retry on DR0. This means the MAC layer was blocked for ~(50 * NUM_OF_RETRANS) seconds, which is quite a lot. I think we should gradually make it asynchronous (something similar to https://github.com/RIOT-OS/RIOT/pull/11022).

miri64 commented 5 years ago

> At least for the problem I pointed out it does. I'll open an issue to discuss the structure / tool for organizing the testing further. This way we get the discussion out of the way of the actual testing discussion.

See #120

miri64 commented 5 years ago

> The problem is that with RIOT-OS/RIOT#9523 ping is now asynchronous. So instead of sending packets in an interval [RTT of previous packet] + [given interval] we now burst out the packets every 100ms pretty much precisely. However, the round-trip time of a 1KB packet in 6LoWPAN is ~140ms, so we are already filling up the packet buffer with a second packet while we still wait for the reply from the last (maybe even a third, I did not analyze it that deeply). Because of that the packet buffer fills up, resulting in not enough space for the reply. If I change the test parameters to 500 packets every 200ms, it works. So I'd say given the use case the test specification is broken, we just did not realize since the implementation used to be wrong as well (obviously 100ms + RTT are not 100ms ;-)).

Mhhh... the packet buffer doesn't seem to be the problem after all :-/ I still don't get any replies if I make it bigger, and it isn't even filled up to 50% (note the position of the last used byte):

1554900331.956012;m3-101;> pktbuf
1554900331.956204;m3-101;packet buffer: first byte: 0x20001b08, last byte: 0x20004acc (size: 12228)
1554900331.957131;m3-100;packet buffer: first byte: 0x20001b08, last byte: 0x20004acc (size: 12228)
1554900331.957323;m3-100;  position of last byte used: 5832
1554900331.957611;m3-101;  position of last byte used: 5728
1554900331.958410;m3-100;~ unused: 0x20001b08 (next: 0, size: 12228) ~
1554900331.958703;m3-101;~ unused: 0x20001b08 (next: 0, size: 12228) ~
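
The "not even 50%" reading can be reproduced from the log lines above. The snippet below is a minimal sketch that assumes the `timestamp;node;message` layout seen in this output; `pktbuf_high_water` is a hypothetical helper, not part of RIOT or its tooling.

```python
import re

# Relevant lines copied from the output above.
LOG = """\
1554900331.956204;m3-101;packet buffer: first byte: 0x20001b08, last byte: 0x20004acc (size: 12228)
1554900331.957131;m3-100;packet buffer: first byte: 0x20001b08, last byte: 0x20004acc (size: 12228)
1554900331.957323;m3-100;  position of last byte used: 5832
1554900331.957611;m3-101;  position of last byte used: 5728
"""


def pktbuf_high_water(log):
    """Return, per node, the fraction of the packet buffer that was ever used."""
    sizes, used = {}, {}
    for line in log.splitlines():
        _timestamp, node, message = line.split(";", 2)
        match = re.search(r"\(size: (\d+)\)", message)
        if match:
            sizes[node] = int(match.group(1))
        match = re.search(r"position of last byte used: (\d+)", message)
        if match:
            used[node] = int(match.group(1))
    return {node: used[node] / sizes[node] for node in sizes if node in used}


for node, ratio in sorted(pktbuf_high_water(LOG).items()):
    print(f"{node}: {ratio:.1%}")
# m3-100: 47.7%
# m3-101: 46.8%
```

Both nodes stay below half of the 12228-byte buffer, so the high-water mark alone does not explain the missing replies.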

danpetry commented 5 years ago

> @danpetry regarding 4.7-8: did you compile the arduino-zero application with USEMODULE=xbee?

Yes, this module is being compiled

miri64 commented 5 years ago

It seems to be a resource problem after all. The reassembly buffer just runs full on both ends, and the at86rf2xx driver seems to have problems receiving everything thrown at it (handling both sent and received messages at once). When just sending UDP packets (that do not solicit a reply) at the same interval and size, everything is fine. So, since the new ping6 command is actually not broken, but the system just can't handle the newer, faster way of pinging another node, I'd still suggest that we change the test parameters to something more careful. The test is about testing the fragmentation after all, not about how fast the fragmentation can work under stress.

kb2ma commented 5 years ago

No issues with CoAP tests.

miri64 commented 5 years ago

No issue in Task 10.

danpetry commented 5 years ago

Closing in favor of #124