arduino-libraries / WiFi101

Wifi library for the Arduino WiFi 101 Shield
158 stars 128 forks source link

v0.11.0 Causes hangs on MKR1000 #117

Closed youcangetme closed 7 years ago

youcangetme commented 7 years ago

Just updated and it took me a while to narrow down which lib was causing the freezing.

I'm a very high level developer so I apologize for not being specific but I can revert to v0.10.0 and it will work again.

Specifically, if I hard code the network SSID and password it runs flawlessly. If I use the FlashStorage lib to load the data it will freeze. Oddly enough the Arduino IDE Serial Monitor will hang (as in I click to close and it will not do so) too until I power off the MKR1000. The only way to power on and flash the memory again is to disconnect power to all external components before booting. If even one I2C or SIP module is connected I can't get the MKR1000 to flash.

I discovered which lib it was by backing out each library until it worked again. I can have all libs up to date and just the WiFi101 at v0.10.0 and it will all work. Update to v0.11.0 and it fails.

It does seem to connect to the network and obtain a IP address. It seems to fail around the use of the multicast socket client to listen for a broadcast packet. Sometimes it receives the packet and hangs and other times it hangs before the pack is received.

Sorry I can't be more specific.

sandeepmistry commented 7 years ago

Hi @youcangetme,

Before anything can be done to resolve this issue, more details are needed.

Could you please provide a minimal sketch to reproduce the issue you described, along with a detailed sequence of steps to follow.

youcangetme commented 7 years ago

I will try to get a stripped down version of the app into a sketch for you. Give me a bit as the app is complex being that is has several modes of operation.

youcangetme commented 7 years ago

I simply don't have the time to simplify the app. I instead shared it with you as read only via an invite; https://github.com/subnoize/dstlr-mkr1000/invitations

It starts up in AP mode and you navigate to a page and enter your SSID, password and the email address is recorded but not used. Once that is set, reboot and the MKR1000 should join the network. If you navigate to the unit via HTTP it should send some data about IP, MAC addrs. If you can't get it to show that data then its having trouble. If you need the server that is generating the beacon I can give you access to that code as well. Eventually this will be an open source project but I would rather not put in the public view until I have it a bit more polished.

You can comment out the Adafruit I2C sensors. It doesn't require them to operate.

sandeepmistry commented 7 years ago

No rush on simplifying the app, let us know when it's done and can be shared publicly ...

It seems to fail around the use of the multicast socket client to listen for a broadcast packet. Sometimes it receives the packet and hangs and other times it hangs before the pack is received.

How large is the packet? Over 1400 bytes? Can you please try the patch in #116 if that's the case ...

youcangetme commented 7 years ago

I try to keep UDP packets sub 64 bytes. The buffer is exactly 64 bytes on the server side. Basically the time in seconds from epoch and the 4 digit port number the master is listening on. Nothing special beyond that. The beacon is broadcast about every 30 seconds. Its makes the IoT devices self configuring once you give them the SSID and password and they join that access point.

bcraigie commented 7 years ago

+1 from me. I am using data from an FRAM rather than Flash, and it simply loads the data into a Struct which has the SSID defined like this:

struct config_t
{
  char SSID1[17];
  char SSIDPW1[33];
  char SSID2[17];   
  char SSIDPW2[33]; 
  char SSID3[17];   
  char SSIDPW3[33]; 
  unsigned int myChannelNumber;
  char myWriteAPIKey[17];
  char myIP[16];
  unsigned int myPort;
  unsigned int myFrequency;

} config;

and the connect statement is just like:

myStatus = WiFi.begin(config.SSID1, config.SSIDPW1);

I've defined char[17] for the ssid so that it allows 16 characters plus the null for the ssid.

So I don't know if it is just a red herring that hardcoding the SSID fixes it? Presumably the Wifi100 library can handle the SSID from a char[17] and password from char[33] without a problem?

Library version 0.11.1 was giving me immense grief with hanging a few seconds after the begin statement or on the first data transmission thereafter.

Reverting back to 0.10.0 sorted it all again.

If I can be of any help, just let me know. I dumped out the values from config.SSID1 and found that there were remnant characters from a previously stored SSID after the null terminator of the current SSID

For example if the SSID is "myssid" and previously it had been storing : "mylongssid"

The dump of config.SSID1[17] shows: m y s s i d [null] s s i d [null][null][null][null][null][null]

Would this cause the WiFi.begin a problem? I can't see in the documentation a maximum size for the ssid or password to be passed to WiFi.Begin although I might simply have missed it.

sandeepmistry commented 7 years ago

Hi @youcangetme,

Please open a new issue with more details like a minimal sketch to reproduce the issue. Your issue seems to be unrelated to the original issue reported here.

youcangetme commented 7 years ago

@bcraigie if you could come up with a "proof of operation" sketch for @sandeepmistry I will owe you! I've been buried and haven't had a chance to get an example running. You are experiencing the exact issue I am even though its from different libraries. No need to create a separate ticket.

bcraigie commented 7 years ago

I will endeavour to do so. Since you have the problem with 0.11.0 it is likely that I would also see the problem with 0.11.0 - I didn't try that version, and just reverted from 0.11.1 to 0.10.0 so as you say, we probably do indeed have the very same problem.

What I can say is that I developed my code with a hard-coded SSID and Password initially and it all worked great. Only when I introduced saving the SSID and password to the F-RAM and loading that to pass to the WiFi.Begin did I start seeing the hanging, and even then only after having changed the stored SSID from a longer SSID to a shorter SSID. So at present, I'm working on the assumption that the WiFi101 0.11.x library has a problem with the way the ssid and/or password is passed to it, and probably the extraneous characters after the null.

Because the problem only shows up when we read from Flash or F-Ram, a minimal sketch might not be so minimal :-)

sandeepmistry commented 7 years ago

Because the problem only shows up when we read from Flash or F-Ram, a minimal sketch might not be so minimal :-)

Maybe we can start with a non-minimal sketch? It's better than having no sketch to look at ...

sandeepmistry commented 7 years ago

@youcangetme @bcraigie there's a new v0.11.2 release with a fix for WiFi.RSSI() crashing. Could you please test again with this version?

bcraigie commented 7 years ago

My Sketch runs perfectly on 0.10.0 but I'm sorry to say that on 0.11.2 (and 0.11.1 and 0.11.0) it hangs the Arduino.

bcraigie commented 7 years ago

This page would not let me upload zip files or .ino sketches, so I've put my .ino file at this Dropbox link. This is my full project file because I'm not sure if I can strip it all down to a minimal sketch and still get it to hang (maybe there's some interplay between libraries).

It also requires an extended F-RAM library which I've updated (so that it will read and write the whole struct in one go).

I am also using Thingspeak library 1.2.1 and DHT Sensor Library 1.3.0

Also, here is the Fritzing breadboard layout.

Hope that's ok.

sandeepmistry commented 7 years ago

Thanks @bcraigie, I've attached the sketch, in case the dropbox link goes down in the future:

TempHumWifi_Mon_MKR1000.ino.txt

sandeepmistry commented 7 years ago

Here's an example of somethings you can try to remove to isolate the problem:

1)

  startWiFiModule();                            // start it up
  maxSSID = scanNetworks(false); // scan silently;
  wifiStatus = connectWiFiModule();                          // Try to connect to local wifi

Is there a reason you always need to scan before connecting?

2) Use Serial1 instead of BTSerial

3) Remove the code to connect to ThinkSpeak

bcraigie commented 7 years ago

Thank you. I appreciate your advice and guidance,

I monitor the status of the hardware by Bluetooth, hence the BTSerial. Serial1 is only useful if the hardware is attached to a PC and this monitor is designed to work wirelessly - the only wire being the power connection.

Yes, I need to scan the Wifi every time I turn on the device. We cannot assume that the wireless network is always present, and we cannot assume the device will always be in the same location. It also has to scan the WiFi once in order to produce the Bluetooth menu. Logic dictates that we should not blindly try to connect to a WiFi network that isn't there.

Currently I use the Android app "Bluetooth Terminal HC-05". Ultimately, I will write an Android app to monitor and control the Arduino over Bluetooth.

Regardless, neither of these should cause it to hard hang? :-)

The code to connect to Thingspeak is vital to the project. Without that I might as well throw the Arduino in the bin? The Thingspeak call is the only internet access the hardware has once connected to the WiFi network.

The main point is, it works with WiFi101 version 0.10.0 (although it did hang once today) and it immediately hangs on 0.11.x so, what changed in the library that causes it to hang? If my calls to the library functions are correct, and if the library code is correct, it should not be possible to hang the Arduino. Since my calls to the library functions appear to be correct (as far as the limited documentation goes), one can only surmise something is wrong with the WiFi101 library code?

Thanks again for looking at this. :-)

Brian

sandeepmistry commented 7 years ago

Regardless, neither of these should cause it to hard hang? :-)

Correct, but it would be great to narrow down the code that is causing the hang

The code to connect to Thingspeak is vital to the project. Without that I might as well throw the Arduino in the bin? The Thingspeak call is the only internet access the hardware has once connected to the WiFi network.

I was just wondering if something with that code was causing it to crash. Basically via a process of elimination, we can narrow down where the problem is in the WiFi101 library ...

The main point is, it works with WiFi101 version 0.10.0 (although it did hang once today) and it immediately hangs on 0.11.x so, what changed in the library that causes it to hang? If my calls to the library functions are correct, and if the library code is correct, it should not be possible to hang the Arduino. Since my calls to the library functions appear to be correct (as far as the limited documentation goes), one can only surmise something is wrong with the WiFi101 library code?

I agree, the git commit history has a record of all the changes between v0.10.0 and v0.11.x. If I had the same setup to test with as you, I could run a git bisect to track down the change where the problem was introduced.

Here's a link to all the changes: https://github.com/arduino-libraries/wifi101/compare/0.10.0...master - if you click the "Files change" tab, you can see the code that changed.

sandeepmistry commented 7 years ago

I might have spotted the item, stayed tuned for a patch to try ;)

sandeepmistry commented 7 years ago

@bcraigie can you please try out the change I proposed in pull request #123?

bcraigie commented 7 years ago

Hello @sandeepmistry :-)

I tried the patch, but it still hangs. It is rather strange, like a memory/stack corruption of some sort.

The debug statements I inserted show me it reaches line 298:

wifiStatus = connectWiFiModule();

And inside there it gets as far as lines 415 to 423 (I un-commented line 421):

myStatus = WiFi.begin(config.SSID1, config.SSIDPW1);
        if ( myStatus == WL_CONNECTED) {
          BTSerial.print("Connected to SSID1: ");
          BTSerial.println(config.SSID1);          

          delay(10000);
          BTSerial.println("Should be connected by now");
          return myStatus;
        }

And it prints the message "Should be connected now" on the Bluetooth terminal. The next line is:

return myStatus;

Which, if I'm not mistaken should exit immediately out of the for loop and the connectWiFiModule function and return to the next line in the setup() which should be 300, but it does not print either of the debug lines:

if (WiFi.status() == WL_CONNECTED) {
    BTSerial.println("Connecting to Thingspeak server...");
    ThingSpeak.begin(client, config.myIP, config.myPort);
  } else {
    BTSerial.println("WiFi not connnected. Please update the SSID/Password");
  }

Either the if or the else clause should print something, but it hangs before it gets there (or maybe the WiFi.status call hangs?).

It looks like something is corrupting the return address (is something left on a stack/heap that should not be there?)

I'm going back to the old 6502 programming days, but in those days when you called a subroutine, you would push the address of the current statement onto the stack, call the subroutine, and at the return, you would pull the return address back off and increment it, but if something else has been added to the stack in between times and not been removed again, then the return address would be wrong?

Ok, now the next bit of the story is weird: So, I added another line to debug. In the setup() At 299 I added: BTSerial.println("Returned from connect"); It printed that, but not only that, it proceeded to print "Connecting to thingspeak server..." and hung later on in the program. However, I pressed the reset button on the Arduino, and the next time it hung after my new debug statement "Returned from connect". So the hang point appears to be random.

Is it possible that an interrupt is messing with the stack? But why would this not be a problem in 0.10.0 and yet it is a problem in 0.11.x ? It is strange.

Thanks again for your help. It is greatly appreciated. :-)

Brian

bcraigie commented 7 years ago

Further debugging...

WiFi.status() appears to be one cause of the hard hang.

Using the return value wifiStatus of wifiStatus = connectWiFiModule(); instead of using WiFi.Status() allows the code to go further. It gets to the point where it displays the temperature and humidity, but then hangs at:

postStat = ThingSpeak.writeFields(config.myChannelNumber, config.myWriteAPIKey);

At least that is some progress. Perhaps if we can identify why WiFi.Status() causes a hard hang, it might also solve why it hangs when we come to send the data.

I used your technique @sandeepmistry and changed

uint8_t WiFiClass::status()
{
    if (!_init) {
        init();
    }

    m2m_wifi_handle_events(NULL);
        return _status;
}

to

uint8_t WiFiClass::status()
{
    if (!_init) {
        init();
    }

    m2m_wifi_handle_events(NULL);
        uint8_t myWStat = _status;
    return myWStat;
}

and this permitted me to return to using WiFi.status() without it hanging at that point, but it still hangs at the send to Thingspeak. Still investigating but it does look like there is some memory being overwritten somewhere...

youcangetme commented 7 years ago

@bcraigie I want to thank you for following up with this issue. I haven't had the time with this release dumping into QA this week and deployment to production on the 26th. I have turned off all updates until then. Truly, thank you!

sandeepmistry commented 7 years ago

@bcraigie I've pushed another commit to PR #123: https://github.com/arduino-libraries/WiFi101/pull/123/commits/baca11e15efa9d8f06414695b2899f2b81ee346e

Please try it out, there were a few other cases with the _resolve var wasn't reset to zero. So when the NTP was synced after connecting to an AP after scanning it would trigger the crash.

sandeepmistry commented 7 years ago

I'm going to close this for now, I think all the changes in PR #123 solve @bcraigie's issues. @agdl has also merged this PR.

If something is still crashing we can re-open this issue.

bcraigie commented 7 years ago

Thank you @sandeepmistry

Aha! So downloading the latest zip file from here works without hanging. Will this become 0.11.3 at some point?

Thanks!

Brina

youcangetme commented 7 years ago

I can confirm it still hangs on my project as well.

On Dec 30, 2016 7:34 AM, "Brian Craigie" notifications@github.com wrote:

Thank you @sandeepmistry https://github.com/sandeepmistry

Apologies, but 0.11.2 still hangs with my setup.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/WiFi101/issues/117#issuecomment-269766871, or mute the thread https://github.com/notifications/unsubscribe-auth/ANpG450IrPR6doau3mwQ7HJtoCEigzyiks5rNPpfgaJpZM4K9XUX .

youcangetme commented 7 years ago

I can confirm it still hangs on my project as well.

On Dec 30, 2016 7:34 AM, "Brian Craigie" notifications@github.com wrote:

Thank you @sandeepmistry https://github.com/sandeepmistry

Apologies, but 0.11.2 still hangs with my setup.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/WiFi101/issues/117#issuecomment-269766871, or mute the thread https://github.com/notifications/unsubscribe-auth/ANpG450IrPR6doau3mwQ7HJtoCEigzyiks5rNPpfgaJpZM4K9XUX .

sandeepmistry commented 7 years ago

@bcraigie great, thanks for trying the latest!

Aha! So downloading the latest zip file from here works without hanging. Will this become 0.11.3 at some point?

Yes, we'll be releasing a new version in the future.

I can confirm it still hangs on my project as well.

@youcangetme are you using the latest changes from master? Please provide a sketch that can be used to recreate the crash if the master version still fails.

youcangetme commented 7 years ago

Ok, let manually import the libs and try again.

On Dec 30, 2016 8:59 AM, "Sandeep Mistry" notifications@github.com wrote:

@bcraigie https://github.com/bcraigie great, thanks for trying the latest!

Aha! So downloading the latest zip file from here works without hanging. Will this become 0.11.3 at some point?

Yes, we'll be releasing a new version in the future.

I can confirm it still hangs on my project as well.

@youcangetme https://github.com/youcangetme are you using the latest changes from master? Please provide a sketch that can be used to recreate the crash if the master version still fails.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/WiFi101/issues/117#issuecomment-269774558, or mute the thread https://github.com/notifications/unsubscribe-auth/ANpG470MeFYiWkx3BgbmcyOl2ROePHyiks5rNQ5RgaJpZM4K9XUX .

sandeepmistry commented 7 years ago

v0.12.0 has been released today with the latest changes from master, which includes the fix.