cnlohr / esp82xx

Useful ESP8266 C Environment

Not saving wifi credentials #60

Open bbkiwi opened 7 years ago

bbkiwi commented 7 years ago

Hi @con-f-use and @cnlohr

I've been trying the basic example made by make project (I'm on dev). After make clean erase initdefault burn burnweb, the ESP8266 (nodemcu v1) comes up with an AP, and in the gui I connect to my home wifi with no problem. However, when I reset it, it usually comes up as an AP again and I have to use the gui again. If I examine the top of flash at 0x3FE000 with echo -ne "fr4186112\t128" | netcat -u -w 1 192.168.1.3 7878 | hexdump -C, I find that the wifi credentials are not there. Occasionally it does restart in station mode on my home wifi; then I find my credentials are there, and after each restart it goes to the station.
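For reference, the decimal number in that backdoor command is just 0x3FE000 written in base 10. A minimal sketch of building the command string follows; the "fr<address>\t<length>" layout is inferred only from the one example above, and build_flash_read_cmd is a made-up helper name, not part of esp82xx.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a flash-read command of the form
 * "fr<decimal address>\t<decimal length>", as in the netcat example above.
 * Returns the number of characters written, or -1 if the buffer is too small. */
int build_flash_read_cmd(char *buf, size_t buflen, uint32_t addr, uint32_t len)
{
    assert(buf != NULL && buflen > 0);
    int n = snprintf(buf, buflen, "fr%lu\t%lu",
                     (unsigned long)addr, (unsigned long)len);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Calling it with addr = 0x3FE000 and len = 128 fills the buffer with the same string that is piped into netcat above.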

In commonservices.c I see your comment wifi_station_set_config(&stationConf); //I don't know why, doing this twice seems to make it store more reliably. Is this a manifestation of the unreliability?

I thought I had a way to get around the issue by having my wifi off when I tried to switch to my home station. It kept trying and failing, then I turned the wifi on and it seemed to connect and remember. However even with this it fails often.

Is there any testing I can do for you? Cheers, Bill

con-f-use commented 7 years ago

The problem with all your issues is that I can't seem to reproduce them. At all. That means it's something not directly in the code. Writing the same thing twice, and that helping, points to a flash issue. Letting your WiFi come up after the ESP might point to a DHCP thing (did you reserve the IP? is it used by another device?).

bbkiwi commented 7 years ago

Attached is a complete dump of flash after make erase initdefault burn burnweb. dump4b.zip Does this help? Is this the only information that persists when the ESP8266 is off?

I run Ubuntu as guest on a win 7 host. Sometimes make burn or make burnweb stops part way and I have to do it again. Also esptool.py read_flash doesn't seem to work.

However I can use esptool.py directly on the win 7 machine.

To test whether make burn or make burnweb were failing to flash properly, I flashed esp_init_data_default.bin, image.elf-0x00000.bin, image.elf-0x40000.bin and page.mpfs using the win 7 esptool.py. The dumps produced both ways were the same.

There are other devices in the house that connect to the wifi (ipad, iphone). When the wifi comes up often the IP assigned gets changed (usually 192.168.1.2 3 4 or 5).

bbkiwi commented 7 years ago

Hi,

All this is very frustrating. I really like what you guys have done and would love to iron out this issue.

I have 2 nodemcu v1.0 and one nodemcu v0.9 (which may be faulty). I plug the nodemcu into usb.

With the Arduino IDE, I've tried the wifi sketches (completely erasing flash between each test). All work fine. I can connect to wifi station and the credentials are always remembered. I can scan and find stations.

But with the most recent commit on esp82xx/dev I can always connect to a station, but it rarely remembers. I never find any stations with scan.

I also tried with the nodemcu v0.9 and it behaves like you describe in #53.

cnlohr commented 7 years ago

Hmm... I was just using the most recent one two nights ago... I think I'll get back to this this weekend. @con-f-use will you be able to look into it sooner?

con-f-use commented 7 years ago

Seeing that I'm currently 1400m high in the Austrian Alps and will be this long weekend... probably not ;-D

bbkiwi commented 7 years ago

Hi @con-f-use Glad to see there is more to life than the ESP8266. Enjoy the mountains :-) Our NZ alps are getting very snowy now.

cnlohr commented 7 years ago

@bbkiwi Which subproject are you using? I just tried with esp8266lighthouse and had no issues. I couldn't get it to fail :-/ Are you using the ESP82xx project? If so, I can try that.

bbkiwi commented 7 years ago

Hi @cnlohr and @con-f-use

I can create and eliminate the problem at will! Charles, I was getting ready to answer your last query (the answer is that I have been using the esp82xx project, latest commit on dev, 79ce4) when I thought I'd try it a few more times.

I got NO problems 6 times in a row (each time doing make clean initdefault burn burnweb). Every time it starts with the AP, I use my Raspberry Pi and connected via the backdoor sending cmd w1 ..., it connects to my home wifi, when I reset it reconnects!

This is what was DIFFERENT. I usually have a browser running on the Pi set to page 192.168.4.1, which starts the gui running when the Pi connects to the AP. IF the gui is running when I switch to the home wifi network, the credentials are NOT saved. IF it isn't, it works fine.

I get the messages below if I connect to the gui, and the disconnect from the AP takes about 4 seconds. If the gui is not connected these messages don't appear and the disconnect is immediate.
MFS Not found at regular address (ffffffff).
MFS Found at: 00010000
404(favicon.ico)

Annotated Output.txt

I've been looking over commonservices.c more carefully and have a few questions which I will send as another issue (is this the right way to do this?)

Cheers, Bill

cnlohr commented 7 years ago

but but I have been setting my credentials with a browser... Let's treat that as a red herring, though.

You have captured a very, very interesting situation that I cannot reproduce. You could try changing the MPFS base address in your user.cfg to the 1M one and seeing how it behaves. I expect you will not see any changes.

I am curious if other commands issued from the webpage could be confusing it. Commands like 'wx' 'e', 'i' or 'BL'. How weird would it be if calling 'wx' was what was causing issues for you! I think that's where my focus would be. Try to send those backend commands.

cnlohr commented 7 years ago

Dude. Thank you for all this work. Like thank you. A lot.

bbkiwi commented 7 years ago

I've done some of what you suggest. Changing the MPFS base makes no difference. Sending command 'wx' via the backend doesn't screw things up. If I connect to the gui with everything closed down (so it only sends wx), then close the window and change to station via the backend, the problem happens. I connected to url "192.168.1.3/xxx", which just gives file not found and I don't think would start the socket, but then the problem happens too. ... anyway, late here, time for bed.

bbkiwi commented 7 years ago

disabled mdns and problem still persists. Doesn't have to be fresh make, if I get AP with gpio 0, if I connect to station via the backend all is fine, if the gui had been up, then it fails to remember credentials.

cnlohr commented 7 years ago

OH NEW QUESTION: Do you scan for wifi to do it or put in the station? Also, the way I do it is I select station mode, put in the SSID and password, then delete the mac address (leaving it blank) and say connect. I generally don't scan. Is that what you have been doing?

bbkiwi commented 7 years ago

Hi, scanning doesn't work for me, at least when I'm connected to the station; I've not tried it when connected to the AP. I can connect to the station (from the AP with the gui at 192.168.4.1) just like you do and it connects, but if I RESET or unplug it goes back to AP again.

If I send the same info via the backdoor (without ever having the gui open), it connects to the station and also reconnects directly after resets and unplugging.

Off to choir practice. In a month I'm doing a solo on my LED banjo!

bbkiwi commented 7 years ago

I found some discussions about scan where there was a bug that caused the first ssid found to be missed. Since only mine is strong, the Arduino scan example usually finds only my wifi router AP (unless I hold the esp8266 very high and close to my window, then sometimes it finds 1 or 2 others from distant neighbors).

Once the esp82xx scan option found one AP which was a weak signal from a neighbor and not my home wifi. So I thought maybe this skipping of the first ssid was happening. I started up a 2nd ESP8266 as AP and thought then I would have two strong APs and would see the second, but NO :-(

However --- On the Arduino IDE for nodemcu v1.0 I've just tried the examples from https://github.com/tzapu/WiFiManager which sets up an AP, scans and connects to a station and remembers its credentials. The scan finds my wifi with 100% signal, connects and remembers.

bbkiwi commented 7 years ago

Here are the bin files I make (commit 79ce4) and a log. Am I making the same files as you? Images.zip

bbkiwi commented 7 years ago

Using the newest esp82xx may be screwing up flash so it will no longer work with code based on earlier versions of esp82xx (those before toolchain with esp_nonos_sdk was added) :-O

I thought I'd go back to tweaking embedded8266 on the dev branch of my fork of colorchord. It uses commit 0946 of esp82xx and sdk esp_iot_sdk_v1.5.2.

I couldn't get it to run. I couldn't get esp82XX-basic to run either - constant reboots and other strange errors. I tried prepping the nodemcu using make erase from esp82xx, no luck! Tried make erase initdefault, no luck! I thought the nodemcu was a goner #$@#$

I was able to fix it by first running an Arduino sketch that connects to my wifi. After that I can burn and run the colorchord/embedded8266 code and it works again ;-?

Is initdefault only for your version of esp_nonos_sdk? Maybe after running the Arduino sketch I could read 128 bytes of flash at 0x3FC000 and make an esp_init_data_another_default.bin to initialize for earlier versions of esp82xx. A total hack based on no understanding whatsoever!

cnlohr commented 7 years ago

I would do that. I am really confused what project you are making. If we can pick a specific, basic project, that would be easiest. Something like esp8266ws2812i2s. Perhaps @con-f-use 's git clone --recursive https://github.com/con-f-use/esp82XX-basic

^^ Currently trying esp82XX-basic with newest esp82xx [dev]

OH MY! I CAN RECREATE YOUR PROBLEM!!!

OK! I can work on this tomorrow. YESSSS.

And by yessss I mean noooooooo because it means it's actually a real problem.

cnlohr commented 7 years ago

OK!!! So, if you use regular esp82xx, and then you use the web gui, but you LEAVE the MAC address in the box, then click connect, does it work? I thought that would make it match the bssid, but it seems not to. It seems to work every time for me. This points to a deeper problem.

cnlohr commented 7 years ago

Or not, this is random.

bbkiwi commented 7 years ago

Hi Charles, I am using the example as in the "Start a new Project" section of README.md

mkdir my_new_esp_project
cd my_new_esp_project
-- here I put a link to my fork of esp82xx
cp esp82xx/Makefile.example Makefile
make project

Test 1 - make erase initdefault burn burnweb, connect to AP. Go to the gui in a browser, change to station, enter ssid and pw, blank the mac (it should change to station). Reset the ESP8266 and it will revert to AP.

Test 2 - make erase initdefault burn burnweb, connect to AP. DO NOT go to the gui, but use netcat to send w1\tssid\tpass\t\t (it should change to station). RESET and it will reconnect to the station.

cnlohr commented 7 years ago

For me, it's doing it either way, seemingly randomly.

cnlohr commented 7 years ago

Sometimes switching, sometimes not.

bbkiwi commented 7 years ago

the code that uses mac address is a bit iffy - the bssid is really the 4th param and there is a defunct third param (commonservices.c line 396)

Look in my repo https://github.com/bbkiwi/esp82xx/tree/experiment and look at the changes to commonservices.c

These changes are not necessarily fixes, but I made this branch to fiddle around with the code.

cnlohr commented 7 years ago

I am thoroughly confused by these hex files. It looks like there is some sort of confusion as to the proper location of the wifi settings. Sector 7d or sector 3fd.

cnlohr commented 7 years ago

I tried switching everything over to the newest SDK. Still no dice.

cnlohr commented 7 years ago

P.S. Newest SDK takes an extra 1264 bytes!?!? WHYY? So, it seems that if I set it multiple times, it eventually does take. I have never needed to set more than 3 times by GUI. Do you ever run into cases where you have to set it more than that?

bbkiwi commented 7 years ago

I dumped memory. In the case where it does not reconnect to the station, I find wifi credentials at 7d only, but in the case where it does reconnect to the station, the credentials are at both 7d and 3fd. Maybe the current wifi is stored at 7d and the saved wifi at 3fd?

Is the newest SDK 2.0.1? (in order for me to use it with colorchord/embedded8266 I'm forced to use 1.5.2 or 1.5.4 otherwise not enough memory)

At this point for me the GUI hardly ever works (but maybe sometimes after repeated attempts). I will test the 3 times strategy tomorrow. But I have never had the gui work immediately after using make erase initdefault

Once I have an ESP8266 that works with the newest esp82xx, I cannot get it to work with the earlier commits of esp82xx using 1.5.2 unless I do my trick of loading an Arduino wifi station sketch.

cnlohr commented 7 years ago

Several days ago, you said "But with the most recent commit on esp82xx/dev I can always connect to a station, but it rarely remembers. I never find any stations with scan." Can you identify a specific commit where things /did/ work first time every time? I think I'm going to try that next, but I can certainly recreate this problem now. I will also second your "never saves first time" with the current environment.

Re: SDKs: esp82xx by default uses a weird amalgam of the 2.0.1 SDK and some custom binaries. I was just trying a stab at 2.1.0.

bbkiwi commented 7 years ago

Hi Charles,

I'm glad the problem is reproducible now, I was beginning to wonder what was going on. But very bloody frustrating! I'm not sure how to find a commit where it was gone :-( for these reasons ...

I'm not sure I ever had a time when things were working correctly. When I first got the nodemcu v1.0, I was experimenting with the Arduino IDE and then your stuff. I now know that if I run a sketch that sets up wifi with Arduino, it often leaves the flash in a state where it remembers the station when I use your stuff. Also I was mainly playing with colorchord/embedded8266.

I don't really know how to go back and find a commit where the problem is gone. As I mentioned 16 hours ago, "Using the newest esp82xx may be screwing up flash". The only way I know how to fix flash to try earlier commits is to run an Arduino sketch first, and I'm not sure what that is leaving in flash.

The problem was there at least on the commit of esp82xx that colorchord/embedded8266 was using on 27 March 2017.

That was the time I first noticed the problem when I visited my son and had difficulty switching to his wifi. I had accidentally put in the wrong password and it got stuck trying to connect.

I made some notes at the time: quesiton esp.txt One comment I made was
"tried reset to factory setting, used save and it then saved correct SSID and PW" but I take this with a grain of salt, since with this I've had lots of red herrings :-)

cnlohr commented 7 years ago

I am really sorry, I will not be able to address this until after MAGStock is complete on Tuesday, June 13. Is it an OK workaround for now to set it multiple times until it takes?

bbkiwi commented 7 years ago

No problem enjoy the music. Have you set up colorchord LEDs for any of the bands there?

cnlohr commented 7 years ago

Every year we make an effort, every time we don't get it in time.

cnlohr commented 7 years ago

How upset would y'all be if I just ditch this and store the wifi credentials in our settings arena? I can't figure out what's going on with Espressif's stuff.

con-f-use commented 7 years ago

I'd be okay with that. To me it seems the problem vanished anyway. I've programmed about 40 ESPs and it has not recurred.

cnlohr commented 7 years ago

It's been happening on all of mine, now. What is your setup?

con-f-use commented 7 years ago

Basically just a bunch of these:

https://de.aliexpress.com/item/ESP8266-ESP-12-USB-WeMos-D1-Mini-WIFI-Entwicklungsboard-D1-Mini-NodeMCU-Lua-IOT-Basis-Auf/32673300492.html

cnlohr commented 7 years ago

I meant software, SDK, application.

bbkiwi commented 7 years ago

Hi I can make the bug go away!!!

It will be interesting to see exactly what you are testing @con-f-use

I set up an experiment branch and made a few changes where the code seemed to differ from the Espressif documentation, but the bug still persists. I also started shutting down some of the code to see if I could get the bug to disappear.

There is a clue here: if I comment out from http.c the line i = MFSOpenFile( path, &curhttp->data.filedescriptor ); and replace it with i = -1; the bug goes away (of course all urls then return 404).

I then left the line in and started messing with MFSOpenFile in mfs.c and deduced flashchip->chip_size = 0x01000000; and later flashchip->chip_size = 0x00080000; is the root of the problem. Must there be some time delays after each?

I have been flashing the web at 0x100000.

BUG AWAY! I used your commit 79ce49d and commented out all lines setting flashchip->chip_size in mfs.c and commonservices.c

Set up user.cfg to flash web at 0x10000. With these mods I can use the gui to switch to station and all works, and it remembers credentials.

So why? How to handle when web at 0x100000?

bbkiwi commented 7 years ago

I tried setting flashchip->chip_size = 0x01000000; immediately before spi_flash_read and flashchip->chip_size = 0x00080000; immediately after and bug comes back.

I think this is only needed for flash bigger than 4M anyway. see: http://www.packom.org/esp8266/16mb/flash/eeprom/2016/10/14/esp8266-16mbyte-flash_handling.html

So I have commented out all flashchip->chip_size changes in mfs.c in my experiment branch. (could probably remove them everywhere they occur) Also #undefine DISABLE_MDNS

Working fine from gui now and remembering credentials (for both web at 0x10000 or 0x100000). mdns a bit iffy works in firefox, not in chrome or chromium on Rasp pi.

I have other changes which I'd like to discuss. All in experiment branch.

cnlohr commented 7 years ago

WOWZERS!!! We will have to make that fix!

bbkiwi commented 7 years ago

I'd like to discuss it a bit first. I'm busy today. Tomorrow I'll try and write up my concerns and questions.

I briefly went back to colorchord/embedded8266 last night using esp82xx commit 0946dfc with those changes and SDK 1.5.2. (I can't use the recent esp82xx and newer SDK as there is not enough memory.) Ran into problems! But I think maybe the flash was set up incorrectly - using make erase initdefault is probably only for your version of the SDK. Now I can't get it to work with cc, even with my dev branch on cc which worked before. Restore parameters is all screwed up.

king2 commented 7 years ago

Just yesterday I found this bug, now looked on issues before posting my own. Maybe my comment will be helpful to somebody.

This happens because 'flashchip' is an EXTERNAL variable; it comes from the SDK. Normally, we should NEVER change it.

When you change system parameters via the various wifi_set_XXX functions, the SDK saves them to flash in its system param area (3 sectors before end of flash).

It looks at flashchip->chip_size, which was determined and set at startup, and... wow, we have 512KB flash! The SDK does not know that we modified this variable and thinks this is the right value. So the SDK saves your settings to sectors 0x7D..0x7F (instead of 0x3FD..0x3FF for a 32mbit flash, for example).

After rebooting, the SDK determines the chip size again (storing it in chip_size) and reads the configuration (the default one, from the right place). So your settings are lost. Literally, your settings were successfully written to flash, but to the wrong place, and the SDK reads them from another (right) address.

If you have not done anything that changes chip_size since reboot when you save your config, it will be placed at the right address. For example, if you load the webpage, enter everything you want and reboot your ESP, then after reconnecting you can press the button that saves the changes. As nothing has been done with MPFS since reboot, your settings will really be saved and will be restored after reboot.
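The sector numbers follow directly from the chip size the SDK believes it has. A minimal sketch of that arithmetic, assuming 4 KB flash sectors and that the system param area is the last three sectors (sys_param_first_sector is an illustrative name, not an SDK function):

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 0x1000u /* assumption: 4 KB sectors */

/* First of the three system-parameter sectors at the end of flash,
 * computed from whatever chip_size currently holds. */
uint32_t sys_param_first_sector(uint32_t chip_size)
{
    assert(chip_size >= 3u * FLASH_SECTOR_SIZE);
    return chip_size / FLASH_SECTOR_SIZE - 3u;
}
```

With chip_size left at the hardcoded 0x00080000 this gives sector 0x7D, and with a real 4 MB chip (0x00400000) it gives 0x3FD, which matches the 7d and 3fd locations seen in the flash dumps earlier in this thread.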

Workaround:

I have added 'uint32 flash_size_saved = flashchip->chip_size' just before setting chip_size to 0x01000000 everywhere except FlashRewriter (it reboots anyway at end of function, so we do not need to restore anything).

Then, in any place where we are assigning hardcoded value to chip_size I have placed line: flashchip->chip_size = flash_size_saved;

This should be done in several places in sources (just search for flashchip->chip_size).

After this everything started to work ok with flash of any size.
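A minimal sketch of that save-and-restore pattern, runnable off-device. Everything except the names flashchip, chip_size and flash_size_saved is a stand-in invented for illustration: the struct is reduced to one field, and fake_flash_read takes the place of the real flash read.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the SDK's flash descriptor; only the field discussed here. */
typedef struct { uint32_t chip_size; } flash_chip_t;
static flash_chip_t fake_chip = { 0x00400000u }; /* pretend the SDK detected 4 MB */
static flash_chip_t *flashchip = &fake_chip;

/* Hypothetical reader in place of the real flash read; it records the
 * chip size that was in effect while the read ran. */
static uint32_t size_seen_by_read;
static void fake_flash_read(void) { size_seen_by_read = flashchip->chip_size; }

void read_with_size_override(void)
{
    assert(flashchip != 0);
    uint32_t flash_size_saved = flashchip->chip_size; /* what the SDK detected */
    flashchip->chip_size = 0x01000000u;               /* temporarily claim 16 MB */
    fake_flash_read();
    flashchip->chip_size = flash_size_saved;          /* restore, not 0x00080000 */
}
```

The only change from the original code path is the last assignment: the detected size is put back instead of a hardcoded 512 KB, so a later wifi_set_XXX call sees the size the SDK found at boot.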

bbkiwi commented 7 years ago

Hi @king2, nice explanation. I tracked the bug down by an extremely tedious process but eventually realized it had something to do with changing chip_size. Since I have a nodemcu v1.0 with 4M, simply commenting out where it was changed 'fixed' it. At that point I found the web page http://www.packom.org/esp8266/16mb/flash/eeprom/2016/10/14/esp8266-16mbyte-flash_handling.html which outlines your fix too.

There are few things I don't understand about this.

  1. Is this only a problem for flash > 4M?
  2. For flash bigger than 4M, does the SDK initialize chip_size to 4M? And then should the large size that chip_size is temporarily set to be the actual size of flash rather than 16M?
  3. I was worried that since the setting and restoring of chip_size is a bit separated in this code, if it is interrupted, an interrupt routine that accessed flash might get screwed up. Is this a concern?

I wonder if, rather than having code that works for any size chip, the code could be shorter by introducing the flash size in the configuration file and compiling with appropriate #defines and #if?

king2 commented 7 years ago

Wow, I spent more than 10 hours solving this problem. I think, I need to learn to use Google, not programming :)

  1. This problem (our problem linked with this project) applies to all flash chips with size != 512K, due to the hardcoded size used when restoring the chip_size value. But this trick is really only needed with flash chips bigger than 4MB (32mbits). If you have flash less than or equal to 32mbits, you can just remove these lines and everything will work.

  2. I haven't tried this, but the page from your link says that the SDK will not initialize chip size to more than 4M. I think it needs to be tested with the latest SDK and big flash.

  3. In this project almost all such code blocks (except cmd_Flash) are surrounded with EnterCritical/ExitCritical functions. But these functions are empty in user_main.c, so, yes, if you call, for example, wifi_set_XXX in an interrupt, the problem can arise (if the interrupt lands inside MFSReadSector(), for example), and in that case your system parameters will be saved to the wrong place. I think this is not a big problem; as for me, any work with flash in interrupts is a big-evil-thing that can lead to even bigger problems. :)

I made my workaround just as showcase, and to get it working for me, changing minimum lines in minimum files. In real case, when @cnlohr will change sources, I think he will store real chip size in one global variable (not needed to get it each time), maybe add some code and defines to set chip size manually, check critical sections and so on.
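One possible shape for the "defines to set chip size manually" idea, as a rough sketch only. FLASH_SIZE_BYTES and NEEDS_CHIP_SIZE_OVERRIDE are made-up names, not existing esp82xx settings, and the 4 MB threshold is taken from the discussion above rather than from Espressif documentation.

```c
#include <stdint.h>

/* Hypothetical build-time setting, e.g. passed as -DFLASH_SIZE_BYTES=... */
#ifndef FLASH_SIZE_BYTES
#define FLASH_SIZE_BYTES 0x00400000u /* assumed default: 4 MB */
#endif

/* Per the thread, the chip_size trick only matters above 4 MB. */
#if FLASH_SIZE_BYTES > 0x00400000u
#define NEEDS_CHIP_SIZE_OVERRIDE 1
#else
#define NEEDS_CHIP_SIZE_OVERRIDE 0
#endif

/* Code touching flashchip->chip_size could then be wrapped in
 * #if NEEDS_CHIP_SIZE_OVERRIDE ... #endif and compiled out on small chips. */
uint32_t configured_flash_size(void) { return FLASH_SIZE_BYTES; }
```

With the default above, the override code would be compiled out entirely, which is the same effect as commenting out the chip_size lines on a 4M nodemcu.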

bbkiwi commented 7 years ago

@king2 only 10 hours is fantastic. I've been spending weeks (learning quite a bit along the way). Adding printf commenting out code till the bug goes or everything breaks. Along the way I've found other suspect code which I will write up as another issue.

cnlohr commented 7 years ago

I really won't have time for a bit, any way one of you guys can make a patch request to fix the bug in mpfs? Possibly backing up the chip flash size and restoring it? Is that step even needed anymore for above-1M-mpfs?

king2 commented 7 years ago

I think that this trick needed for flash sizes more than 4M, as mentioned at page found by @bbkiwi. But nobody knows what will happen with SDK internals later :)

@bbkiwi, can you make pull request or point me what should I do to make it by myself?

cnlohr commented 7 years ago

We are referring to simply changing it back to whatever it was before the MPFS was running, right?