PhracturedBlue / ESP8266MQTTMesh

Self-assembling Mesh network built around the MQTT protocol supporting OTA
GNU General Public License v3.0

HelloWorld example disconnecting from WiFi #3

Closed: Alasknnj closed this issue 6 years ago

Alasknnj commented 7 years ago

Library versions:

Description:

I've been trying to get the HelloWorld example to work with only one node, using a mosquitto broker running on my PC.

The example works fine up to the connection to the MQTT broker. Right after connecting, neither the subscription to IN_TOPIC and the publish to OUT_TOPIC made in onMqttConnect(), nor the timed publishes, reach the broker. A few seconds later onWifiDisconnect() is called. After that it just cycles between connecting to the broker and disconnecting from the WiFi.

What I could figure out is that the setup_AP() call is causing this, more specifically the lines that set the node up as both access point and station.

PreferHardware commented 7 years ago

Are you sure that you are using the very latest version of the library?

A similar report led to an update 12 days ago, i.e. only a week before your post: https://github.com/PhracturedBlue/ESP8266MQTTMesh/issues/2

PhracturedBlue commented 7 years ago

This is a common issue, for which I have no good solution. I can't reproduce the issue on my nodes, and all fixes I have attempted have been unsuccessful. If you set the mode to WIFI_AP_STA at the beginning, the node will connect to the MQTT server, but the mesh connections will fail. I do not understand why I do not have this issue while many others do.

Alasknnj commented 7 years ago

Well, I did take a look at this issue, but it was actually about the mqtt server port if I understood it correctly. My issue is after a successful connection, the client (node) disconnects from the WiFi (apparently because of the access point and station mode).

PhracturedBlue commented 7 years ago

Yes, as I've noted, the issue is switching from STA mode to STA_AP mode. But the alternative (always stay in STA_AP mode) has other issues. I just have been unable to find a solution that works (and honestly, it works fine for me, so I can't reproduce the issue). I recommend attempting to build with the 2.4.0-dev core and see if things change. They did do some work to improve the AP/STA code there.

ibaranov-cp commented 7 years ago

Seeing a similar issue here, with the nodes (based on ESP-12-E and ESP-12-F) both reporting:

[onMqttDisconnect] Disconnected from MQTT: 0
[onWifiDisconnect] Disconnected from Wi-Fi: MrRobot_2G because: 200

Interestingly, the serial terminal is reporting that [setup_AP] Initialized AP as 'mesh_esp8266-4' IP '192.168.4.1'

However, a WiFi scan of the local area shows that no AP is registered under that name. It comes up instead as: AI-THINKER_0B0593

This means that the AP SSID setting is not taking effect, and the AP is not hidden as intended. It is also open, instead of using the mesh password as requested.

I think I've seen this behavior before in the main ESP8266-arduino github issues page, but can't seem to find it.

EDIT: Here it is https://github.com/esp8266/Arduino/issues/1094

PhracturedBlue commented 7 years ago

The 1st part is a real issue (that I have no solution to). When we activate the AP, we lose the connection to the station. The 2nd is a red herring. Each node has 2 WiFi networks. One is the station ('AI-THINKER_0B0593' in your case). I don't think the IP address for this is reported to the serial console. The other is the access point, which is 192.168.4.1. If the node can see your wireless network, it will use that. If it can't, it will try to connect to any available node it can see.

PhracturedBlue commented 7 years ago

One thing that has worked in the past is to set your wifi network to be on channel 1. I don't know if it was a fluke or not, but I'd be interested in whether anyone has any luck with it. As I mentioned above, I cannot reproduce the disconnection issue on any of my nodes.

ibaranov-cp commented 7 years ago

Yep, that worked! Much better, though it still disconnects the odd time. Set to channel 1, auto 20/40 MHz bandwidth on the router.

[publish] Sending: esp8266-out/mesh_esp8266-4/722323=hello from 722323 cnt: 23
[onMqttDisconnect] Disconnected from MQTT: 0
[connect_mqtt] Attempting MQTT connection (192.168.2.168:1883)...
[onMqttConnect] MQTT Connected
[match_bssid] Trying to match known BSSIDs for 2E:3A:E8:0B:05:93
[setup_AP] Initialized AP as 'mesh_esp8266-4'  IP '192.168.4.1'
Subscribe acknowledged.
  packetId: 1
  qos: 0
[publish] Sending: esp8266-out/mesh_esp8266-4/722323=hello from 722323 cnt: 24
[publish] Sending: esp8266-out/mesh_esp8266-4/722323=hello from 722323 cnt: 25
[publish] Sending: esp8266-out/mesh_esp8266-4/722323=hello from 722323 cnt: 26

Both publishing and subscribing seem to work now from the broker :) Moving on to mesh.

ibaranov-cp commented 7 years ago

Two clarifications on the mesh network:

1) Is there a way to disable that "AI-Thinker" station to reduce clutter, if it is not being used by your mesh?
2) How do we interact between nodes on the hidden mesh? I can't really tell if messages are getting relayed through the wifi network of the broker, or through the mesh itself.

Thanks again !

Alasknnj commented 7 years ago

I've noticed that at the call that sets up the SoftAP, WiFi.channel() seems to return 1, regardless of the channel of the WiFi network the node is connected to. I tried hardcoding my specific network channel in this call to check whether that was the problem, and it now seems to work for one of the nodes I was having trouble with. When I tried a second node, I observed the same issue, both with and without the first node connected.
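For readers trying the same experiment: the ESP8266 has a single radio, so its SoftAP ends up sharing a channel with the station link, which is probably why a wrong value here matters. A minimal sketch of a channel override follows. `chooseSoftApChannel` and its two parameters are hypothetical names for illustration, not part of ESP8266MQTTMesh or the Arduino core.

```cpp
// Hypothetical helper (assumption, not library code): pick the channel to
// hand to the SoftAP setup call.
//   reported   - value read back from WiFi.channel()
//   configured - user-supplied override, 0 meaning "no override"
int chooseSoftApChannel(int reported, int configured) {
    // Treat 2.4 GHz channels 1..13 as valid; anything else is ignored.
    auto valid = [](int ch) { return ch >= 1 && ch <= 13; };
    if (valid(configured)) return configured;  // explicit override wins
    if (valid(reported)) return reported;      // otherwise trust the radio
    return 1;                                  // last-resort default
}
```

The result would be passed as the channel argument of WiFi.softAP(). This only mirrors the hardcoding workaround described above and does not explain why the second node still failed.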

Alasknnj commented 7 years ago

After switching to the espressif8266_stage version of the espressif8266 platform, I no longer see the issue on either node.

PhracturedBlue commented 7 years ago

@ibaranov-cp You should not see any of the nodes when searching the wifi (well, you'll see hidden networks if you have that enabled... not much I can do about that). Each node opens its own AP, but it is marked as hidden so as not to pollute the list of available networks. On your router, you will see each node that acts as a gateway (gateways are preferred over relays since they are faster, so if your node is in range of the router, it will attempt to use it). I can't hide those nodes, since it is your router that keeps track of available nodes. If this isn't what you are seeing, then something is wrong.

Nodes (generally) can't communicate between themselves. All requests are forwarded to the broker then rebroadcast to all nodes as appropriate. This is primarily because nodes have no idea of the path between themselves. Also any given node only knows about the nodes it is directly connected to, not all of its grandparents or grandchildren. This mesh network is not designed for high-bandwidth communication between nodes, and will likely not perform well if used that way. It was primarily developed as a way to provide long distance communication, allowing nodes to use themselves as local repeaters without the need for dedicated wifi repeaters.
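The upstream half of that routing can be sketched as follows. This is a toy model under stated assumptions, not the library's code: `MeshNode`, `upstream` and `pathToBroker` are hypothetical names, and the only thing each node stores is a pointer to the one node it is directly connected to on the way to the broker.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a node knows only its direct upstream neighbour.
// A null upstream means the node is a gateway attached to the router.
struct MeshNode {
    std::string id;
    const MeshNode* upstream;
    explicit MeshNode(std::string nodeId, const MeshNode* parent = nullptr)
        : id(std::move(nodeId)), upstream(parent) {}
};

// Follow upstream links hop by hop until a gateway is reached, then
// finish at the broker. No node ever needs the full path in advance.
std::vector<std::string> pathToBroker(const MeshNode& from) {
    std::vector<std::string> hops;
    for (const MeshNode* n = &from; n != nullptr; n = n->upstream) {
        hops.push_back(n->id);
    }
    hops.push_back("broker");
    return hops;
}
```

In this model a message from one node to a sibling still travels up to the broker first, since neither node holds a route to the other.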

ibaranov-cp commented 7 years ago

Ok, would this be accurate then? (Working on a readme fork for the project, I'll submit a pull request soonish :) )

In the above, even though Node 1 and Node 2 see each other, the message still goes through the wifi router. However, as Node 3 is also subscribed to the message, but outside the router's range, it gets it re-broadcast from Node 2.

Does this mean that every node will always re-broadcast whatever it sees? You mention in another post a topic that has info on network structure, might be interesting to graph that.

I'll check the status of hidden networks again this evening, but they were not listed as hidden IIRC on my wifi scanner.

PhracturedBlue commented 7 years ago

Yes that is an accurate description. All nodes will subscribe to the same message prefix (by default 'esp8266-in/') and will broadcast all messages they see to all their children. Each node will decide if the message is relevant to them and act accordingly.
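A minimal sketch of that forward-then-filter behaviour, for illustration only. The prefix comes from the comment above; everything else is an assumption: the `Node` type, the `seen`/`handled` fields, and in particular the idea that the topic segment after the prefix is either a node id or the word "broadcast" are hypothetical and not taken from the library.

```cpp
#include <string>
#include <utility>
#include <vector>

const std::string kInPrefix = "esp8266-in/";  // default prefix named above

// True if the topic starts with the shared inbound prefix.
bool hasMeshPrefix(const std::string& topic) {
    return topic.compare(0, kInPrefix.size(), kInPrefix) == 0;
}

// ASSUMPTION: the segment after the prefix names the target node,
// or "broadcast" for everyone. The real topic layout may differ.
std::string targetOf(const std::string& topic) {
    const std::string rest = topic.substr(kInPrefix.size());
    return rest.substr(0, rest.find('/'));
}

struct Node {
    std::string id;
    std::vector<Node*> children;       // nodes connected to this node's AP
    std::vector<std::string> handled;  // payloads this node acted on
    int seen;                          // messages that passed through
    explicit Node(std::string nodeId) : id(std::move(nodeId)), seen(0) {}

    void receive(const std::string& topic, const std::string& payload) {
        if (!hasMeshPrefix(topic)) return;
        ++seen;
        // Forward to every child unconditionally; no routing decision here.
        for (Node* child : children) child->receive(topic, payload);
        // Then decide locally whether the message is relevant.
        const std::string target = targetOf(topic);
        if (target == id || target == "broadcast") handled.push_back(payload);
    }
};
```

Every node on the branch sees every message, which is the cost of not tracking topology: simple nodes, but traffic grows with the number of nodes.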

Nodes have the following info about other nodes:

1) a list of every other node's bssid. This is used to identify hidden nodes on the network. Every node also gets a unique subnet (192.168.x.0), though there isn't really any need for these to be unique.
2) the ip address of the upstream (closest to broker) node they are directly connected to
3) the ip address of the (up to 4) nodes that have connected to this node's AP

There is no concept of a network topology graph in the mesh. The only caching done is that when a node requests a list of other nodes (when it initiates a connection), the AP node will intercept that request and deliver its own list of bssids, without forwarding that request all the way to the broker and spamming every node with the response.
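The per-node state listed above could be modelled roughly like this. It is a sketch only: `NodeState`, its field names and both methods are hypothetical, and the limit of four children is taken from the "up to 4" in the description.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxChildren = 4;  // "up to 4" downstream nodes

// Hypothetical container for what one node knows about the others.
struct NodeState {
    std::vector<std::string> knownBssids;            // (1) every other node's BSSID
    std::string upstreamIp;                          // (2) neighbour closest to the broker
    std::array<std::string, kMaxChildren> childIps;  // (3) nodes on this node's AP
    std::size_t childCount = 0;

    // Record a newly connected child; refuse once all slots are taken.
    bool addChild(const std::string& ip) {
        if (childCount >= kMaxChildren) return false;
        childIps[childCount++] = ip;
        return true;
    }

    // The caching described above: a connecting node asking for the BSSID
    // list is answered from the local copy instead of going to the broker.
    const std::vector<std::string>& bssidListForNewChild() const {
        return knownBssids;
    }
};
```

Nothing here describes a graph: a node can name its neighbours, but cannot compute a path to any other node.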

The AP is set up as: WiFi.softAP(mySSID, mesh_password, WiFi.channel(), 1); The last '1' turns on hidden mode. If it isn't working, then it is probably a library issue rather than something I can resolve. On the other hand, you mentioned the node was 'AI-Thinker'. That isn't a name that the Mesh creates, so it seems likely that it is either the station-side name (which we don't override) or something unrelated to the mesh.

PhracturedBlue commented 6 years ago

I have included the image from ibaranov-cp in the latest documentation. Note that with the 1.0 release, nodes no longer need to store information about other nodes, and the mesh will self-assemble without each node first having to connect to the broker.