Open nickper opened 5 years ago
You never send a subscribe. I'd check the way you are trying to handle the "has the subscriber been initialized yet?" check.
I do; it is just hidden in an if statement. I know that this method works because I get subscription messages when I subscribe with my i686 device.
if (this->subscribe(nullptr, topic.c_str(), qos))
To give some more information: I build mosquitto from source with an ARM toolchain on an i686 VM, using the following command:
make WITH_TLS=no WITH_DOCS=no WITH_BUNDLED_DEPS=no
Furthermore, I checked the traffic with Wireshark and encountered something weird.
this is a working subscribe connection.
The problem is that my ARM device doesn't send these subscribe request messages, while
this->subscribe(nullptr, topic.c_str(), qos)
returns no error.
Can I check, if you're building from source I presume you're on version 1.6.4? Is that correct?
that is correct
Yes, I wouldn't expect a subscribe request in Wireshark, as it's not shown in the broker logs either. Are you sure you actually make the subscribe call? Add an else clause so you get a print regardless?
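A minimal sketch of the "print regardless" idea, written so that it compiles without libmosquitto. The helper name subscribe_and_report and the constant kErrSuccess are hypothetical; kErrSuccess stands in for MOSQ_ERR_SUCCESS, which is 0 in mosquitto.h. Because success is 0, a bare `if (rc)` is true on failure, so comparing against the constant explicitly and reporting both branches makes the outcome unambiguous.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Local stand-in for MOSQ_ERR_SUCCESS (0 in mosquitto.h), defined here
// only so the sketch builds without the real library.
constexpr int kErrSuccess = 0;

// Hypothetical helper: runs any subscribe-like callable and returns a
// message for BOTH outcomes, so a silent failure cannot go unnoticed.
std::string subscribe_and_report(const std::function<int()>& do_subscribe,
                                 const std::string& topic) {
    assert(static_cast<bool>(do_subscribe) && "subscribe callable must be set");
    const int rc = do_subscribe();
    if (rc == kErrSuccess) {
        return "subscribe queued for topic '" + topic + "'";
    } else {
        return "subscribe failed for topic '" + topic +
               "', rc=" + std::to_string(rc);
    }
}
```

In the real client the callable would wrap the existing call, for example `[&] { return this->subscribe(nullptr, topic.c_str(), qos); }`, and the returned string would be printed or logged.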
As far as I can see, the subscribe call completes successfully: it returns no error. According to the documentation, it should be sufficient to call loop_start()
to ensure that it connects as it should.
The client <unknown> error is raised at connect(this->broker.c_str()),
which may mean that the setup in this function is not going as it should. But the function itself also returns no error.
Does the on_log logging show anything useful on the client?
on log: 16, Client mosq/lc7NxCxvBtlsB1bX8y sending CONNECT
on log: 16, Client mosq/lc7NxCxvBtlsB1bX8y sending SUBSCRIBE (Mid: 1, Topic: diagnostics, QoS: 0, Options: 0x00)
Waiting for samples... //is called after the initialize function in the main
on log: 16, Client mosq/lc7NxCxvBtlsB1bX8y sending CONNECT
on log: 16, Client mosq/lc7NxCxvBtlsB1bX8y received CONNACK (0)
It does send a subscribe according to on_log. The i686 devices don't show the second CONNECT/CONNACK.
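In the client log, the SUBSCRIBE goes out before any CONNACK is received, and a second CONNECT follows. A pattern often suggested for MQTT clients is to issue subscriptions from the on_connect callback, so that they are sent again after every (re)connection. The sketch below only illustrates that ordering: FakeClient, simulate_connack and run_with_reconnect are hypothetical stand-ins, not the mosquittopp API, and the sketch makes no claim about the cause of this particular bug.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an MQTT client; NOT the mosquittopp API.
class FakeClient {
public:
    // Invoked every time the (simulated) broker acknowledges a connection.
    std::function<void(FakeClient&)> on_connect;
    // Records what would have been sent or received, in order.
    std::vector<std::string> wire_log;

    void subscribe(const std::string& topic) {
        assert(!topic.empty() && "topic must not be empty");
        wire_log.push_back("SUBSCRIBE " + topic);
    }

    // Simulates a CONNACK arriving, for a first connect or a reconnect.
    void simulate_connack() {
        wire_log.push_back("CONNACK");
        if (on_connect) {
            on_connect(*this);
        }
    }
};

// Subscribes from on_connect, then simulates a connect plus one reconnect.
std::vector<std::string> run_with_reconnect() {
    FakeClient client;
    client.on_connect = [](FakeClient& self) { self.subscribe("diagnostics"); };
    client.simulate_connack();  // initial connection
    client.simulate_connack();  // reconnect after a socket error
    return client.wire_log;
}
```

With this ordering every CONNACK is followed by its own SUBSCRIBE, so a reconnect does not silently drop the subscription.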
I have tried building it in another build environment with another toolchain, and also updated the Linux version on the ARMv7 target. It still gives the same error.
"updated the linux version" - do you mean something newer than Wheezy? I haven't yet reproduced this on any architecture, but don't have anything running Wheezy.
The example code you provide is incomplete, is it possible to have a full working example that shows the problem?
Yes, this time I tried building on a Yocto Ubuntu 18.04 environment, with updated system libraries on my device. Unfortunately, the problem still persists.
I use a custom-built Linux OS which is heavily based on Debian Wheezy. It uses kernel 3.10, which gave some trouble, but I created a workaround for that. I use that workaround on both devices, and it works on both (I mentioned my workaround in #1403).
I will provide a working copy later today
I debugged the library and found a problem. In my case, on the ARMv7 chip, the library runs into a race condition where it internally returns MOSQ_ERR_NO_CONN and tries to reconnect. The problem is that this reconnect is, for some reason, not done correctly.
By accident I found that the problem went away when there were more than two print statements between the initializer and the first real socket action in the loop_forever function.
Therefore I put a usleep right at the start of loop_forever and the problem disappeared.
//loop.c
int mosquitto_loop_forever(struct mosquitto *mosq, int timeout, int max_packets)
{
	usleep(400); /* workaround: 400 microsecond delay before the first socket action */
	int run = 1;
	int rc;
	unsigned int reconnects = 0;
	unsigned long reconnect_delay;
	...
It is an ugly solution, but for now it helps. I don't know whether I am the only one with this problem, or whether the kernel version, Linux environment and/or hardware specs are responsible for it, but I finally got it working.
It may be good to check whether the reconnect function works when the first connection attempt does not complete properly.
EDIT
I forgot to mention that before my fix I also got the "Socket error on client <unknown>, disconnecting" notification when I tried to connect with my i686 device. But for some reason it didn't have any impact there.
Good find! Are you able to check with the latest fixes branch? There are some extra locks added where they were missing. It could be related.
I tried the fixes branch, but it doesn't resolve the problem.
On the broker side I still receive the message "Socket error on client <unknown>, disconnecting.",
which is the indication of the race condition.
I haven't been able to reproduce this, but I think I can tell where the most likely cause for this is. I've just pushed a commit which may fix it.
This is a regression for me on both desktop linux (glibc, x86_64) and openwrt (musl-libc, mips32/ath79)
I use connect_async() followed by loop_start(), and I simply never receive my on_connect callback. I'm using libevent2 for my own portion of the application, and if I send a signal that I handle via libevent2 (ctrl-c to exit cleanly), I finally see the connect callback fire, immediately before my clean-exit handler disconnects and exits.
test case for connect_async available at https://github.com/etactica/mosquitto/commit/f7e04bf963259d131f1ee57b991a6c6c1bce8162
There's an updated fix in the fixes branch that helps the regression and should help this too.
Currently I'm working on a project where I need to use MQTT on an ARMv7 and an i686 device. The current problem is that some issues arise specifically on the ARMv7 device.
Both devices are running Debian 7 Wheezy.
When I try to connect the ARMv7 device to the broker, it seems to connect but doesn't receive anything at all. The broker returns the following:
The i686 devices show the following notice:
After a little research I noticed that the broker does not forward anything to the ARMv7 device. I don't know if this is a bug or something else. Both devices use the same codebase, but are built with separate compilers.
Thanks in advance. Nick