eclipse-paho / paho.mqtt.embedded-c

Paho MQTT C client library for embedded systems. Paho is an Eclipse IoT project (https://iot.eclipse.org/)
https://eclipse.org/paho

HardFault after use of requestedQoSs[i] #169

Closed: RUWinters closed this issue 5 years ago

RUWinters commented 5 years ago

For a project I am working on, I use mbed in combination with the Paho MQTT library. Connect and publish work without problems, but when I try to subscribe to a topic the MCU goes into the HardFault handler after writeChar(&ptr,requestedQoSs[i]) in the function MQTTSerialize_subscribe. I have tried to initialize a separate variable with unsigned char opts = requestedQoSs[i], but then the hard fault occurs at that point in the code.

As I understand it, the value passed in int requestedQoSs[] is originally an enum. Can someone explain to me how this is possible (I tried searching, but I only find examples of standard usage) and how to fix this problem? Commenting out requestedQoSs[] and not calling writeChar will (probably) cause problems with the SUBACK response.

The OS I use is mbed, running on the W7500P from Wiznet.

Regards,

R Winters

scaprile commented 5 years ago

A hard fault is exactly that: an exception signalling a "hard" problem. I don't know your MCU, but on some cores (ARM Cortex-M, for example) it is the escalation of another exception that has not been enabled. On the same cores it can also indicate a bad memory access, either trying to read code at an odd address or accessing a disabled peripheral. These MCUs also give the developer a mechanism to pinpoint the source and cause of the exception.

In general, the easiest way for an application developer to find the issue is to put a breakpoint at the entry of the last function called before the hard fault and then step line by line until it is triggered. The academic way is to inspect the registers associated with the fault, preferably in a purpose-written HardFault handler. If my memory serves me well, you will find someone with a similar report in one of these libraries a couple of years ago (maybe more?).

A quick search for "W7500P" shows that it carries an ARM Cortex-M0 CPU, so that is what is happening to you. In particular, the M0 is an ARMv6-M processor and cannot access 16-, 32- or 64-bit quantities at unaligned addresses. So my best guess is that (like the other person, if I recall correctly) you are feeding it the wrong address: for example, passing a pointer to a char (which happens to sit at an odd address) to a function expecting a 16-bit or 32-bit integer (which must sit at an aligned address, divisible by 2 or by 4 respectively). Or you are trying to store an integer in an unsigned char that, luckily for you, is at an odd address; otherwise it would just overwrite whatever follows it, and your app would explode seconds later (or releases later) somewhere unrelated.

I'm too lazy to check the declaration of requestedQoSs[], but in the specific case of your modification you are answering your own question; just put these two pieces together:

int requestedQoSs[]
unsigned char opts = requestedQoSs[i]

If the linker decides to place 'opts' at an odd address, you get a hard fault; if it places it at an even address, whatever follows 'opts' will be overwritten, and that may or may not wreak havoc depending on what is placed there, how you access it, and so on. This is one of the world's famous recipes for "but it was working fine, and now it does not work with the next release" and things like that. (You should also have received a compiler warning about storing an int in a char; ignoring warnings is another of those recipes.)

RUWinters commented 5 years ago

Part of the problem is solved now, thanks. I found the issue you spoke about.
