Huanyang/VFDSpindle communications robustness

First of all, thanks for a really great project - I'm really enjoying working with it so far!

I'm just starting to set up my Huanyang VFD and had some potential concerns with the robustness of the current VFDSpindle implementation.

For a bit of background context, I originally used a cheap RS485 module with what may be a knockoff MAX485 IC on it, which could transmit to the spindle fine, but had terrible noise in receive mode, completely corrupting the data to the point that no responses were successfully received.

What was odd, and a bit jarring, was that the VFDSpindle implementation would still happily send speed and spindle start/stop commands despite having never received any valid response from the spindle. This initially confused me because it appeared like everything was working correctly with the spindle control until I started implementing a get_current_rpm parser and realized that I was not actually getting any data from the VFD at all...

The thing that really threw me off was that the behavior gets even weirder if the receiver is intermittent rather than 100% corrupt. If I connected a shared ground between my MAX485 and VFD, the receiver improved enough to occasionally get a valid response, but noise would still often corrupt the data -- in this configuration, I would often be able to M3 start the spindle, but then some noise would corrupt a response message and trigger a "Critical Spindle RS485 Unresponsive" alarm. The alarm didn't seem to trigger a spindle stop, and it also prevented me from M5 turning off the spindle without clearing the alarm first, which does not seem great from a safety perspective. In other words, I actually had better control over the spindle when the ESP32 was not receiving any responses from the VFD than when it received intermittent responses.

I've started a rewrite of the VFDSpindle/Huanyang classes to restructure them and incorporate some more robust round-trip communications checks (no code ready to share quite yet), but wanted to discuss a few high level concepts behind my planned changes first to make sure I'm not going down a bad path for any reason:

the spindle's status (configured- and actual- rpm and state) will be repeatedly read, and an alarm raised in any of these conditions:
- if we don't receive a response (only if we've previously gotten a response, to avoid raising an alarm immediately if the ESP32 is powered up before the spindle)
- if configured state does not match the state we commanded -- this is to prevent motion from continuing to run if the spindle has stopped running from some reason (e.g. if someone pressed the STOP button on the VFD itself, or the VFD turned off the motor due to some fault)
- if configured rpm does not match the rpm we commanded (not sure when or why this would happen, but best to be safe)
- actual rpm is outside a fixed margin of the commanded rpm (except during start/stop) -- I probably won't get to this right away, but seems useful to check
the spindle can't be turned on if the most recent status-read failed -- this is to prevent turning on a spindle that we cannot hear back from, which is risky since we won't know what state it's in
in the event of any spindle error/alarm mentioned above, we will continuously send commands to stop the spindle until one is acknowledged -- this is to prevent a "runaway spindle" so to speak; this way a transient communication error can't cause a "stop spindle" command to be dropped and not re-sent when the connection comes back

Thoughts on those above principles? The main thing I'm not totally sure about is the behavior around errors/alarms since I'm still somewhat new to grbl semantics - is it normal/acceptable for an error to cause the spindle to turn off automatically? It seems safer that way to me, but maybe there are other issues like needing to turn it back on if you clear the error and resume?

I'd also love a slack invite to discuss further, if that's alright - my email is my github username at gmail.com - thanks!

On a related note, I have been digging through the various Huanyang documentation sources to better understand parts of the protocol that are less commonly used, and collecting those notes here: https://paper.dropbox.com/doc/Huanyang-VFD-modbus-protocol--BArwgZ4Tgj_dWv27riC1eFBTAQ-836X8EruAxH21PFe0Z2Um

It's primarily based on this Huanyang Manual but there are some oddities I'm still planning to validate against my VFD to see if it's a documentation error or an actual odd behavior.

Interesting things I've noticed from experimenting so far:

you can send an empty control data (function 0x03) message in order to receive the control status back, which provides the current running state and direction, so this can be used as part of the regular polling of status
I've determined the meaning of some of the poorly-described control status bits, e.g. the difference between the "run" and "running" bits ("run" indicates the VFD is set to run, and is un-set immediately if you send a stop; "running" indicates whether the VFD is actively driving motor, so it stays set longer as the VFD decelerates after you send a stop)

In general in my updated implementation I'm trying to keep a clear distinction between values that we've requested, which I'm referring to as "configured" values, and values representing the current state (regardless of what we've requested), which I'm referring to as "actual" values.

So for example, my basic status check currently returns 4 values:

configured_rpm
actual_rpm
configured_state (Cw, Ccw, Disable)
actual_state (Cw, Ccw, Disable)

It does this by reading the "Set F" status register (for configured rpm), reading the "RoTT" status register (for actual rpm), and writing an empty control command (for configured and actual state).

You can see how those change during a simple (forward) spin-up, spin-down test:

[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]
[MSG:RS485 status: rpm=12000, arpm=318, st=1, ast=1]         <-- VFD starts running - actual rpm starts increasing, and configured state and actual state are both Cw
[MSG:RS485 status: rpm=12000, arpm=1134, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=1950, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=2766, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=3582, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=4398, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=5214, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=6030, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=6846, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=7662, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=8478, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=9294, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=10110, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=10926, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=11742, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]     <-- VFD is at full speed
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=1, ast=1]
[MSG:RS485 status: rpm=12000, arpm=12000, st=0, ast=1]    <-- VFD is commanded to stop; actual rpm is still 12000 and actual state is still Cw, but configured state has changed to Disable
[MSG:RS485 status: rpm=12000, arpm=9643, st=0, ast=1]
[MSG:RS485 status: rpm=12000, arpm=6978, st=0, ast=1]
[MSG:RS485 status: rpm=12000, arpm=4272, st=0, ast=1]
[MSG:RS485 status: rpm=12000, arpm=1552, st=0, ast=1]
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]    <-- VFD has stopped; actual rpm is 0 and actual state is now Disable 
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]
[MSG:RS485 status: rpm=12000, arpm=0, st=0, ast=0]

@scottbez1 Good that you have found a way to get at the actual RPM, this can be used for spindle at speed functionality. Having this will IMO address some of you concerns about robustness.

Making good progress on this so far - I have a bunch of basic test cases passing, including some that would not pass on the existing implementation (mostly around error handling).

In progress diff against main (1468ff8)

A high level overview of the big architectural changes:

The biggest change is to the VFDSpindle abstraction. The VFD task is no longer a relatively simplistic command sender that pops commands from a queue and sends them with a few retries for good measure, but now actually encodes the expected bidirectional communication pattern with a generic external VFD:

It continually polls the status of the VFD to make sure communications are healthy and that the VFD is in the desired configuration (rpm+state)
- see read_status
- the subclass implementation of read_status can send as many commands as necessary to get this information, via the new send_command protected helper that VFDSpindle provides (which handles addressing, CRC generation, and response CRC/length validation under the hood)
If the VFD is not in the desired configuration, it will repeatedly send commands necessary to get it to that state (indefinitely)
- see request_configuration
- again, the subclass implementation of request_configuration can send as many (or as few) commands as necessary

The function comments in VFDSpindle.h describe this updated abstraction in more detail.

A key part of implementing this abstraction is that the modbus "command" queue has been replaced with a "config" queue.

changes to the requested "config" (rpm/state) are pushed to the queue directly (as a struct of SpindleState and uint32_t rpm), rather than being first converted to a modbus command
the "config" queue actually has size 1, and config messages are always pushed to the queue using xQueueOverwrite - meaning the latest requested config replaces any previous requests
- if the VFD task were to fall behind due to slow comms or some other reason, we don't actually want to queue and "replay" any intermediate states. It's always best to jump to the current state as soon as possible, so newer requests simply replace older ones
- the queue is really just a threadsafe way to pass the requested rpm/state to the VFD task from the grbl main task; everything necessary for actually executing on that request (deciding whether any communication is necessary, constructing modbus commands, etc) happens in the VFD task and is mediated by the base VFDSpindle, with details (like the actual message bytes) still delegated to the subclass to implement. This means the VFD task can indefinitely retry if needed and will never "drop" anything

So far I've only implemented HuanyangSpindle, but I will look at H2A soon (it's simply commented out in the attached diff to avoid compilation issues since it hasn't been updated yet). It's possible that this new abstraction isn't quite right or generic enough to cleanly support H2A in addition to Huanyang, in which case I'll continue to adapt my approach. I may set this rewrite aside for a little bit now though; I need to finish building my Workbee first so I can actually run full test programs rather than just manually sending simple spindle g-code commands...

I have just scanned your comments and have to look into this in more detail to be honest.

We fully agree that we should err on the side of caution when working with VFD's. In fact, that's the main reason why I firmly believe that RS485 is the correct way to work with VFD's; most other ways to work with them are just one-way communication.

Please work on the Devt branch and not the master branch. That makes a PR and/or merge much easier in the long run, and perhaps some of what you describe is already fixed there.

The reason why there is a queue that's separate from the actual spindle code, is because of the design of grbl. GRBL expects a single, non-blocking thread that basically takes care of motion control. Motion control blocks here also take care of spindle control, and spindle speed is also part of that. Unfortunately, this thread should not be blocked for long time periods of time. In other words, calls to Spindle are expected to be return 'quickly', whatever that may mean.

One of the commands however is the 'mc_dwell' . I have been meaning to change that to accept a function pointer, which can then be used to do a wait operation -say- when something is wrong, or when the spindle is not up to speed. This has everything to do with the safety things you describe, and has been on my TODO list to investigate for quite a while now.

One of the key goals of the VFDSpindle class when I designed it was to make it as much independent as possible of the VFD that's being used. There are many, many, VFD protocols around, and I browsed through a few manuals of Delta, Huanyang (they have different VFD's), my own H2A and some others before I got started. Bottom line, they don't support the same commands at all. Some work with frequencies, others with rpm. Some provide actual rpm feedback (H2A), others just tell you what register values are set (Huanyang apparently). In any event, there is usually some register value you can pull to see if the thing is alive, and normally you get a modbus 'ok' reply when a command is received properly. So, that brings me to the principles you mentioned:

the spindle's status (configured- and actual- rpm and state) will be repeatedly read, and an alarm raised in any of these conditions:

if we don't receive a response (only if we've previously gotten a response, to avoid raising an alarm immediately if the ESP32 is powered up before the spindle)

I believe this is already the case. The 'unresponsive' flag is supposed to take care of this.

if configured state does not match the state we commanded -- this is to prevent motion from continuing to run if the spindle has stopped running from some reason (e.g. if someone pressed the STOP button on the VFD itself, or the VFD turned off the motor due to some fault)

That's a good point. Haven't thought of that, and it's therefore not implemented. Yes I do think this is a good idea if the spindle supports either register value or actual rpm. Most spindles should. As for implementation, see the next point.

if configured rpm does not match the rpm we commanded (not sure when or why this would happen, but best to be safe) actual rpm is outside a fixed margin of the commanded rpm (except during start/stop) -- I probably won't get to this right away, but seems useful to check

The code in VFDSpindle.cpp (line 78-102) is designed to pull different diagnostics on a regular interval when no commands are being pushed into the queue. This should ensure everything is working as planned. You can simply test the rpm value in the parser, and make it fail if things are not ok.

the spindle can't be turned on if the most recent status-read failed -- this is to prevent turning on a spindle that we cannot hear back from, which is risky since we won't know what state it's in

Yes. pollidx == 0 is the special ID that enables initialization of the VFD spindle. For some spindles like the H2A, you need to know some register values before you get started. You can't start a spindle, as long as initialization did not take place.

in the event of any spindle error/alarm mentioned above, we will continuously send commands to stop the spindle until one is acknowledged -- this is to prevent a "runaway spindle" so to speak; this way a transient communication error can't cause a "stop spindle" command to be dropped and not re-sent when the connection comes back

I'm not sure if that's a good idea: if you have a spindle speed change that err, it will trigger a spindle stop, which will do damage. That being said, it might be a good idea to mark certain spindle commands as 'required', which means the MAX_ATTEMPTS check will basically never be triggered.

Some of these can be solved if we could somehow make dwells if something is wrong. That would basically just stop the machine, while things are wrong.

I'll drop you some messages on slack, let's see what works best.

bdring / Grbl_Esp32

Huanyang/VFDSpindle communications robustness #689