bitfocus / companion-module-ptzoptics-visca

MIT License
9 stars 13 forks source link

Allow ACK/Completion pairs to interleave #42

Closed jswalden closed 5 months ago

jswalden commented 5 months ago

This should fix #41. It's working reasonably in my local testing, so it is possibly something that can be landed in short order. It's regrettably more complex than the previous system in its guts (although I did use async generators to usefully clean up certain parts of the code), probably unavoidably so given the protocol details are in fact more complicated.

This was done against tip. To actually spin up the 3.0.0 release (or release candidate), it would have to be rebased onto that branch, which very notably precedes the tip-only rewrite into TypeScript. But it would be near-impossible for me to exaggerate just how much TypeScript helped me when writing all this code. And while I could rebase it backwards that way...honestly, it's probably safer not to endure all that extra effort and instead just make the 3.0.0 release include a handful more things that have happened since the branchpoint. Other than the TypeScript rewrite, they're all pretty small and noninvasive.

I'm still noodling on this, but I think the odds are good that I'll throw away the previous branch and make this (or this with whatever bugfixes might be needed) the branch instead, and do a release off of this.

jswalden commented 5 months ago

I believe this PR is fully implemented and ready for end-user testing. I've made a downloadable version of the module using the yarn build; yarn companion-module-build steps described on the wiki. (I had to rename it to .tar.gz to get it to upload to Github.)

pkg.tar.gz

@Gartom, @ploveman, since you've expressed interest in this module -- could I get you to download this and test it? You'll need to test it with at least Companion 3.2.

  1. Create an empty folder somewhere and give it a name like companion-dev-modules.
  2. In that folder, extract the pkg.tar.gz -- such that companion-dev-modules/pkg/companion/manifest.json will exist.
  3. In Companion's launch window, click the little gear icon in top right, and set the "Developer modules path" at bottom to the companion-modules-dev folder.

This should make Companion load this PR's version of this module, and when Companion starts up you should see a line like

240.04.14 15:45:21 Instance/Modules: ptzoptics-visca@3.0.0-rc2: PTZoptics: VISCA (Overridden)

in the logs -- the "Overridden" and "3.0.0-rc2" will prove the correct code is being run.

The problems this PR addresses made this code even more complicated than it was, so testing is even more desirable than it was with just the previous changes. Any and all help you can provide is highly appreciated!

ploveman commented 5 months ago

@jswalden I'll try and run some tests with it later this week

ploveman commented 5 months ago

@jswalden something I accidentally found, and I'm not sure if it's different from before or not is that if I send a request while the destination is down, it doesn't recover without stopping the instance. I thought based on the code that it would log something like can't send, but it logged the follwoing: 2024-04-18T23:14:53.497Z info Instance/Wrapper/visca Sending [81 01 04 3F 02 FD FF]... 2024-04-18T23:15:00.526Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:01.230Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:01.430Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:01.733Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:01.893Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:02.325Z warn Instance/Wrapper/visca Error executing action: Call timed out 2024-04-18T23:15:02.487Z warn Instance/Wrapper/visca Error executing action: Call timed out

I'll do some more testing with just the normal cases

jswalden commented 5 months ago

@ploveman The existing code's handling of the socket going away is not great -- turns out (once I used enough TypeScript to realize the error) we are ignoring whether socket writes succeed or not (because the socket isn't open). I'm in the middle of new work on this that I think should clean up that handling so that commands sent when the socket is closed will fail gracefully. (If memory serves, the package I uploaded here doesn't go back and resolve the commands/inquiries it sends if the connection drops, while my current work does.)

Regrettably, those changes are ranging far enough now that a fresh round of testing on it will probably be necessary. On the other hand, those changes were in part driven by a push on standing up some automated testing of command/response interactions -- and if that bears enough fruit (it's borne some, but I'm still ironing out issues in it), we'll be able to perform any such manual test in automated fashion, eliminating the need to ever check that specific interaction again manually.

ploveman commented 5 months ago

@jswalden I ran the rc in my production (church) environment for tomorrow's service. I didn't see anything unusual going on. I generally just use the recall preset. I put it through some other paces and all seemed to work properly. I also did a quick test with multiple presets on the same button with a few ms between. As your issue noted, it seems very unclear if the y nibble in the ack corresponds to anything in particular. I dumped the log messages and I only got one complete, looks like the y nibble in the ACK was re-used, hard to tell if the complete correlates to the last ACK or something else. I've attached the log CSV in case that helps. DESKTOP-8KJIPD6_companion_log_20240420-1331.csv

jswalden commented 5 months ago

Woo, many thanks @ploveman -- I'll give the log a look later.

Sadly the work on automated testing I'm doing has revealed a...limitation...in the TCPHelper class used to represent the module socket to connect to the camera. It's not an issue in normal use, but it somewhat blocks automated testing.

I'll probably update the PR soon enough with all the additional modifications I've done. But actually landing it will have to wait til a TCPHelper patch is written, which I may not understand Node net well enough to do myself. (We can just copy the modified TCPHelper into the repository rather than force a Companion 3.3 dependency through a bleeding-edge companion-module-base requirement-bump.)

ploveman commented 5 months ago

Woo, many thanks @ploveman -- I'll give the log a look later.

Sadly the work on automated testing I'm doing has revealed a...limitation...in the TCPHelper class used to represent the module socket to connect to the camera. It's not an issue in normal use, but it somewhat blocks automated testing.

I'll probably update the PR soon enough with all the additional modifications I've done. But actually landing it will have to wait til a TCPHelper patch is written, which I may not understand Node net well enough to do myself. (We can just copy the modified TCPHelper into the repository rather than force a Companion 3.3 dependency through a bleeding-edge companion-module-base requirement-bump.)

@jswalden Can't say I follow much on js/ts, but I played with using the isConnected from TCPHelper seems to give an accurate state, I don't know if that helps with what you were chasing or not.

jswalden commented 5 months ago

I played with using the isConnected from TCPHelper seems to give an accurate state, I don't know if that helps with what you were chasing or not.

Doesn't help -- nothing about TCPHelper in terms of using it directly is off, only off as far as its tearing down goes. This particular "leaking" behavior is pretty much outside of the usage contract in a garbage-collected environment!

jswalden commented 5 months ago

Just updated the PR with significantly-beefed-up changes implementing this -- and a bunch of tests. I want to say the meaningful-code delta between the previous version and this one is fairly small -- awaiting async actions in a few places to ensure everything is ready in proper order -- but I've been on this long enough to have forgotten precisely what I changed.

I want to go through the rest of the processing algorithm writing automated tests exercising each failure point before declaring this good to go, so still a bit more work to do here before this can land.

jswalden commented 5 months ago

I got through the rest of the algorithm writing tests for each nook and cranny of it. The code all now seems to do what I intended it to do, at least. The big question is whether real cameras differ from how I expect them to work.

This is a revised/updated packaged build for testing purposes:

pkg.tar.gz

I may try to set up some run-tests-on-PR Github workflow to help with testing this, especially if any further tweaks need to be made. Besides that, I'll try poking some to see if I can get the TCPHelper issue moved forward some.

jswalden commented 5 months ago

I figured out the problem with TCPHelper, and it's an easy and understandable fix that I'm 100% comfortable making in a forked copy. I'll add a forked TCPHelper to the PR and post a potentially-final version of this PR's work as a downloadable package for final testing.

jswalden commented 5 months ago

Okay @ploveman, this is IMO essentially release-candidate-worthy. It does still have a ton of debug logging of message processing enabled in it that I'll turn off for a final release, but otherwise this is 100% ready to release...at least as far as I know.

pkg.tar.gz

(And, importantly, it's ready to release as far as 800 lines of testing go -- in 44 different ways, start up a "camera" server, send it commands and inquiries, expect to receive specific bytes, have the "camera" reply with specific byte sequences, verify that they're handled as expected, methodically hitting every edge case in the return-message processing code within those tests.)

ploveman commented 5 months ago

Okay @ploveman, this is IMO essentially release-candidate-worthy. It does still have a ton of debug logging of message processing enabled in it that I'll turn off for a final release, but otherwise this is 100% ready to release...at least as far as I know.

pkg.tar.gz

(And, importantly, it's ready to release as far as 800 lines of testing go -- in 44 different ways, start up a "camera" server, send it commands and inquiries, expect to receive specific bytes, have the "camera" reply with specific byte sequences, verify that they're handled as expected, methodically hitting every edge case in the return-message processing code within those tests.)

@jswalden ack, will be the weekend before I can test with it

jswalden commented 5 months ago

Couple things I've realized/run across:

"Received Completion for socket that is empty"

I've done some more testing on actual hardware, and I seem to be able to trigger the "Received Completion for socket 2 that is empty (VISCA bytes [90 52 FF])" error case with current code.

Not in a guaranteed fashion that I understand well at a low level -- that seemingly would require syncing up command sends with ACKs and Completions at a fractional-second level with real hardware.

But if I add FOCUS NEAR and FOCUS FAR presets to my production setup, then I run my fingers across relevant preset-calling buttons for a few seconds and then bash back and forth on the FOCUS NEAR and FOCUS FAR buttons a handful of times, I can consistently trigger it. For example, in this complete log from very first command-send to connection failure. (With certain tweaks made to the logging, so I can more-leisurely pull out the SEND/RECV of actual bytes from the logs.)

I'm going to look at this more to see if there's a real issue here, that I can figure out how to fix.

It's also possible it's a camera bug, seeing as this is pretty racy behavior. If so, there might be nothing to do but restart the connection, with the camera in whatever state it happened to be in when the error was detected (with no attempt to perform the commands that weren't yet run or anything).

Command Not Executable

Regarding the socket number embedded in the Command Not Executable return message:

The current code presumes that this response can appear after an ACK, that assigns the corresponding command to a socket.

If the socket is in fact occupied (an ACK having been received but not processed), then the current code considers that pending ACK'd command as being the subject of the Command Not Executable.

But I've given this more thought, and I think handling both cases is unsafe. If the socket is occupied by a command because of an ACK, and that command isn't a command that's not executable...then it will remove the wrong command from the wrong queue. And when the actual Completion comes through, it will find that the queue it's looking in is empty. And as the code is currently written, it puts the entire module into a Connection Failure state, because it's no longer clear what commands/inquiries correspond to what return messages.

In manual "send these bytes and see what happens" mode, my camera always responds with Command Not Executable with no prior ACK. In light of the unsafety of trying to handle both cases, I'm going to change this to only handle that setup.

jswalden commented 5 months ago

Good news, "Received Completion for socket that is empty" looks like a bona fide bug, both from simply reading the VISCA back-and-forth and from converting it to an automated test that does in fact fail. I don't know what the reason is yet, but at this point it's only a matter of time.

jswalden commented 5 months ago

Heh, funny. Those "couple things" are in fact the exact same issue. Will get the PR cleaned up tomorrow to resolve this problem, mostly out of time right now.

jswalden commented 5 months ago

An updated pkg.tar.gz for testing is here:

pkg.tar.gz

jswalden commented 5 months ago

One final pkg.tar.gz, that bumps the @companion-module/base dependency so we're not forking TCPHelper any more:

pkg.tar.gz

This updates it from the 1.6.x line to 1.7.x, which per the latest README should not require a Companion version bump -- so this should still work with Companion 3.2.

jswalden commented 5 months ago

I spun up a v3.0.0-rc2 prerelease for this.

ploveman commented 5 months ago

One final pkg.tar.gz, that bumps the @companion-module/base dependency so we're not forking TCPHelper any more:

pkg.tar.gz

This updates it from the 1.6.x line to 1.7.x, which per the latest README should not require a Companion version bump -- so this should still work with Companion 3.2.

I had some issues initially loading the tgz, but I think it may have been a companion hiccup. Otherwise seemed ok from my sanity test.

jswalden commented 5 months ago

I ran this in production today without issue, so between that, the tests, and the volunteered testing here I declare this good enough to land. Next step, I'll solicit feedback on it more broadly (probably in the FB group) as a release candidate ready to go.