Closed American1337 closed 1 year ago
Is there any commonality to which actions are getting dropped (ie. Always the last 3 in the script, or it's always the lights, etc.), or is it different ones every time?
It's not consistent.
Attempt #1: I turned all circuits on, and then used the script I posted above with the delays removed. Note that "Pool" circuit is not in my script so it is expected to stay in on state, all other circuits should be turned off
Attempt #2: I turned all circuits on, and then used the script I posted above with the delays removed. Note that "Pool" circuit is not in my script so it is expected to stay in on state, all other circuits should be turned off.
The results aren't the same but script was identical.
Attempt #3: New script I moved lights AFTER switches. Now some lights are staying on. Like before ignore the pool circuit on
Script for attempt #3 (lights last)
alias: "[Pool] Turn Off All Circuits (Medium)" sequence:
(Edited to clarify the delays removed in attempt #1 and #2 similar to the script used in attempt #3)
A few more experiments:
Attempt #4: A delay every 8 actions for 1 second. Within each group of 8 no delay
Result: still broken, some switches near the end of group don't change. Not consistent which are broken.
Attempt #5: A delay every 4 actions for 1 second. Within each group of 4 no delay
Result: all circuits updated correctly.
Script from attempt #4:
alias: "[Pool] Turn Off All Circuits (Burst)" sequence:
Script from attempt #5:
alias: "[Pool] Turn Off All Circuits (Burst)" sequence:
Interesting. Thanks for all those test cases.
Just to confirm, you verified that circuits were actually still left on in the pool equipment? Just trying to rule out this being just a delay in updating the states in HA.
For the experiments, I've tried two ways:
First method:
Second method:
I've also confirmed visually. The automation without delays only turns off some of the lights. Then automation with delays turns them all off.
Well that pretty certainly rules out this being simply a delay in updating the states. Thanks for being so thorough.
The (hopefully) last thing I'd be interested in is Home Assistant logs for a not-working script being run with Debug logging turned on for the Pentair ScreenLogic integration. That should show what is actually being sent to the screenlogic protocol adapter and what is and isn't coming back. I'm wondering if the protocol adapter just can't keep up with that many calls that quickly.
You should be able to turn on Debug logging in the integration options from the Integrations page.
Attaching logs.
I enabled debug logging then restarted HA, then used HA to turn on all the circuits, then ran the script.
After script was executed, I waited a bit, then captured log, diagnostic dump and screenshot of the dashboard.
Dashboard after script:
Attaching logs from a passing case. Turned out my 1 second delay every four switches wasn't enough to make it reliable, that failed a few times this morning. I added a 1 second delay per-switch and it passed a few times.
script-working-with-delays.txt passing-diagnostics-after-script-was-ran.txt *passing-home-assistant.log
Dashboard:
This is great data. Thank you.
No solutions yet, but a few things I am seeing:
31
, which other SL projects identify as a "bad parameter" response. If that code truly represents a bad parameter, then those circuits (and really all circuits) should never function. But as your previous testing shows, the circuits getting code 31
have worked previously. Wondering if 31
is more of a "I'm full, please wait" response.screenlogicpy
is not throwing an error on the unexpected returned code 31
.Like I said, no solution yet, but I have some things to investigate. Thanks again.
I don't think this is the final solution, but wondering if you could test this change in HA.
Currently the integration is using the debounce only to catch any late updates on the pool controller, and not actually debouncing refresh requests.
This change forces circuit changes (circuits and lights) to use the debounced refresh, and shortens the debounce cooldown to 0.5 seconds. This should reduce the requests to the protocol adapter dramatically when using your script, but hopefully still feel "snappy" enough when switching circuits on and off.
@American1337 Just curious, have you tried setting your script up like this:?
alias: "[Pool] Turn Off All Circuits (No Delay)"
sequence:
- service: switch.turn_off
data: {}
target:
entity_id:
- switch.pentair_20_17_74_jets
- switch.pentair_20_17_74_slide
- switch.pentair_20_17_74_spa
- switch.pentair_20_17_74_spa_waterfall
- switch.pentair_20_17_74_upr_pool
- switch.pentair_20_17_74_upr_wtrfall
- switch.pentair_20_17_74_water_feature
- switch.pentair_20_17_74_waterfall
- switch.pentair_20_17_74_upr_cleaner
- switch.pentair_20_17_74_cleaner
- delay:
hours: 0
minutes: 0
seconds: 1
milliseconds: 0
- service: light.turn_off
data: {}
target:
- light.pentair_20_17_74_color_wheel
- light.pentair_20_17_74_color_wheel_2
- light.pentair_20_17_74_fiber_optic
- light.pentair_20_17_74_pool_light
- light.pentair_20_17_74_pool_light_2
- light.pentair_20_17_74_pool_light_3
- light.pentair_20_17_74_pool_light_4
- light.pentair_20_17_74_pool_light_5
- light.pentair_20_17_74_u_pool_ligh
- light.pentair_20_17_74_spa_light
mode: single
The fix doesn't work. I'm 99% sure I merged in the change correctly.
This is the log with the change included. I also did a compact script. What you suggested is similar to what first tried and I only split it up to test with delays inserted.
FWIW, if I take out switch turn off and just do the last light action, the lights remaining all turn off.
If I put 1 second a delay between switch.turn_off and light.turn_off, it didn't work. 5 seconds didn't work either. But 10 seconds works.
alias: "[Pool] Turn Off All Circuits (Two Statement with Delay)" sequence:
FWIW, if I take out switch turn off and just do the last light action, the lights remaining all turn off.
Well that confirms that the response it's giving (31
) isn't being given to indicate a bad parameter in the request.
While I still intend to solve this for the script you created, as a work around you could try using the screenlogic.set_color_mode
service to shut all lights off with one command to the protocol adapter.
service: screenlogic.set_color_mode
data:
color_mode: all_off
target:
device_id: [your_pentair_device_id]
You can get the device_id by using the Services Developer Tool and building the service call through the GUI.
Nice to know about that service. tested it and screenlogic.set_color_mode works for lights all on/off.
Are you able to use that service call in your script for all the lights instead of the individual light.turn_off
calls?
Yes, I'm using color_mode for now.
I'm also keeping delays around the switch on/off actions, so it would be nice if this issue can be fixed but I'm unblocked for doing my integrations.
Update: I used screenlogic.set_color_mode again today and it doesn't seem to work reliably. I've gone back to using individual light switch actions with 2 second delays between them.
How is screenlogic.set_color_mode implemented? Is it 1 message to the pentair system or multiple under the hood?
It's a single command to the pool controller. The different color modes represent different buttons in the Lights section. In your case, all_off
should do the same as the All Off button in the Color Lights page in the app.
Got it, I tried it again, it's always the same two lights that don't work with all_on. Even when I try with the app. Must be an issue with my pool config. I will follow-up with the pool system installers.
For now I will use all_on + two individual light actions.
Just released v0.7.0 of screenlogicpy
which supports subscribing to push updates from the ScreenLogic system. This will reduce the request traffic and hopefully enough in your case to not overload the protocol adapter and have it return an unknown message code.
Changes to the Home Assistant integration to use this functionality are being finalized now. You can follow the progress here.
Awesome. Thanks for the updates. I will follow other thread.
(P.S.: Note that I'm having some issues with the pentair systems unrelated to HA -> issue after power loss (circuit names missing). I am calling back the folks that installed it. Likely some latency before I will be able to re-test).
I've installed the updates, I do see changes from the android pentair app reflected pretty much immediately in HA now, that's pretty cool.
However, the script to turn off all circuits without delays still fails. Attaching recent log.
Note that some of my sensors are missing now, but presumably it's because I just updated screenlogic and not the rest of HA. "Unable to prepare setup for platform screenlogic.binary_sensor: Platform not found (cannot import name 'EntityCategory' from 'homeassistant.const' (" shows in the logs.
Well the good news is that the amount of total requests during this sequence IS significantly reduced, as Home Assistant is not trying to refresh the state of the entire pool system with each circuit command.
The bad news is that we're still getting the same error code response (31
) from the system after a number of commands, in this case it seems to be failing on the 15th command made in the span of 0.139sec. It takes a little over 6sec from the time the circuit commands are sent for an updated status message to be received from the pool.
It's possible this is some sort of rate limiting control on the protocol adapter. Assuming your ScreenLogic is set up with a wireless link between the pool controller and the protocol adapter as mine is, it's also possible that there's a rate limit to the wireless communication between them.
As for the missing sensors, yes it looks like it would be caused by a mismatch between the screenlogic integration code and the core Home Assistant code. There was a change 2 days ago that affected both portions and they would both need to be updated to stay in sync.
Despite not knowing why the protocol adapter is rejecting the commands or currently how to resolve it, this situation does highlight some improvements that can be made in the api library. For one, currently the api is not validating the response code for a request, only if the data in the response is what's expected. For commands that expect an empty response, that leaves the door open for a lot of false positives that a command was successful. Truely, Home Assistant should be throwing an error when those light commands fail, but the api think's they went fine. Secondly, if a command were to be detected as failed, there's currently no way to retry the command.
I have some ideas of how to resolve both of those, but even then I don't know if it would resolve this issue.
Curious if there was any result from the installer?
History on the protocol adapter install:
Off topic, but I'm curious, is a battery backup for circuit names typical? Seems odd to me that they wouldn't put some simple non-volatile memory to persist names across power losses.
I've also realized the penair desktop app lets me change the circuit names myself, so in case of power lose in the future I will be able to restore the names.
Furthermore, I've fixed the all light offs command. The installer had only set the first 6 lights. If I add four lights to the second six, I'm able to get fix that other issue I had, now I can control all 10 light circuits with all_on/all_off.
Off topic, but I'm curious, is a battery backup for circuit names typical? Seems odd to me that they wouldn't put some simple non-volatile memory to persist names across power losses.
My EasyTouch board has a button cell battery mounted on it, similar to the CMOS battery on a motherboard. It's conceivable that the battery went dead and needed to be replaced. Happens on motherboards and the BIOS settings are lost.
I've also realized the penair desktop app lets me change the circuit names myself, so in case of power lose in the future I will be able to restore the names.
Hopefully you won't have to now that the battery was replaced.
Furthermore, I've fixed the all light offs command. The installer had only set the first 6 lights. If I add four lights to the second six, I'm able to get fix that other issue I had, now I can control all 10 light circuits with all_on/all_off.
Nice find. I had thought that as long as it was a light-type in circuit function the all_on/all_off would work, but looking in the ScreenLogic manual now, you're right. It clearly states that light circuits need to be added to the Special Lights section for lights to be affected by those commands.
FYI, the dev
branch contains updates to retry failed requests if you are interested in trying it out.
Nice, I will test it when I can. Unfortunately, my pool system is broken again. Not sure what happened but my circuit names got wiped out again, the installer is going to come review/fix.
Roger. That's a bummer. At your convenience.
I've merged in the latest from core-dev/homeassistant/components/screenlogic and screenlogicpy-master/screenlogicpy and I'm still seeing issues when I don't put delays in the automation.
Attaching the latest log: home-assistant_2023-03-07T07-01-24.286Z.log
A few asides:
Thanks for following up on this.
Can you confirm that you are running the latest code for screenlogicpy? I'm not seeing any log output that indicates it is retrying failed (or rejected) requests. I just confirmed that there should be some output along the lines of: (code and request will be different for you)
Bad Parameter for request code: 12572, request: b'\x00\x00\x00\x00'. Will retry 1 more time(s) in 2 seconds
The dev branch has this, but master is now the absolute latest. v0.8.0 of the package on pypi should also work for testing, but does have a bug in it with keeping the connection alive via ping. I don't think you'll run into that though as you're requesting pump status at regular intervals.
I'm not sure what happened earlier, I was pretty sure it had updated to the new version but perhaps not.
Reattaching log that does show "Will retry 1 more time(s) in 2 seconds"
home-assistant_2023-03-07T20-15-05.851Z.log
One of the pool lights were stuck on after running the off script without delays. Ignore pool circuit, the script doesn't turn it off.
Thanks. I do see the retry message in the logs now, and it appears that each retried request is succeeding. I'm not seeing any obvious failures of the retry mechanics. Could you send new diagnostic info now that the hardware has been updated? Would like to eliminate circuit numbers that are listed as succeeding from an updated list of your circuits.
Thanks. After looking at the data I have some questions:
I'm not 100% sure as I redid the experiment 4-5 times, but I think the sequence for the log I posted was:
1.) first attempt -- all pass 2.) delay 3.) second attempt -- 1 light failed
Ok. It doesn't look like the fail->retry->success instances shown are of any bering to that last light remaining on. In fact, the final status update screenlogicpy received shows that the light remaining on wasn't any of the ones that went through the retry sequence.
What is really interesting is the second to last line in the log, which was also a status update received by screenlogicpy less than half a second before the final one. It shows the "problem" light as off. Yet less than half a second later it is reported as on, with no action from screenlogicpy. And both of these update messages arrive almost 7 seconds after the final command is acknowledged by the protocol adapter. In this case, it doesn't seem caused by screenlogicpy.
Ruling out you accidentally turning that circuit back on after the test, or the light actually being off and the protocol adapter just not sending the update in a timely manner, the only things I can venture a guess at is possible poor RF communication between the protocol adapter and the pool controller, or somehow the pool controller itself failed in turning off the circuit and then eventually reported correctly as such.
Either way, I don't have any evidence that screenlogicpy is not doing what it is designed to do. In fact, your log helped to verify for me that the retry mechanism is working in a real-world scenario. For that, I'm thankful. I'm sorry there isn't a better answer for you.
As such though, I'm inclined to close this issue at this time. In the future if there is evidence, in the logs or otherwise, that screenlogicpy is doing something bad to contribute to this issue, we can always revisit.
I'm doubtful it's an RF link quality problem, the RF transmitter is about 15 ft from the receiver.
Regardless, I also expect the bug is on the pentair side and not in screenlogicpy. I'm okay closing this out as I'm unsure at this point what more you would reasonably do. If you don't have it already, you could consider a traffic stress test, i.e.: loop that sends high volume traffic to a pentair bridge and see if anything gets dropped. But even if you did that and found the link wasn't reliable I'm not sure what else would you do in screenlogicpy .
At this point, I have all the integrations I wanted working, and using all_off/all_on helped a lot. I put a few 500ms delays in the scripts to help avoid the dropped command issue but the delays are not noticeable.
Thanks for looking and getting all the fundamental issues resolved. It's has enabled some very cool integrations, for example, I have an engraved Lutron RA3 keypad for the pool features next to patio light scenes. It's much nicer than what we had before with the pentair remote outside or the phone app.
It's a reasonable thought about a stress test setup. I've been reluctant as I don't have very many aux type circuits that I'm comfortable with potentially turning off and on quickly, but maybe it's valid with repeated data requests.
I'm glad you have found a way to make it work. That scene setup is so clean. 😍 Thanks for sharing. Nicely done!
For HA automations, I'm able to get single light/switch automations working correctly. If I'm toggling larger # of entities I'm seeing actions get dropped.
Example below shows a script I want to run every night to ensure that everything is shutoff correctly. I also want to do a script to turn all the pool lights on/off.
The problem is if I don't put a delay, some of the actions get dropped. I find that 600ms has a good success rate. 0ms/50ms miss a few of the circuits.
This isn't a blocker for me, adding a delay is fine, just a bit awkward as I would have preferred a single light off action with a list of entities.
This is the yaml of the working automation. For broken case, delay can be changed to 0 or 50 ms.
alias: "[Pool] Turn Off All Circuits (Slow)" sequence: