JaneliaSciComp / G4_Display_Tools

Collection of tools for the G4 LED arena displays.
Creative Commons Attribution 4.0 International

Fast succession of 'all on' - 'all off' breaks system #57

Open floesche opened 2 years ago

floesche commented 2 years ago

Sending 'all on' (0x01 0xFF) and 'all off' (0x01 0x00) several times in fast succession puts the G4 system into an undefined state and slows down the execution of commands into the range of seconds (a factor of ~5000).

In a script, I sent an 'all on' followed by an 'all off' 15 times, as fast as possible. The error happens every time I run the script, usually between the 4th and 10th iteration.

At some point (in the example below, at the 8th "all off"), the response is delayed significantly: by around 15 seconds in this example, and typically by between 6 and 18 seconds (median around 9.2 s). Once this has happened, all following 'all on' commands are delayed by an amount in the same range, while 'all off' commands are executed immediately.

This appears to be an error on the Main Host side: if I send the commands without waiting for a response, my script has long finished while the commands are still being executed (with the delay). This suggests that the commands are stuck somewhere in the Main Host input queue.


The output inside the main host window contains the following text:

03/08/2022 15:46:38.319 :  Root Directory Path - C:\Program Files (x86)\HHMI G4\Support files
03/08/2022 15:46:38.321 :  PC Name - reiser-ww10.hhmi.org, IP Address - 10.102.40.39, TCP Port - 62222
03/08/2022 15:46:38.398 :  TCP Connection Established
03/08/2022 15:46:38.404 :  All-On received
03/08/2022 15:46:38.416 :  All-Off received
03/08/2022 15:46:38.664 :  All-On received
03/08/2022 15:46:38.668 :  All-Off received
03/08/2022 15:46:38.681 :  All-On received
03/08/2022 15:46:38.690 :  All-Off received
03/08/2022 15:46:38.703 :  All-On received
03/08/2022 15:46:38.704 :  All-Off received
03/08/2022 15:46:38.718 :  All-On received
03/08/2022 15:46:38.720 :  All-Off received
03/08/2022 15:46:38.734 :  All-On received
03/08/2022 15:46:38.735 :  All-Off received
03/08/2022 15:46:38.737 :  All-On received
03/08/2022 15:46:38.737 :  All-Off received
03/08/2022 15:46:53.991 :  All-On received
03/08/2022 15:46:53.991 :  All-Off received
03/08/2022 15:47:03.219 :  All-On received
03/08/2022 15:47:03.219 :  All-Off received
03/08/2022 15:47:45.610 :  All-On received
03/08/2022 15:47:45.610 :  All-Off received
03/08/2022 15:47:54.841 :  All-On received
03/08/2022 15:47:54.841 :  All-Off received
03/08/2022 15:47:58.034 :  All-On received
03/08/2022 15:47:58.034 :  All-Off received
03/08/2022 15:48:01.235 :  All-On received
03/08/2022 15:48:01.235 :  All-Off received
03/08/2022 15:48:19.498 :  All-On received
03/08/2022 15:48:19.498 :  All-Off received
03/08/2022 15:48:31.741 :  All-On received
03/08/2022 15:48:31.741 :  All-Off received
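
To make the stalls in a log like the one above easier to spot, the gaps between consecutive "received" entries can be computed with a few lines of Python. This is only a minimal sketch: the function names `parse_log`, `gaps`, and `stalls` are made up for illustration, and the timestamp format is assumed to be month/day/year (the log does not say; only time differences matter here).

```python
from datetime import datetime

# Assumption: month/day/year. The log does not state the date order, and only
# differences between timestamps are used below.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def parse_log(text):
    """Return (timestamp, message) pairs for log lines ending in 'received'."""
    events = []
    for line in text.splitlines():
        stamp, sep, message = line.partition(" : ")
        message = message.strip()
        if not sep or not message.endswith("received"):
            continue  # skip header lines such as 'TCP Connection Established'
        events.append((datetime.strptime(stamp.strip(), TIMESTAMP_FORMAT), message))
    return events


def gaps(events):
    """Return (seconds since previous event, message) for every event but the first."""
    return [
        ((later[0] - earlier[0]).total_seconds(), later[1])
        for earlier, later in zip(events, events[1:])
    ]


def stalls(events, threshold_s=1.0):
    """Return only the gaps longer than threshold_s seconds."""
    return [(gap, message) for gap, message in gaps(events) if gap > threshold_s]
```

Applied to the log above, `stalls(parse_log(log_text))` would list every entry that arrived more than one second after its predecessor. The first of these is the 'All-On' at 15:46:53.991, which follows the previous entry at 15:46:38.737.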

floesche commented 2 years ago

There is no problem when sending only hundreds of 'all on' or only hundreds of 'all off' commands in fast succession.

floesche commented 2 years ago

Adding a delay between sending an 'all on' and an 'all off' command decreases the chance of triggering this problem. For example, with an additional pause of 1 ms the problem appears only after around 15 to 20 iterations, and with a pause of 1.5 ms it appears reliably only after 25 to 30 iterations, and so on. The more I increase the delay, the less likely the system is to break, but I have seen it break with a 2 ms delay after 2, 7, or 85 iterations, with a 3 ms delay after 89, 447, or 856 iterations, and with a 4 ms delay after 3147 or 7831 iterations.

The delays are introduced on the script side. According to the output of the Main Host, the actual commands are sent further apart. If the time log there is to be trusted, then most commands are received with more than 10ms delay, but whenever the script executes faster for whatever reason and the delay is less than 10ms, the Main Host goes into the unrecoverable state.
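
For anyone who wants to reproduce this, a test script along these lines could look roughly like the following. It is a sketch, not the script used above: the names `toggle` and `run` are hypothetical, the host and port have to be taken from your own Main Host window, and it sends without waiting for a response because the response format is not described in this issue. Only the two command byte sequences are taken from the issue text.

```python
import socket
import time

ALL_ON = bytes([0x01, 0xFF])   # 'all on' bytes as quoted in this issue
ALL_OFF = bytes([0x01, 0x00])  # 'all off' bytes as quoted in this issue


def toggle(sock, iterations, pause_s=0.0):
    """Send 'all on' then 'all off' `iterations` times.

    Sleeps for pause_s seconds after every command and returns the
    time.perf_counter() value recorded right after each send.
    """
    sent_at = []
    for _ in range(iterations):
        for command in (ALL_ON, ALL_OFF):
            sock.sendall(command)
            sent_at.append(time.perf_counter())
            if pause_s > 0:
                # time.sleep may pause longer than requested.
                time.sleep(pause_s)
    return sent_at


def run(host, port, iterations=15, pause_s=0.0):
    """Connect to a Main Host (host and port are assumptions) and toggle."""
    with socket.create_connection((host, port)) as sock:
        return toggle(sock, iterations, pause_s)
```

Comparing the differences between the returned send times with the timestamps in the Main Host window shows how far apart the commands were sent versus how far apart they were logged as received.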

This is possibly related to #48, and more detailed logging, as suggested in #53, might help identify the issue.

floesche commented 2 years ago

(response via email 2022-04-01T09:24):

I think issue #57 looks to be related to issue #21, with the All-off command essentially being the same thing as a stop-display. I am modifying it to actually send a blank frame instead of a stop-display command.

floesche commented 2 years ago

Changing the all-off command to just turn off the LEDs would be good. My reading of your description in the TCP Commands.xlsx was that the all-on, all-off, and fullscreen grayscale (0x02 0x05) commands would be faster versions of sending an actual frame in streaming mode, because the communication overhead is smaller. Most likely these three commands would even be faster than setting a pattern by ID, because technically DMA access is not necessary. Going forward, and after your change to the all-off command, will that be a correct assumption?

floesche commented 2 years ago

(response via email 2022-04-01T11:46):

Regarding the all-off command, the current software actually turns it off, but what I am suggesting is continuously streaming a blank frame instead, so the transition between changing patterns is more seamless. The current all-off command sends a stop-display, so that is why we are seeing similar issues to the quick start and stop display test you were doing in #21. There is less TCP overhead, but the DMA is still being used to stream the data. I am also removing the fullscreen grayscale command because that was used with the old arena, where we could control different grayscale levels. Now we only do 2 and 16.

floesche commented 2 years ago

Sending a blank frame for all off sounds good; that is how I read the description in your document. I don't need to understand why a DMA read is necessary at that point, since that would basically be reading a large array with known values (0xFF, 0x00 for on / off, 0xii for the intensity ii)… But even then the three commands should be as quick as sending a pattern ID.

I thought the Fullscreen Grayscale was a useful command - we could quickly test the different brightness levels or provide a specific illumination during the experiment. If the command is working I wouldn't remove it. I understand that this is only useful in the 16 grayscale mode, not the 2 grayscale mode.