OpenPHDGuiding / phd2

PHD2 Guiding
https://openphdguiding.org
BSD 3-Clause "New" or "Revised" License
250 stars 114 forks source link

Proposed fix for ZWO cameras disconnecting #936

Open spiriapbergeron opened 3 years ago

spiriapbergeron commented 3 years ago

Hi. My ASI 120mm camera keeps disconnecting. I also use a ASI 071 camera for imaging.

Many folks complain of similar issues. I found just 2 recent threads complaining about it. They blame the ZWO drivers but I'm not sure.

https://www.cloudynights.com/topic/766152-guiding-failures-and-camera-disconnects/?hl=%2Bdisconnect https://www.cloudynights.com/topic/776466-help-phd2-quickly-loses-guide-camera-connection/?hl=%2Bdisconnect#entry11177079

Since I'm a C++ developer, I thought I'd have a look at the PHD2 code. I hay have found the source of the bug, and I propose a solution. Interestingly, I have worked with camera capture libraries before, so this is bit more than a simple guess.

Unfortunately, I can't build phd2 or debug it at the moment.

Links to the relevant .h and .cpp files below

https://github.com/OpenPHDGuiding/phd2/blob/master/cameras/ASICamera2.h https://github.com/OpenPHDGuiding/phd2/blob/6df2e2b617f6568f5e2ce0fbd20461cc4afa25f4/cam_zwo.cpp

The interesting bits are in the cpp file, in Camera_ZWO::Capture(), between lines 817 - 892.

Here's what the code does, in a nutshell:

(1) The code starts to take a pic, by calling ASIStartExposure()

A status code is returned by the function, but it's not checked. If it failed at starting the exposure for some reason, the rest will fail and the camera will go in disconnect mode. There is no error trapping or recovery at a failed ASIStartExposure, when there should be.

Also, we don't check to see if the camera is ready to start capturing (more on that below)

Also, if ASIStartCapture failes, it might be because there's a bunch of ASI API calls that performs camera parameter setups just before ASIStartCapture, maybe one of them has not completed yet? (I know there's plenty of USB bandwidth, but we're talking microseconds here).

(2) The code enter a polling loop to get the camera status (wait for exposure to complete), by calling ASIGetExpStatus()

A status code is returned. Either: (a) the camera has finished taking the pic, returns SUCCESS. (b) the camera is still working, presumably still gathering data, so it waits a few millisecs and calls ASIGetExpStatus() again. (c) other core, means we exit the polling loop and perform a retry (without any delay, which would be necessary to add)

(3) If 3 retries have failed (eg: condition (c) above) then the camera is assumed to have been disconnected.


So this might obviously lead to the bug we are experiencing.

Recommended fixes:

Fix #1: call ASIGetExpStatus() in a loop, with delays and watchdog, until it returns ASI_EXP_IDLE. When the camera is idle, it's now OK to call ASIStartCapture.

FIX #2: Check the return code of ASIStartCapture() and handle it gracefully. Do not enter the 3-retry loop if we know the start capture has already failed.

FIX #3: In the waiting-for-image-to-complete loop, when ASIGetExpStatus() isn't SUCCESS or WORKING, you can perform the retry, but wait for a few millisecs. Otherwise, you will plow through the 3 retries in about 1 nanosecond, not giving any chance to the camera to have cleared its status busy/bad/whatever status.

agalasso commented 3 years ago

@spiriapbergeron do you want to send us a PR (or, better: 3 pr's for the 3 fixes)?

d33psky commented 3 years ago

Nice digging in :) Would #1 lock up PHD2 when that ASI_EXP_IDLE is never reached ? Let's add a timeout there. I fully agree with #2 and #3.

EricClaeys commented 2 years ago

@spiriapbergeron, when using video mode, it calls several ASI routines and doesn't return if there's a failure, and probably should. For example, line 733 calls ASIGetControlValue() and outputs a debugging message on success but continues on failure. This is true with all the ASI function calls through line 790.

maphilli14 commented 2 years ago

Is this the root of the linked bug, allsky#650 above? If so how can I test on a Pi3 or 4 with latest buster image?

TIA!

Mike

EricClaeys commented 2 years ago

@maphilli14, no, AllSky checks the return codes and if there's an error it prints a message and either quits or continues if possible. PHD2 has some cases where it's not checking the return codes, or is, but isn't quitting if there's a problem.