Improve Handling for unavailable targets

deathbybandaid commented 2 years ago

I run Autoscan under systemd with the following targets: two Plex, one Jellyfin and one Emby.

More often than I'd like, one of these is down, which sometimes causes Autoscan to exit.

If a target is down at startup:

FTL Failed initialising target error="libraries: Get \"http://REDACTED:8097/Library/VirtualFolders\": dial tcp REDACTED:8097: connect: connection refused: target unavailable" target=jellyfin target_url=http://REDACTED:8097

If a target goes down after Autoscan has received a task to send to the targets:

ERR Not all targets are available, retrying in 15 seconds... error="availability: Get \"http://REDACTED:8097/System/Info\": dial tcp REDACTED:8097: connect: no route to host: target unavailable"

I'd suggest handling these errors with messages such as:

Target {target_info} is unavailable, will try to initialize again in 3 minutes.

And

Target {target_info} is unavailable, will try to send task again in 3 minutes. (attempt 1 of 3)

This way, tasks can be sent to the available targets instead of getting held up by a single target that is down.

UnknownWitcher commented 2 years ago

Note: I have no experience with Go, though I do have experience with other languages. While I'm confident I can find and propose a solution, I'm not comfortable committing anything given my lack of experience with Go. Still, this issue hasn't been resolved, and it feels like it has a simple solution.

https://github.com/Cloudbox/autoscan/blob/5c66ab857f70e6d6d37166f12546969ef55e278f/cmd/autoscan/main.go#L339

// declare the retry counter once, before the main loop
targetCounter := 0
for {

https://github.com/Cloudbox/autoscan/blob/5c66ab857f70e6d6d37166f12546969ef55e278f/cmd/autoscan/main.go#L357-L358

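if targetCounter == 4 {
    // retries exhausted: sleep indefinitely, as the current code does
    select {}
} else {
    // count this failed attempt and try again in 30 seconds
    targetCounter++
    time.Sleep(30 * time.Second)
    continue
}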

I would also much prefer this delay (and the others) to be at least a minute, or perhaps 3 as suggested by OP: https://github.com/Cloudbox/autoscan/blob/5c66ab857f70e6d6d37166f12546969ef55e278f/cmd/autoscan/main.go#L364-L365

m-rots commented 2 years ago

Autoscan halts all scanning when a target is down because we want to prevent two things:

Targets unnecessarily scanning the same path multiple times.
Targets being "left out" because they were unavailable at the time.

@UnknownWitcher, the indefinite sleep isn't really related to this issue. Autoscan only reaches that state when a target returns a fatal error, which can only happen when there are setup-related problems, so only a change to the config can make a difference.

Autoscan currently checks target health every 15 seconds. Because it checks health specifically (instead of simply sending the remaining scans), no duplicate scans occur. Once all targets are healthy again, scanning resumes!

Now, I hear you wonder:

Can't Autoscan simply keep track of which scans have been successfully processed by certain targets? So if a target is unavailable, only that target will receive scans at a later date?

We did think about this while designing Autoscan, but by linking scans to targets, the database becomes dependent on the config. We try to avoid this as much as possible, as it can cause issues when someone's config changes.
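To see why per-target tracking couples the database to the config, consider a hypothetical queue that stores, for each scan, the names of the targets that still have to process it. Autoscan does not work this way; `pendingScans` and the target names below are illustrative only. If a target is renamed in the config while it has pending scans, the stored entries point at a name that no longer exists and the renamed target never receives those scans.

```go
package main

import (
	"fmt"
	"sort"
)

// pendingScans is a hypothetical per-target queue:
// scan path -> set of target names that have not yet processed it.
type pendingScans map[string]map[string]bool

func (p pendingScans) add(path string, targets []string) {
	set := make(map[string]bool, len(targets))
	for _, t := range targets {
		set[t] = true
	}
	p[path] = set
}

// done marks a scan as processed by one target and drops the scan once
// every target has processed it.
func (p pendingScans) done(path, target string) {
	delete(p[path], target)
	if len(p[path]) == 0 {
		delete(p, path)
	}
}

// orphaned lists stored target names that no longer exist in the config.
func (p pendingScans) orphaned(configured []string) []string {
	known := make(map[string]bool, len(configured))
	for _, t := range configured {
		known[t] = true
	}
	seen := map[string]bool{}
	var out []string
	for _, set := range p {
		for t := range set {
			if !known[t] && !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	db := pendingScans{}
	db.add("/mnt/media/TV/Show", []string{"plex1", "plex2", "jellyfin"})
	db.done("/mnt/media/TV/Show", "plex1")
	db.done("/mnt/media/TV/Show", "plex2")
	// jellyfin was down; meanwhile the user renames it to "jellyfin-4k" in the config.
	fmt.Println(db.orphaned([]string{"plex1", "plex2", "jellyfin-4k"})) // prints [jellyfin]
}
```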

UnknownWitcher commented 2 years ago

Hey @m-rots, thanks for getting back to me. That's completely understandable, and yes, I did have that thought, lol. Anyway, I ended up running the container with "depends_on: - plex" to avoid this issue.

Cloudbox / autoscan

Improve Handling for unavailable targets #142