Catfriend1 / syncthing-android

Syncthing-Fork - A Syncthing Wrapper for Android.
Mozilla Public License 2.0
1.21k stars 40 forks source link

Syncthing often gets stuck on "Starting" (VPN use case) #957

Open Mrnofish opened 1 year ago

Mrnofish commented 1 year ago

Description of the issue

Syncthing gets frequently stuck on Starting. The "Syncthing is starting" message can be seen on both the notification and inside the app under the Status tab.

This issue breaks the ability to run on a schedule, because the app needs manual intervention (Force Stop) for syncing to resume. The last sync indicator also breaks as the app appears to incorrectly assume that the Syncthing starting status is tantamount to synchronization having taken place.

Reproduction Steps

Enable "Run according to time schedule" so Syncthing starts and stops every hour.

Version Information

Device platform info

Will fill in later if deemed necessary.

Android Log

Will fill in later if deemed necessary.

jolz commented 1 year ago

same problem on Android 13, Google Pixel 6. syncthing should only run on WLAN, not on mobile connection -> it start/stop several times a day. I find it "often" with the "Syncthing is starting" message.

Syncthing-Fork v1.23.0.0 from F-Droid

Mrnofish commented 1 year ago

Yeah, like I said, force stopping manually is a major impairment of auto syncing, which is the main purpose of Syncthing.

You might find yourself expecting on files having been transferred from one device to another, only to find out Syncthing hasn't transferred anything because it's been stuck since the last time you had a chance to check it.

At which point, if the device is out of range, has bad/spotty connectivity or the source device has gone offline or has connectivity issues, you're - excuse the French - SOL.

Hence, the only reliable solution is not Force Stopping, but rather leaving Syncthing running 24/7, as this minimizes the chance of the issue arising, but also has a significant impact on battery life.

In light of this, I consider this issue as a fairly major one.

PS: having updated to 1.23 shortly after opening this ticket, I can also confirm updating did not bring any relevant change.

truxnell commented 1 year ago

Same thing here, been since the last few weeks. Pixel 6 Pro Android 13, Syncthing fork V1.23.0.

I am using rules to have it only run when I'm at home on home WIFI - it works OK, but its like when i leave and return it getting stuck re-starting after the pause based on rules.

As it hasn't started I cant access the logs in app to see anything, not sure how else to contribute to issue. Might try turning on file logging and see if I can capture pertinent info.

Mrnofish commented 1 year ago

As it hasnt started I cant access the logs in app to see anything, not sure how else to contribute to issue.

You might be able to collect some useful information from adb logcat, especially if you are lucky enough to spot the problem not too long after its arising.

Catfriend1 commented 1 year ago

I'm currently running Syncthing manually and then going "exit" after reaching the 100% sync as I only have small data deltas to catch up every day. If someone has an idea why this happens on newer Androids, I'm all ears.

dcasado commented 1 year ago

In my case I'm seeing this behavior on a OnePlus 6 Android 10, Syncthing fork 1.23.0.0. There is kind of a pattern where it gets stuck on starting if I turn on the VPN when I'm not on my home WiFi. So If I never turn the VPN on it works fine but if I turn it on sometimes it gets stuck and sometimes doesn't. If I remember correctly the first time I saw this behavior was on version 1.21.0.3.

Mrnofish commented 1 year ago

FWIW I'm also using a VPN.

dinkonin commented 1 year ago

Same issue on Oneplus Nord (Android 12) Syncthing-Fork v1.23.0.0 from F-Droid. I don't use a VPN, the app is configured to run only on WiFi. When the "Syncthing is starting" message is displayed I cannot access the log files the app just freezes.

emacsomancer commented 1 year ago

I have been having a similar problem, and it seems to be related to this upstream issue: https://github.com/syncthing/syncthing-android/issues/1799

kgorlen commented 1 year ago

Same issue on Pixel 4a, Android 13, Syncthing Fork Wrapper Version v1.23.0.0. Configured to run on WiFi only, and cannot access the log while stuck "Syncthing is starting".

nyanpasu64 commented 1 year ago

One odd behavior I've noticed is that even when the GUI thinks Syncthing is starting, Syncthing may actually be running in the background and uploading newly taken camera files to other devices. So I think it's the Java GUI failing to properly detect Syncthing binary state.

What tells the GUI that Syncthing has fully started (State.ACTIVE)? I tried finding all usages of that value, and unless I missed something, com.nutomic.syncthingandroid.service.SyncthingService#onApiAvailable calling onServiceStateChange(State.ACTIVE) is the only method setting State.ACTIVE as a value, rather than comparing a state to that value.

What calls that method? com.nutomic.syncthingandroid.service.SyncthingService#launchStartupTask runs mRestApi = new RestApi(this, mConfig.getWebGuiUrl(), mConfig.getApiKey(), ... this::onApiAvailable, () -> onServiceStateChange(mCurrentState)), which defers a call to onApiAvailable.

Considering the multiple paths to call onServiceStateChange, I do wonder if there's a race condition with multiple overlapping updates to mCurrentState. Debugging race conditions requires global knowledge of a system, which I don't have of syncthing-android right now (since I just started looking into this bug an hour or so ago). Hopefully I'll be able to figure this out over the next few days or weeks.

And worse yet, reproducing this bug is tricky. In my real-world usage, I think I would get this bug due to switching Wi-Fi networks (which breaks wireless debugging) or entering/exiting power saver mode (which requires unplugging the power cable, breaking wired debugging). I may try "Enable "Run according to time schedule" so Syncthing starts and stops every hour", switching Wi-Fi networks with USB debugging plugged in, or even starting/stopping by hand. Is there a good way to save all logcat logs of a specific app to a file on storage (possibly a new filename with timestamp every time the Android app starts, to avoid unbounded file size growth like syncthing.log), for later logging?

EDIT: I tried switching from one Wi-Fi network to another, and after seeing the "onServiceStateChange: from DISABLED to STARTING" message switching Wi-Fi networks again. It's been stuck on Starting for minutes now. I think I've found a way to reproduce the bug (possibly through human-timescale state management errors alone, without even a high-speed heisenbug race condition). I'll debug what's going on tomorrow.

nyanpasu64 commented 1 year ago

I've found one way that Syncthing-Android gets stuck on "Starting".

Quickly the logcat will start being spammed by errors:

2023-03-24 03:24:00.217 16679-16679 PollWebGuiAvailableTask com...b.catfriend1.syncthingandroid  W  Unexpected error while polling web gui
com.android.volley.NoConnectionError: java.net.SocketException: socket failed: ENONET (Machine is not on the network)
    at com.android.volley.toolbox.NetworkUtility.shouldRetryException(NetworkUtility.java:173)
    at com.android.volley.toolbox.BasicNetwork.performRequest(BasicNetwork.java:145)
    at com.android.volley.NetworkDispatcher.processRequest(NetworkDispatcher.java:132)
    at com.android.volley.NetworkDispatcher.processRequest(NetworkDispatcher.java:111)
    at com.android.volley.NetworkDispatcher.run(NetworkDispatcher.java:90)
Caused by: java.net.SocketException: socket failed: ENONET (Machine is not on the network)
    at java.net.Socket.createImpl(Socket.java:492)
    at java.net.Socket.getImpl(Socket.java:552)
    at java.net.Socket.setSoTimeout(Socket.java:1180)
    at com.android.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:143)
    at com.android.okhttp.internal.io.RealConnection.connect(RealConnection.java:116)
    at com.android.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:186)
    at com.android.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:128)
    at com.android.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:97)
    at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:289)
    at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:232)
    at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:465)
    at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:411)
    at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:542)
    at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:106)
    at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:30)
    at com.android.volley.toolbox.HurlStack.executeRequest(HurlStack.java:91)
    at com.android.volley.toolbox.BasicNetwork.performRequest(BasicNetwork.java:104)
    at com.android.volley.NetworkDispatcher.processRequest(NetworkDispatcher.java:132) 
    at com.android.volley.NetworkDispatcher.processRequest(NetworkDispatcher.java:111) 
    at com.android.volley.NetworkDispatcher.run(NetworkDispatcher.java:90) 

These errors proceed even while Syncthing itself is making sync progress.

com.android.volley.NetworkDispatcher.run implements java.lang.Thread#run. By checking in a debugger, it appears the same com.android.volley.NetworkDispatcher thread is looping continously:

Oddly enough, I can access https://127.0.0.1:8384 on an Android browser (which asks for a username and password), even while the network thread is throwing ENONET. Is Volley trying to connect to a stale disconnected Wi-Fi network (even though all Wi-Fi networks share a Linux network interface)? IDK, again I don't understand how Volley works. Hopefully @Catfriend1 or an upstream developer can fix the problem from here.

Can I safely upload the full logcat, or are device IDs sensitive and I should redact them first?

I have not tried reproducing this issue in upstream Syncthing. If the bug reproduces there as well, should it be fixed there or in this fork?

Catfriend1 commented 1 year ago

@nyanpasu64 thanks, and wow. Good find :). I'll take a look, hopefully next week. Did you try to turn off "bind to active network" in settings?

nyanpasu64 commented 1 year ago

After unchecking "Behaviour > bind to active network" and force quitting Syncthing, I was not able to reproduce the endless ENONET errors. I was however able to get Syncthing into a state where it would onServiceStateChange: from ACTIVE to DISABLED, then not start up again, even though it was connected to a Wi-Fi network:

Screenshot_20230324-035340_Syncthing-Fork

Additionally I also managed to enter an endless cycle of W Deferring shutdown until State.STARTING was left, no matter how many times I switched networks. This also shows up as "Syncthing is starting." even with "Bind to active network" disabled!

Do you need logs for these bugs? Honestly at this point, with the number of different bugs I've seen, the only way I know to prevent more from popping up after stacking attempted fixes, is to informally verify the correctness of the state management algorithms. You'd have to prove that no matter the starting state of the Java GUI and Go services, the Syncthing daemon will be appropriately stopped or started bound to the correct network, and the Java GUI will correctly reflect the state of the daemon and can communicate with it if running, in bounded time without hitting deadlocks or nondeterministic failure. Then you'd have to rewrite/rearchitect any pieces of the code which we can't understand, cannot prove correct, or proved to cause incorrect behavior under some circumstances.

Mrnofish commented 1 year ago

One odd behavior I've noticed is that even when the GUI thinks Syncthing is starting, Syncthing may actually be running in the background and uploading newly taken camera files to other devices. So I think it's the Java GUI failing to properly detect Syncthing binary state.

In which case what you're looking at might be unrelated to the problem I (and possibly others) have been having.

As pointed out previously, when it gets stuck on "Syncthing is starting", files get out of sync -- which is why I noticed in the first place, as I don't usually pay much attention to what the service notification is saying.

Currently, I have set Syncthing to run only when a charger is connected. Since I do partial cycles, this means Syncthing gets started once or twice per day.

Fewer starts mean less of a chance for it to get stuck.

It's less then ideal but borderline tolerable.

nyanpasu64 commented 1 year ago

In which case what you're looking at might be unrelated to the problem I (and possibly others) have been having.

In a previous comment I said:

Additionally I also managed to enter an endless cycle of W Deferring shutdown until State.STARTING was left, no matter how many times I switched networks. This also shows up as "Syncthing is starting." even with "Bind to active network" disabled!

Quoting from my logcat dump:

2023-03-24 03:57:10.740 24613-2933  SyncthingNativeCode     com...b.catfriend1.syncthingandroid  I  [H72QH] INFO: Failed initial scan of sendreceive folder "Documents (Shared)" (ehh9n-p2drl)
2023-03-24 03:57:10.741 24613-2933  SyncthingNativeCode     com...b.catfriend1.syncthingandroid  I  [H72QH] INFO: TCP listener ([::]:22000) shutting down
2023-03-24 03:57:10.749 24613-2933  SyncthingNativeCode     com...b.catfriend1.syncthingandroid  I  [H72QH] INFO: Failed initial scan of sendreceive folder "Music" (svd2z-syafu)
2023-03-24 03:57:10.751 24613-2933  SyncthingNativeCode     com...b.catfriend1.syncthingandroid  I  [H72QH] INFO: Exiting
2023-03-24 03:57:10.781 24613-2884  SyncthingRunnable       com...b.catfriend1.syncthingandroid  I  Syncthing was shut down normally via API or SIGKILL. Exit code = 0
2023-03-24 03:57:10.809 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  I  Web GUI has come online at https://127.0.0.1:8384
2023-03-24 03:57:11.152 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  I  shouldRun decision changed to false according to configured run conditions.
2023-03-24 03:57:11.153 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left
2023-03-24 03:57:11.652 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left
2023-03-24 03:57:12.153 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left
2023-03-24 03:57:12.747 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  I  shouldRun decision changed to true according to configured run conditions.
2023-03-24 03:57:12.893 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left
2023-03-24 03:57:13.155 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left
2023-03-24 03:57:13.894 24613-24613 SyncthingService        com...b.catfriend1.syncthingandroid  W  Deferring shutdown until State.STARTING was left

To me, it looks like I have been able to get Syncthing to not sync while the GUI is stuck on starting, but only after disabling "Bind to active network". It may be possible to reach this state without "Bind to active network" as well (possibly by switching Wi-Fi networks more times after triggering the endless ENONET errors), but I have not attempted (and unfortunately I'm working on other projects now).

I think there are a lot of intertwined bugs, all caused by one or more (I don't know how many) state machine errors in the Syncthing-Android service management code. IMO it's valid to debug them all in this issue.

Mrnofish commented 1 year ago

IMO it's valid to debug them all in this issue.

I would agree as long as one solution fixes them all in one fell swoop. Things might get messy if this doesn't turn out to be the case, though ?

Anyways, I think the only difference under Run Conditions between fork and upstream is the addition of the Run on a Schedule option.

If that's the case, the Wi-Fi switching trick might mess up upstream, too, OTOH I could not find an equivalent bug report over there.

Mrnofish commented 1 year ago

BTW, for those who have Developer Options unlocked, it should be possible to figure out whether the Syncthing executable is running even though the wrapper claims it isn't, without having to resort to external debug facilities.

Under Running Services (the description might be different in different Android versions), locate Syncthing-Fork and tap the item.

If the only thing currently running is the wrapper, you should notice the memory consumption to be low (mid 50s for me).

When the Syncthing executable is also active, memory consumption will increase. The figure for me is mid 80s but the it may realistically vary depending on how many folders or devices you have and what tasks Syncthing is currently executing at the moment.

I have switched the Schedule option back on and keeping an eye on what Syncthing is doing, will report if my findings deviate from the situation described in this post.

guss77 commented 1 year ago

I also have this issue, where the Syncthing service starts - either automatically or if I use "force start" (and I can see it connects and syncs on my other devices) but the UI shows "Syncthing is starting" and never gets the the point that it sees that the service has started.

If I use "force stop" (or just let it timeout), the service itself will not stop (I can still see it connected on my other devices, and also I can access the local web UI from a browser), the UI and notification still say "Syncthing is starting", and the log has the message SyncthingService: Deferring shutdown until State.STARTING was left repeating every second. At this point it will continue running and eating battery until forced stop.

I have tried with both "Bind to active network" and without.

I think that the problem may be - at least for me - that the wrapper can't figure out the address for the REST API and starts RestApi with an invalid address, then nothing detects that there is a problem and we keep waiting for an onApiAvailable() that will never be called. Here are the startup logs:

04-17 10:37:40.489   488   488 D SyncthingService: onStartCommand
04-17 10:37:57.915   488   488 I SyncthingService: shouldRun decision changed to true according to configured run conditions.
04-17 10:37:57.968   488   488 I SyncthingService: onServiceStateChange: from DISABLED to STARTING
04-17 10:37:57.971   488   488 I SyncthingService: Web GUI will be available at https://:8384
04-17 10:37:58.482   488   740 I SyncthingNativeCode: [start] INFO: syncthing v1.23.2 "Fermium Flea" (go1.20.1 android-arm64) CF\Anwender@NET2017 2023-02-27 03:45:16 UTC [noupgrade, stguiassets, stnoupgrade]
...
04-17 10:38:00.121   488   740 I SyncthingNativeCode: [MRI6F] INFO: QUIC listener ([::]:22000) starting
04-17 10:38:00.124   488   740 I SyncthingNativeCode: [MRI6F] INFO: GUI and API listening on [::]:8384
04-17 10:38:00.124   488   740 I SyncthingNativeCode: [MRI6F] INFO: Access the GUI via the following URL: https://127.0.0.1:8384/

Device: Pixel 4a running Android 13 build 230305

guss77 commented 1 year ago

I think that the problem may be - at least for me - that the wrapper can't figure out the address for the REST API

I tried to debug this, but after installing a debug package - which cleared the app storage and I needed to set things up again - I can no longer reproduce the problem: the wrapper correctly sets up the REST API URL and can connect to it. I should have used the export configuration option, but I kind of forgot it exists...

jherazob commented 1 year ago

This happens to me constantly, several times a day, every time there's a network change (like when i leave the area of WiFi or enter it back), but ONLY on my phone (original Pocophone), the tablet (a Samsung tablet) under the same circumstances doesn't get the issue. Not sure what's the difference.

Mrnofish commented 1 year ago

@jherazob I've got decent luck lately keeping the Bind Active Network option disabled.

If you haven't done that already, you should go through each and every option and make sure they are exactly the same on either device. Make sure to leave no stone unturned.

jherazob commented 1 year ago

@jherazob I've got decent luck lately keeping the Bind Active Network option disabled.

If you haven't done that already, you should go through each and every option and make sure they are exactly the same on either device. Make sure to leave no stone unturned.

Checked and the phone and the tablet have the exact same settings, so it's (probably) not one of the options. Also tried it, the Bind Active Network option made no difference, whether it's enabled or not it hangs "Starting" so probably it isn't that.

nyanpasu64 commented 1 year ago

I haven't had the motivation to dig further into this bug. I wonder if instead of relying on log prints, we could semi-automatically log and visualize traces of events, to get more observability into how the various states should operate under normal operation, and what happens when the race condition hangs Wi-Fi handling. I've been trying out the Tracy profiler to track control flow and input latency on Dolphin Emulator, but since Syncthing-Android is a Java application, you'd need to use a different library (eg. https://www.google.com/search?client=firefox-b-1-d&q=java+tracing+library).

You'd have to prove that in any starting state of the Java GUI and Go services, after one or more network transitions occur, the Syncthing daemon will be appropriately stopped or started bound to the correct network, and the Java GUI will correctly reflect the state of the daemon and (if the daemon is running) can communicate with the daemon. This must occur in bounded time, without randomly deadlocking or ending up in a wrong state.

Mrnofish commented 1 year ago

@jherazob Probably not going to like the sound of this very much, however I have seen indication that sometimes dismantling a folder, or even the whole misbehaving client, and setting it up again from scratch can fix obscure issues with Syncthing.

If you decide to go down that road, it may be worth to start from as much of a blank slate as possible.

PS: Don't shoot the messenger!

nyanpasu64 commented 1 year ago

I've found one race condition:

If you enter the starting state (by connecting to Wi-Fi), then disconnect from Wi-Fi while still in the starting state, onShouldRunDecisionChanged calls shutdown which fails because the server is still starting. If you then connect to Wi-Fi again, onShouldRunDecisionChanged does nothing because mCurrentState == STARTING. Once the server finishes starting, a deferred call to shutdown (scheduled before connecting to Wi-Fi) kills the server (after connecting to Wi-Fi), and we do not start the server again (because we connected to Wi-Fi). As a result, Syncthing/server remains offline forever, until we switch networks again.

One way to fix this bug is to abort deferred calls to shutdown() if we have connected to Wi-Fi and mLastDeterminedShouldRun is set to true (and no longer want to shut down the server). This appears to fix the particular sequence of events I tested. I do not know if this is sufficient to fix all bugs. I've pushed my code to https://github.com/nyanpasu64/syncthing-android/tree/fix-wifi-change.

I'm not sure if com.nutomic.syncthingandroid.service.SyncthingService#applyCustomRunConditions behaves properly if Syncthing is in the middle of startup. I will have to add new logging to this codepath to figure out what it's doing. I'm not sure if this change misbehaves when "Bind to active network" is active (because it uses an old Syncthing process bound to the wrong network, and doesn't shutdown and restart it as needed).

EDIT:

For a more "correct by design" and comprehensive fix, it may be possible to schedule future calls to updateShouldRunDecision (not just onShouldRunDecisionChanged() or shutdown()) whenever we fail to apply the full configuration. Then we would have to change applyCustomRunConditions and onShouldRunDecisionChanged to reapply configuration if it differs from the most recent applied/acknowledged state change, not just the previous attempted state change (if it could not be applied since Syncthing was in the middle of starting). This would be complex to get right.

codethief commented 1 year ago

To add another data point: I've been struggling with the same issue for a while. I set up Syncthing to only run on wifi. Now, it always gets stuck on "Starting", though, and then uses up ridiculous amounts of battery to the point where my phone gets incredibly hot. At that point, changing the run conditions (e.g. "force stop / ignore run conditions") doesn't do anything anymore and I need to reboot my phone for the settings to take effect.

I will try to look into the logcat output next time this happens.

Catfriend1 commented 1 year ago

Happens to me too, seems to be caused by the VPN app. Before that, I never had those hang problems. With VPN running or started while the app already is running, I get this error behaviour.

DocSniper commented 11 months ago

Just to let you know, I have this startup problem too but without VPN. (on Android 13 / LOS 20 with syncthing-fork 1.23.6.0 from F-Droid)

kgorlen commented 4 months ago

Also occurs on Samsung Galaxy Tab A8 SM-X200, syncthing-fork version 1.27.6.0. No VPN.