element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0

"Cannot reach homeserver" after homeserver restart #15241

Open Ezwen opened 4 years ago

Ezwen commented 4 years ago

(Disclaimer: I am a Synapse admin and I am here describing a problem encountered by many of my users.)

Description

When the homeserver in use restarts (e.g. when upgrading), element-desktop says that it cannot reach the homeserver (as expected), but keeps saying so even after the homeserver has finished restarting (not expected).

Then, if the element-desktop user logs out of the homeserver completely and tries to log in again, she repeatedly gets the error "cannot reach homeserver". Restarting element-desktop or the computer does not solve the problem.

If another client is used on the same computer (e.g. element-web or Nheko), it works perfectly. In fact, all users of the homeserver who were not using Element did not even notice a problem.

Workaround: deleting .config/Element solves the problem.
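
A more cautious variant of this workaround is to move the profile directory aside instead of deleting it, so the suspect cache can still be inspected later. A minimal Python sketch, assuming the Linux path `~/.config/Element` mentioned above and that Element is fully closed first; the function name and the backup naming scheme are made up for illustration:

```python
import time
from pathlib import Path
from typing import Optional


def set_aside_profile(profile_dir: Path) -> Optional[Path]:
    """Rename the profile directory to a timestamped backup.

    Returns the backup path, or None if the directory does not exist.
    """
    if not profile_dir.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. ~/.config/Element -> ~/.config/Element.bak-20200928-101500
    backup = profile_dir.with_name(f"{profile_dir.name}.bak-{stamp}")
    profile_dir.rename(backup)
    return backup
```

It would be called as `set_aside_profile(Path.home() / ".config" / "Element")`. As with deleting the folder, the user has to log in again afterwards.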

Steps to reproduce

The error "cannot reach homeserver" appears.

Logs being sent: no (I have to ask)

Version information

cmasdf commented 4 years ago

We get the same error with Ubuntu clients for our custom homeserver all the time, even without restarting the server beforehand. The provided workaround does not work for us. Ubuntu desktop clients simply can't connect. The server does not show any activity when these clients try to connect. I could not find any debug/log information on the client; if anyone can tell me where to get the logs, I can provide some. All other clients (web, macOS, iOS, Android) are working as expected.

//edit:

server version: 1.19.3
ubuntu client: 1.7.7 (precompiled from https://packages.riot.im/debian/ default main)

jryans commented 4 years ago

Hmm, very interesting, I haven't heard of this happening before. Logs are available in the console within the app, which can be accessed via Ctrl-Shift-I.

Please submit debug logs if possible (perhaps @cmasdf can do so), as that will greatly aid debugging the issue.

cmasdf commented 4 years ago

Thanks for the hint about the dev tools, it pointed me in the right direction. My problem seems to be vector-im/element-desktop#798. I just didn't notice it because I expected adding the root CA to the OS store to be sufficient. Looks like I can't be of much help with this problem. @Ezwen please provide further logs if possible. Thanks @jryans for your quick response.

Ezwen commented 4 years ago

> Hmm, very interesting, I haven't heard of this happening before. Logs are available in the console within the app, which can be accessed via Ctrl-Shift-I.

I've asked my users to do that next time they encounter the problem (probably with the v1.20.0 synapse release), let's hope they remember to do it :)

Thank you both for responding that quickly!

jryans commented 4 years ago

@Ezwen If possible, please ask users to submit debug logs by going to Settings -> Help -> Submit debug logs and link this issue. That will send a copy of the logs to our private logs server for easier analysis.

Ezwen commented 4 years ago

One of my users (using Element 1.7.7) got me some logs! :tada:

  1. Our homeserver was restarted yesterday (27/09/2020) after updating synapse to 1.20.1.
  2. After this restart, this user opened Element, and Element got stuck forever trying to re-sync. Logs when this happens.
  3. After waiting a while and seeing that the resync would never work, this user logged out and tried to log in again; the message "cannot reach homeserver" popped up. Logs when this happens.
  4. Finally the user closed Element, then deleted the configuration folder of Element (.config/Element), then tried again to log in, which worked. Logs when this happens.

Lots of CORS errors in there, which is strange since AFAIK synapse is always adding correct CORS headers to responses?

t3chguy commented 4 years ago

Logs are not helpful to dig into CORS errors as they are basically an umbrella error, the Network tab would have more information.
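
To see what the server sends outside the browser, one option is to request a known endpoint directly and print the status code and CORS header. A rough Python sketch, assuming the standard unauthenticated `GET /_matrix/client/versions` endpoint of the Matrix client-server API; the helper names are hypothetical and the base URL passed in is whatever homeserver is being checked. It does not reproduce the browser's cache or its CORS enforcement, so it complements the Network tab and does not replace it:

```python
import urllib.error
import urllib.request
from typing import Mapping


def describe_response(status: int, headers: Mapping[str, str]) -> str:
    """Summarise the status and CORS header of a homeserver response."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    origin = lowered.get("access-control-allow-origin")
    cors = f"Access-Control-Allow-Origin: {origin}" if origin else "no CORS header"
    return f"HTTP {status}, {cors}"


def probe_homeserver(base_url: str, timeout: float = 10.0) -> str:
    """Request /_matrix/client/versions and describe what came back."""
    url = base_url.rstrip("/") + "/_matrix/client/versions"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_response(resp.status, dict(resp.headers.items()))
    except urllib.error.HTTPError as err:
        # The server (or a proxy in front of it) answered with an error status.
        return describe_response(err.code, dict(err.headers.items()))
    except OSError as err:
        # DNS, TLS, timeout or connection failure: no HTTP response at all.
        return f"no response: {err}"
```

An error status without a CORS header (for instance from a reverse proxy while the homeserver is down) is one way a browser can end up reporting a CORS error for what is really an availability problem.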

Ezwen commented 4 years ago

> Logs are not helpful to dig into CORS errors as they are basically an umbrella error, the Network tab would have more information.

Oh, good to know. I'll let my users know, and try to gather better data at the next synapse upgrade.

Ezwen commented 4 years ago

Additional information: I have more and more testimony showing that the problem actually does not happen after each synapse restart… but after each synapse upgrade from one version to another. This is getting even more mysterious.

t3chguy commented 4 years ago

Definitely sounds like a Synapse issue, probably it doing internal database migrations during upgrades.

Ezwen commented 4 years ago

Even if that is the case, isn't it worrying that when this happens Element-desktop requires a wipe of .config/Element before being able to connect to the homeserver once more?

Ezwen commented 4 years ago

Update: another of my users encountered a very similar problem, and managed to capture a network log.

The story:

Workaround: The user only managed to reconnect after clearing the Brave browser cache.

Logs: If this can help, the user provided network logs captured when the bug occurred using Brave's development tools: app.element.io.zip

t3chguy commented 4 years ago

@Ezwen inform the user to evict that device immediately, the network logs contain their access token.

They also contain 0 failed Matrix requests.
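
Before sharing a network capture, access tokens can be masked in it. A minimal Python sketch, assuming the capture is a HAR file (the JSON format that browser dev tools export) and that the token appears either in an `Authorization: Bearer ...` request header or in an `access_token` query parameter, the two ways the Matrix client-server API accepts it. The function name is hypothetical, and other places a token could appear (request or response bodies, for instance a login response) are not covered:

```python
import copy
import re
from typing import Any, Dict

REDACTED = "REDACTED"
# Matches the token value of an access_token query parameter inside a URL.
_TOKEN_IN_URL = re.compile(r"(access_token=)[^&#]+")


def redact_har(har: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed HAR document with access tokens masked."""
    cleaned = copy.deepcopy(har)
    for entry in cleaned.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if "url" in request:
            request["url"] = _TOKEN_IN_URL.sub(r"\g<1>" + REDACTED, request["url"])
        for header in request.get("headers", []):
            if header.get("name", "").lower() == "authorization":
                header["value"] = REDACTED
        for param in request.get("queryString", []):
            if param.get("name") == "access_token":
                param["value"] = REDACTED
    return cleaned
```

The file would be loaded with `json.load`, passed through `redact_har`, and written back with `json.dump`. Masking only helps for captures that have not been shared yet; a token that has already been published still requires logging that device out.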

dbkr commented 4 years ago

https://github.com/vector-im/element-web/issues/15509 could possibly be causing this?

t3chguy commented 4 years ago

Doubtful, given that the regression which caused vector-im/element-web#15509 landed after this report.

Ezwen commented 4 years ago

> @Ezwen inform the user to evict that device immediately, the network logs contain their access token.

I forgot to answer, but I did see your warning and instruct the user to evict his device ASAP. Thank you!

> They also contain 0 failed Matrix requests.

The mystery thus remains… unfortunately I have new users coming that start experiencing this problem, and my investigations still don't give me any clue.

I will try to get logs from a user that uses element desktop, just in case Electron is more talkative than Brave.

> vector-im/element-web#15509 could possibly be causing this?

Unfortunately this bug was fixed and AFAIK my users that have the latest Element version still experience this problem.

Ezwen commented 4 years ago

As I mentioned in my last message, I had access to the computer of someone who encountered said problem with our homeserver. Here is what I could observe:

And here is what I gathered:

Ezwen commented 3 years ago

Quick update: I'm still getting new users encountering this issue, with a synapse homeserver recently upgraded to 1.29.0. The last user was using Element desktop 1.7.22.

Ezwen commented 3 years ago

Another quick update: issue still present, Synapse 1.36.0, Element 1.7.30. (I hope it does not sound too pushy − it's really not the intent, I only want to document)

Ezwen commented 3 years ago

Problem still present as of today, encountered by some of my users after the homeserver restarted to upgrade to synapse 1.45.1.

Since no one else is joining this discussion, I suppose this must be a problem specific to my homeserver somehow. Yet, client-wise, the problem only happens with Element-web, and not with other clients such as Element-android, Fractal, Nheko, etc. Therefore I cannot help but think that somehow there must be a small problem in Element-web.

I've never done any Element-web dev. If I were to investigate this problem (eg. using a debugger), any suggestion on how/where to start?

tcbutler320 commented 2 years ago

I encountered this error when I was initially standing up my element/matrix server. The issue [I think] was that I failed to install an SSL cert for the element subdomain. I initially installed using old riot instructions, so the initial cert was for riot.domain.com. After troubleshooting I made a new cert for element.domain.com and was able to get past this. Hope this helps!

Ezwen commented 2 years ago

@tcbutler320 Thanks for the suggestion! Unfortunately, no self-hosted Element in my case, everything I described also happens with element-desktop :/

Ezwen commented 2 years ago

Update: still happening as of Element web/desktop 1.10.8, and synapse 1.55.2. I might try in the near future to run Element in debug mode using a cache folder provided by a user.

xuhdev commented 2 years ago

Some info from me: Unchecking "Query OCSP responder servers to confirm the current validity of certificates" in Firefox settings can work around the issue.

Ezwen commented 2 years ago

> Some info from me: Unchecking "Query OCSP responder servers to confirm the current validity of certificates" in Firefox settings can work around the issue.

Interesting. The user I have that helps me the most with this issue is using Brave though, which I believe has no option to disable OCSP. I'll see whether I can try this somehow though.

Ezwen commented 2 years ago

A new interesting piece of information: when element-web reaches the described "bad state", it only shows the error "Cannot reach homeserver" for a single homeserver, namely the one that I administer.

In other words, if, in the described situation, a user enters a different valid homeserver URL, then no error is shown.

This means there is in fact clearly something different with our particular homeserver, but only when element-web reaches the described situation, with a (seemingly) faulty cache.

FrancescoSaverioZuppichini commented 2 years ago

same

blaine07 commented 2 years ago

Having this issue using Element in the Chrome browser. The problem is that (1) I don't know how to delete the file referenced here and (2) I have no idea how to fix it. The issue evidently still persists though.

je-s commented 1 year ago

I can't speak for every case encountered in this thread, but I think I've at least found a simple workaround for my case: adding the port in the homeserver field lets me connect instantly (example.com:8448).

I'm having that problem with Element only, even though other clients and https://federationtester.matrix.org/ work perfectly fine and without any issues. I could only reproduce the problem by logging out and then logging back in again. After a fresh installation of Element, with no configs present, it works without adding the port.
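
To check whether the explicit port makes a difference outside Element, both forms of the address can be generated and requested in turn. A small Python sketch; the helper name is made up, `example.com` and `8448` come from the comment above, and treating a bare name as `https://` is an assumption about how the field is interpreted, not a statement about Element's actual logic:

```python
from typing import List
from urllib.parse import urlsplit


def candidate_base_urls(server: str, port: int = 8448) -> List[str]:
    """Return the address without a port and with an explicit port."""
    if "://" not in server:
        # Assumption: a bare server name is meant as an https address.
        server = "https://" + server
    parts = urlsplit(server)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets back in a URL.
        host = f"[{host}]"
    plain = f"{parts.scheme}://{host}"
    # Keep a port the user already typed, otherwise use the default above.
    with_port = f"{parts.scheme}://{host}:{parts.port or port}"
    return [plain, with_port]
```

Each returned URL can then be requested at `/_matrix/client/versions` with any HTTP client and the outcomes compared.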