go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
44.14k stars 5.42k forks source link

[Bug] User dashboard doesn't have forward-slashes unlike other pages, resulting in incorrectly generated "pagination" navigation links for subpath-enabled instances #29533

Closed OdinVex closed 6 months ago

OdinVex commented 6 months ago

Description

After logging in (at the user dashboard), users are presented with their commit history and such...but the navigation at the bottom does not use correct links (subpath issue). Everywhere else they appear to be correct, but here is the actual HTML (cleaned up for sake of posting):

<div class="center page buttons">
  <div class="ui borderless pagination menu">
    <a class="disabled item navigation">
      <span class="navigation_label">First</span>
    </a>
    <a class="disabled item navigation">
      <span class="navigation_label">Previous</span>
    </a>
    <a class="active item gt-content-center">1</a>
    <a class="item gt-content-center" href="/SUBPATH?page=2&amp;date=">2</a>
    <a class=" item navigation" href="/SUBPATH?page=2&amp;date=">
      <span class="navigation_label">Next</span>
    </a>
    <a class=" item navigation" href="/SUBPATH?page=2&amp;date=">
      <span class="navigation_label">Last</span>
    </a>
  </div>
</div>

There is a missing forward-slash after SUBPATH. All other links generated by Gitea appear to be correct (/SUBPATH/).

I believe the issue is in paginate.tmpl, the way links are generated using {{$.Link}} instead of {{$.Link}}/ or perhaps {{AppSubUrl}}/? I don't do Go. All of that paginating uses broken linkage.

I do use Forgejo, but the issue is present in Gitea as well.

Gitea Version

1.21.3

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

Docker behind a reverse-proxy with a subpath.

Database

MySQL/MariaDB

OdinVex commented 6 months ago

Without the forward-slash users are told to login again (Gitea destroys the session maybe?).

wxiaoguang commented 6 months ago

If I understand correctly, that's by design. In Gitea, the URLs are always considered to equal no matter they have trailing slash or not.

So maybe you could make your reverse proxy to serve http://.../subpath as http://.../subpath/? Redirect to the "slashed" path?


Without the forward-slash users are told to login again (Gitea destroys the session maybe?).

I think the "login" (cookie) problem is related to the cookie path. The cookie is only set on "/subpath/" IIRC ......


I think there are some design problems for the "subpath" and "cookie", maybe it needs some improvements ....

wxiaoguang commented 6 months ago

After looking into the problem:

  1. The "Link" can't have trailing slash, because it is heavily used in code like this: {{.Link}}/other-path. It would cause double-slashes if there is a trailing slash:
    • //other-path : it is even worse, because browsers would use it as https://other-path.
    • /subpath//other-path: not a good URL path ....
  2. The cookie is only set on "/subpath/" (for security), so only "/subpath/" accesses could use it? Accesses to "/subpath" fail.
  3. The workaround could be: make reverse proxy always redirect all "/subpath?..." requests to "/subpath/?...."

To completely fix the problem, it needs more designs & improvements ..... not that easy by a simple fix.


Update: I think case 2 is the root problem.

wxiaoguang commented 6 months ago

Oops .... I found the root problem now. That's my fault (regression of #24107)

-> Fix incorrect cookie path for AppSubURL #29534

OdinVex commented 6 months ago

To reclarify, this issue is about pagination, in the template for links generated (First, Previous, #, Next, Last...) on the user dashboard.

Edit: Pagination is not using {{.Link}}/, it is using {{$.Link}} without a forward-slash, as I mentioned. I think those need updating.

wxiaoguang commented 6 months ago

More details in https://github.com/go-gitea/gitea/pull/29534#issuecomment-1974851486

If you focus on "fixing the trailing slash", I could remove the "fix" operation from that PR, and keep this issue open.

OdinVex commented 6 months ago

I'm unfamiliar with Go, I'm more of an Assembly/C/C++ kind of guy.

Edit: I took a peek around, I think maybe something in templates/user/dashboard/feeds.tmpl could address it? {{template "base/paginate" .}} That period, does that help to generate a link?

wxiaoguang commented 6 months ago

Edit: I took a peek around, I think maybe something in templates/user/dashboard/feeds.tmpl could address it? {{template "base/paginate" .}} That period, does that help to generate a link?

Nope. The dot syntax is a Golang template syntax, it means "pass all current data into the sub template"

lunny commented 6 months ago

resolved by #29534

OdinVex commented 6 months ago

resolved by #29534

It is not resolved by #29534, that was a different issue (original fixer misunderstood this bug because another was found at the same time.)

OdinVex commented 6 months ago

Thank you @lunny. :)

wxiaoguang commented 6 months ago

original fixer misunderstood this bug because another was found at the same time.

Well, TBH I do not think I misunderstand this issue (actually I won't call the "slash" problem is a "bug").

Maybe you misunderstand the problem. I will quote my explanations and summarize them again:

  1. The fact is: it is a requirement that making Gitea work with both https://host/SUBPATH and https://host/SUBPATH/, the document has said so:
  2. Based on the fact, a URL href="/SUBPATH?page=2... should work, it doesn't need an extra slash.
  3. Gitea never adds extra slash to a full path
    • so you only see: https://host/SUBPATH , https://host/SUBPATH/user, https://host/SUBPATH/user/repo
    • you won't see: https://host/SUBPATH/ , https://host/SUBPATH/user/, https://host/SUBPATH/user/repo/
  4. The root problem of your bug report is: "Without the forward-slash users are told to login again (Gitea destroys the session maybe?)."
    • I think it is caused by incorrect cookie path
    • After the fix (29534), I think it should have been resolved. Users won't get logout, nor they need to "manually edit the URL"

I don't see why you wouldn't follow the document and why you insist to add an extra trailing slash for some links, I don't see such a real use-case for end users either.

If you think the trailing slash is a must, please show some real bad cases for no-trailing-slash. Without a real bad case, I couldn't figure out what you mean.

OdinVex commented 6 months ago

The fact Gitea uses a forward-slash to begin with is enough to follow convention. Calling #29534 a fix is a hackjob following non-standard practice to assume a file is a directory to circumvent fixing the issue.

Edit: The root of the problem has nothing to do with the Cookie issue. That is a side-effect. The issue of parsing comes into play, mind. Some people build parsers and to have to create an exception for mis-generated links in pagination on one specific page just lends to the idea it should be polished and fixed, not to mention web server parsing specs.

wxiaoguang commented 6 months ago

Calling #29534 a fix is a hackjob following non-standard practice to assume a file is a directory to circumvent fixing the issue.

As I explained before, Gitea requires to make "/SUBPATH" the same as "/SUBPATH/". Since it is a "proxy path", it is not related to file or directory. There is no file nor directory in this case.

OdinVex commented 6 months ago

Calling #29534 a fix is a hackjob following non-standard practice to assume a file is a directory to circumvent fixing the issue.

As I explained before, Gitea requires to make "/SUBPATH" the same as "/SUBPATH/". Since it is a "proxy path", it is not related to file or directory. There is no file nor directory in this case.

Maybe not on your instance or Gitea's official instances. I happen to have a binary file named the same. Access to the instance can only be achieved with that slash, as intended by specs, not just my own. Edit: And yes, /SUBPATH/ is a directory, even if virtual. You might be thinking a piece of path rather than directory as in "folder", the term is interchangeable, but it is indeed intended for the server to provide a default ("app") in this case or conventionally index/default.html or return an error, etc.

Edit: To summarize, the issue is a lack of slashes. Your PR does not add slashes, it fixes a different issue (awesome). So when the slashes are added to the user dashboard (the only place this problem occurs) then we'll call it fixed, cause that's this Issue.

wxiaoguang commented 6 months ago

https://docs.gitea.com/administration/reverse-proxies#apache-httpd-with-a-sub-path # Note: no trailing slash after either /git or port

The document was written there to help users configure their instances correctly. If some users don't follow, then that's another problem.

As I explained above, Gitea never adds extra slash to the full path. So if you think it is a problem, then not only "pagination" is affected, a lot of hard-coded links are also affected, then it is not a simple problem, it is a new proposal to totally refactor the Gitea's link system.

Would you like to edit the title to clarify what you mean? For example: Always add an extra slash when the AppSubURL is used without other sub paths ("pagination" is just a tip of the iceberg)

OdinVex commented 6 months ago

This has nothing to do with how users configure their servers. That is not the issue. The issue is not the variable, it's the links generated specifically ONLY on the user dashboard. Yes, Gitea DOES add forward-slashes, you can grep the code for it all over. Pagination ONLY on the USER DASHBOARD is affected. ONLY ON THE USER DASHBOARD. On other pages pagination works correctly (AND HAS FORWARD-SLASHES) such as the repo page.

Repo Page:

<a class="item" href="/SUBPATH/issues">Issues</a>
<a class="item" href="/SUBPATH/pulls">Pull Requests</a>
<a class="item" href="/SUBPATH/milestones">Milestones</a>
<a class="item" href="/SUBPATH/explore/repos">Explore</a>

USER BLEEPING DASHBOARD:

<div class="center page buttons">
  <div class="ui borderless pagination menu">
    <a class="disabled item navigation">
      <span class="navigation_label">First</span>
    </a>
    <a class="disabled item navigation">
      <span class="navigation_label">Previous</span>
    </a>
    <a class="active item gt-content-center">1</a>
    <a class="item gt-content-center" href="/SUBPATH?page=2&amp;date=">2</a>
    <a class=" item navigation" href="/SUBPATH?page=2&amp;date=">
      <span class="navigation_label">Next</span>
    </a>
    <a class=" item navigation" href="/SUBPATH?page=2&amp;date=">
      <span class="navigation_label">Last</span>
    </a>
  </div>
</div>
OdinVex commented 6 months ago

First and Previous will also be affected, they using {{$.Link}} instead of {{$.Link}}/ as seen on other pages such as ./user/settings/security/webauthn.tmpl (data-url="{{$.Link}}/webauthn/delete" data-id="{{.ID}}">). :O A FORWARD-SLASH. If you're re-using the pagination template then it's getting incorrect link information (variable assignment/determination/whatever) from the user dashboard I'd guess.

wxiaoguang commented 6 months ago

Yes, Gitea DOES add forward-slashes, you can grep the code for it all over.

I guess I have explained many times ... Gitea never adds extra slash to a full path:

* so you only see: `https://host/SUBPATH` , `https://host/SUBPATH/user`, `https://host/SUBPATH/user/repo`
* you won't see: `https://host/SUBPATH/` , `https://host/SUBPATH/user/`, `https://host/SUBPATH/user/repo/`

(Update: OK, https://host/SUBPATH/ is quite special in many cases, sometime it has the trailing slash, sometimes it doesn't)

There could be more cases that the SUBPATH misses the correct trailing slash .... for example, I just fixed some: #29535

Anyway, if you believe only the "dashboard" page needs to be fixed, it's fine to me. It could be fixed by this:

-> Add an trailing slash to dashboard links #29555

You could apply this patch to your instance to see whether it satisfy your requirement, I will add some tests later.

OdinVex commented 6 months ago

I guess I have explained many times ... Gitea never adds extra slash to a full path:

Ah, so the issue of understanding was a language one. It does add forward-slashes for any subdirectory but not for root directory (aka subpath), is what you meant. It should, otherwise clients are sending a parameter request to a file (or a cgi script even such as PHP). Separation of routing by paths (not virtual file pathing) is not only best-practices it can be a requirement in some environments due to software collisions. There is no reason to not use a forward-slash except cosmetic when accounting for all that.

There could be more cases that the SUBPATH misses the correct trailing slash .... for example, I just fixed some: #29535

I took a quick look in most places, only the user dashboard had the issue without going too deep into things, I also had to disable Notifications because of an Apache bug with reverse-proxies that prevents multiple threads (ends up blocking new threads because of concurrently held threads) so we wouldn't have noticed the Notifications. We also don't use comments on the instance but rather in Matrix, so we wouldn't have noticed it there either, good finding.

You could apply this patch to your instance to see whether it satisfy your requirement, I will add some tests later.

I will find the time to test it, thank you.

(Update: OK, https://host/SUBPATH/ is quite special in many cases, sometime it has the trailing slash, sometimes it doesn't)

Yes, the inconsistency could accidentally be parsed and routed as a file on some servers (if they are not canonizing /SUBPATH as /SUBPATH/). That was why I was trying to point out the whole "it could be a file" thing, because it genuinely could. If it always has forward-slashes then we know for certain we want the instance, instead of any file or accidental misrouting because the forward-slash would be.

wxiaoguang commented 6 months ago

Oh one more thing, since we have a long discussion for this simple problem, I think maybe we could try to improve the communication in the future.

What I understand:

What you proposed if I understand correctly now:

Fortunately, eventually, we could understand each other better. 🙏 In the future if there would be some real cases for a proposal/bug report, I think I could catch you up more quickly. Thank you again for the constructive discussion.

wxiaoguang commented 6 months ago

I also had to disable Notifications because of an Apache bug with reverse-proxies that prevents multiple threads (ends up blocking new threads because of concurrently held threads)

Do you mean this? #19265 I also participated in it .... 🤣

OdinVex commented 6 months ago

...

Some installations must follow standards which do not call for canonize paths the way you're assuming users should. This assumption should not be made. The code and documentation should not have ever tried to canonize the name, that should be a user preference to control that sort of redirection if they wish. I'd support the suggestion, but a requirement would remove preference and control (and even be impossible in some situations).

There are some mentioned RFC standards (could you share the link? I'd like to take a deep look).

I found that information while building a web server back in early 2004 to specifically build a cgi backend specific to an embedded system (an old Surfboard modem), it's been a long time. I'll try to remember the number. It centered around defining URL/URI and what is considered a file versus a directory and the ambiguity leaving it up to servers to define (but since it's an option it should be left to the user if it isn't required by the server and Gitea doesn't functionally need that because links can be written with a slash addressing it). There was something about case-sensitivity being ignored for host and scheme (port, etc) but specifically not for the rest of a URL/URI and an empty URL/URI should be processed as "/" (but that is only for "" requests, eg empty, and /something is not empty). The RFC was from the 1990s but it was still relevant in HTTP1.1 at the time. (Edit: I have not seen anything that updates or changes anything about assuming a path should end in a "/" and not allowing a file to be served for /subpath versus a proxy at /subpath/ because they can be differentiated by the server routing/processing, still is in Apache. I don't use Nginx, if this behavior is no longer the same among web servers then it's either a free-for-all decision or an RFC update, in which case I'd love to know. Still, canonicalizing the names with forward-slash to prevent issues and polishing things would be the best approach I genuinely believe.) Edit: I also remember the RFC mentioning request erroneous implementations of URI parsing resulting in badly formed queries and it gave a rundown of this stuff specifically mentioning interoperability issues because of URI parsing (such as removing /../ and such from URIs but only if they're in the path, etc). I think it also mentioned something about not making assumptions because of client implementations either intending to access a specific URI.

Edit: Although I can't find the original RFC, here's one that specifically mentions the kind of assumption demonstrated by Gitea:

[6.2.4](https://www.rfc-editor.org/rfc/rfc3986#section-6.2.4).  Protocol-Based Normalization

   Substantial effort to reduce the incidence of false negatives is
   often cost-effective for web spiders.  Therefore, they implement even
   more aggressive techniques in URI comparison.  For example, if they
   observe that a URI such as

      http://example.com/data

   redirects to a URI differing only in the trailing slash

      http://example.com/data/

   they will likely regard the two as equivalent in the future.  This
   kind of technique is only appropriate when equivalence is clearly
   indicated by both the result of accessing the resources and the
   common conventions of their scheme's dereference algorithm (in this
   case, use of redirection by HTTP origin servers to avoid problems
   with relative references).

The language here is implying there is a time it can be appropriate and a time it can be inappropriate. Canonizing to assume a forward-slash instead of allowing for a file by the same name as the subpath proxied for an Instance would be an example of an inappropriate use of technique. For anyone not serving a file Gitea's current behavior would be perfectly reasonable but for others it wouldn't be. Adding a forward-slash of course fixes it for all users in the end, that was why I was trying to communicate it. That RFC isn't the one I remember reading but it does talk about that kind of thing. The section following that, [7.3](https://www.rfc-editor.org/rfc/rfc3986#section-7.3). Back-End Transcoding also talks about taking special note of filesystem characters (forward-slash among them) but it merely lends to a summary of trying to not make assumptions about filesystems and such. '/subpath' also belongs to '/' whereas '/subpath/' belongs to the sub-router '/subpath', not '/', I can't remember where I read about that though. It was about trying to differentiate routing and how they should be implemented for user understanding.

Edit: I recall the RFC talking about fragmenting the URI and essentially trying to establish "this is a child fragment of this parent" and how to determine that, that document seems to also mention it (but that's a 2014 one, the one I read was from 1990s). "The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource." A lot of semantics. Canonicalization is fine for those that want it, but if someone wants to serve a file by the same name they should be allowed if it doesn't break Gitea. It can go both ways but supporting the forward-slash removes the ambiguity and fixes the issue for everyone involved. Those not using the forward-slash see nothing different and those using it see it working correctly for them. A bit tired, stayed up too late, bed for me. :)

Do you mean this? https://github.com/go-gitea/gitea/issues/19265 I also participated in it .... 🤣

Yep! :D

OdinVex commented 6 months ago

Edit: Woke up recalling this might help explain a perfectly plausible situation for needing the differentiation: /gitea.php is a script that handles authentication (users do not see .php, link canonicalization and disabling of .php direct access) and redirects to /gitea/ after logged in (but the redirect specifically is clean so that people can't try to pass parameters to the instance as soon as they login, avoids exploits around links). Other services are maintained and they all need a unified structure for doing this kind of thing across different systems, so while it is specific it also describes a scenario in which ambiguity assumptions can break something. Temporarily we added Redirector (web browser extension) redirects so that anything /subpath? gets redirected to /subpath/? but it's browser-specific until the patch is tested and rolled out. So many interconnected different softwares and backends, x_X.

OdinVex commented 6 months ago

Awesome, very well done. :)