go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Wiki pages are stored as/converted to CRLF line endings #17541

Open plgruener opened 2 years ago

plgruener commented 2 years ago

Gitea Version

1.16.0+dev-455-ga5bcf1994

Git Version

No response

Operating System

No response

How are you running Gitea?

Tried in https://try.gitea.io/plgruener/wikitest/wiki/_pages

Database

No response

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Description

Any wiki page that is created in the web editor (wiki/_new) is stored as a file with (Windows-style) CRLF line endings, not (Unix-style) LF.
This also means every wiki page that is created locally with LF line endings (e.g. under Linux or macOS) and then pushed is silently converted to CRLF, either on the push itself or when that page is edited in the web editor by another person. Since that LF->CRLF conversion changes every line, it makes a diff essentially useless.
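To illustrate why the diff becomes useless, here is a minimal sketch (not Gitea code; `unchanged_lines` and the sample page text are made up for this example) that counts how many lines a line-based comparison still considers identical after an edit:

```python
import difflib


def unchanged_lines(old: str, new: str) -> int:
    """Count the lines that a line-based comparison treats as identical."""
    # keepends=True keeps "\n" or "\r\n" attached, so the line ending
    # is part of what gets compared, as it is in a byte-level diff.
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


# Hypothetical wiki page with LF endings, as pushed from Linux or macOS.
LF_PAGE = "# Home\n\nWelcome to the wiki.\n"
# The same page after an LF->CRLF conversion, with no other change.
CRLF_PAGE = LF_PAGE.replace("\n", "\r\n")
# A real one-line edit that keeps the LF endings.
EDITED_PAGE = LF_PAGE.replace("Welcome", "Welcome back")
```

With these inputs, `unchanged_lines(LF_PAGE, EDITED_PAGE)` reports two matching lines, while `unchanged_lines(LF_PAGE, CRLF_PAGE)` reports none: the conversion alone touches every line, so the actual edit is buried.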

Screenshots

No response

wxiaoguang commented 2 years ago

In my opinion:

It is a browser behavior, I think Gitea (Web UI) should not touch it: https://github.com/whatwg/html/issues/6647

Nowadays, modern applications on every OS can handle both CRLF and LF correctly, so it shouldn't be a problem.

The diff can ignore whitespace: https://stackoverflow.com/questions/40974170/how-can-i-ignore-line-endings-when-comparing-files

If you edit files locally, git respects settings like core.autocrlf or .gitattributes to set EOL.

If these methods are not enough, then maybe we need to think about a plan to cover all cases, maybe Gitea can set settings in .gitattributes for wiki pages (I am not sure about details).

ranvis commented 8 months ago

What autocrlf = true does makes things worse. *.md files in the working directory are in CRLF, but the committed blob is normalized to LF because the files are now identified as text. Every page is marked as changed in a local clone because of this normalization, and you can no longer pull --ff-only.

maybe Gitea can set settings in .gitattributes for wiki pages (I am not sure about details).

I think the --path option to git-hash-object --stdin should be enough as a start. Users can add .gitattributes on their own.

# Windows command prompt
> git config --list | cat
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
> dos2unix lf.md
> cp lf.md crlf.md
> unix2dos crlf.md
> git hash-object crlf.md lf.md
f60ee7132b2e348828dfff5a402c5af9cae7be6f
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6
> echo *.md text > .gitattributes
> git hash-object crlf.md
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6  # Git normalizes CRLF to LF
> git hash-object --stdin < crlf.md
f60ee7132b2e348828dfff5a402c5af9cae7be6f  # Git does not normalize, because of unnamed stdin input
> git hash-object --stdin --path crlf.md < crlf.md
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6  # Git normalizes CRLF to LF again, thanks to --path
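The two distinct hashes in the transcript above follow from how Git computes a blob ID: SHA-1 over the header `blob <size>\0` plus the content bytes. The sketch below reproduces that in Python. `git_blob_id` and `normalize_crlf` are illustrative names, and `normalize_crlf` is only a simplified stand-in for Git's check-in conversion (real Git applies additional checks before converting):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def normalize_crlf(content: bytes) -> bytes:
    """Simplified model of text normalization: turn every CRLF into LF."""
    return content.replace(b"\r\n", b"\n")


# Same text, two line-ending styles (sample content, not the files above).
lf_bytes = b"# Home\n\nWelcome to the wiki.\n"
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")

raw_id = git_blob_id(crlf_bytes)                     # like --stdin without --path
normalized_id = git_blob_id(normalize_crlf(crlf_bytes))  # like --stdin --path crlf.md
```

Here `raw_id` differs from `git_blob_id(lf_bytes)`, while `normalized_id` equals it, which matches the pattern in the transcript: without a path Git has no attributes to apply and hashes the CRLF bytes as they are.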

wxiaoguang commented 8 months ago

Related to this behavior: browsers always use "CRLF" for newlines when a textarea is submitted. So Gitea needs to do an extra conversion before it actually writes the content into the repo.

https://github.com/go-gitea/gitea/pull/28119#issuecomment-1912129780

ranvis commented 8 months ago

@wxiaoguang I would appreciate it if Gitea's editor views had an EOL option or something similar, like most offline editors do. Additionally, Gitea could have a per-repo core.autocrlf setting for in-server updates.

Yet, compared to those things, supporting the --path option so that .gitattributes is honored could be simpler. This fits better with #17496, though.

plgruener commented 5 months ago

@wxiaoguang

In my opinion: It is a browser behavior, I think Gitea (Web UI) should not touch it

Browsers always use "CRLF" for newlines when a textarea is submitted. So Gitea needs to do an extra conversion before it actually writes the content into the repo.

If you edit files locally, git respects settings like core.autocrlf or .gitattributes to set EOL.

Exactly: Gitea should not touch it, and it should not silently convert my files.

Honestly I don't care which settings the textarea in the browser uses. If it always uses CRLF, then the web editor should behave like any other Windows user and make use of the core.autocrlf=true setting to transparently convert the file from the index to CRLF, edit it in the browser with CRLF, and then when checking-in/committing convert that back into LF endings again.
Git already supports all of this functionality, you just have to use it?

When editing files via the web editor in a normal (non-wiki) repo, (almost) everything behaves as it should and LF files are not converted. So I really don't understand the problem: why can it apparently be done for the README.md in my repo, but not the Home.md in the wiki?

(I also checked how GitHub handles this issue: there, wiki files with LF endings are not converted.)

wxiaoguang commented 5 months ago

Let me share more information: according to the HTML standard, the newline/EOL in a submitted textarea value is always "CRLF". So the Gitea backend always receives "CRLF" from a textarea.

When editing files via the web editor in a normal (non-wiki) repo, (almost) everything behaves as it should and LF files are not converted.

Because "LF" is hard-coded in the non-wiki backend code, every CRLF is replaced by LF in the backend for these files (well, Windows users might be unhappy about this behavior ...)


I also agree that Gitea should do its best to get the EOL right. So the solutions could be:

  1. Use an advanced frontend editor, and make the editor submit the correct LF/CRLF bytes to the backend
  2. Make the backend "auto detect" the existing file's EOL. If the existing file uses LF, then replace all EOLs with LF; likewise for CRLF.

Or, as a quick fix: always use "LF" for wiki files, too, just like these non-wiki files.
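A minimal sketch of what option 2 could look like, written in Python for brevity rather than as actual Gitea backend code. All names here (`detect_eol`, `apply_eol`, `content_to_store`) are hypothetical, and the majority-vote rule for mixed files is an assumption, not something decided in this thread:

```python
from typing import Optional

LF = b"\n"
CRLF = b"\r\n"


def detect_eol(existing: bytes, default: bytes = LF) -> bytes:
    """Return the dominant line ending of a file, or `default` if it has none."""
    crlf_count = existing.count(CRLF)
    lf_count = existing.count(LF) - crlf_count  # bare LFs only
    if crlf_count == 0 and lf_count == 0:
        return default
    # Assumption: for files with mixed endings, the majority wins (ties go to LF).
    return CRLF if crlf_count > lf_count else LF


def apply_eol(submitted: bytes, eol: bytes) -> bytes:
    """Rewrite every line ending in `submitted` to `eol`."""
    normalized = submitted.replace(CRLF, LF)  # collapse to LF first
    return normalized if eol == LF else normalized.replace(LF, CRLF)


def content_to_store(existing: Optional[bytes], submitted: bytes) -> bytes:
    """Pick the bytes to commit for a textarea submission.

    `existing` is the current file content, or None for a new page.
    """
    eol = detect_eol(existing) if existing is not None else LF
    return apply_eol(submitted, eol)
```

For example, `content_to_store(b"# Home\nold\n", b"# Home\r\nnew\r\n")` yields `b"# Home\nnew\n"`, so an LF page stays LF even though the textarea submitted CRLF. The quick fix corresponds to skipping `detect_eol` and always calling `apply_eol(submitted, LF)`.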

wxiaoguang commented 5 months ago

PS: the original wiki code wasn't written by me, and neither was the editor/upload code; I just happen to know the details 🤣 If I had enough time I could also look into the problem and/or try to improve it, but I can't promise anything at the moment.

plgruener commented 5 months ago

Thank you for the info.

Because "LF" is hard-coded in the non-wiki backend code, every CRLF is replaced by LF in the backend for these files (well, Windows users might be unhappy about this behavior ...)

I hadn't even noticed that yet, but yeah, that's suboptimal as well.

I cannot say which solution would be better: both have to do an "auto detect", so it probably doesn't matter whether you do it in the frontend or the backend. Option 2 is closer to what Git itself does, and it's more robust if you ever decide to switch frontend editors (or offer multiple editors for the user to choose from).

Or, as a quick fix: always use "LF" for wiki files, too, just like these non-wiki files.

Yes, I agree it should at least be consistent (otherwise I always have to configure my wiki repos differently from the normal ones, which is very confusing).