martinpeck closed this 10 years ago
Unicode normalization, or the lack of it, seems to be the issue. Mac OS X uses NFD; the file names in git are in NFC, I think.
There is a git flag to fix this, but it might be easier to switch to NFD.
Thanks to Jamie for spotting this :-)
In modern versions of git we can set core.precomposeunicode to fix this, or we can just fix the repo.
Hey, I'm having the same issues on Windows (can't even clone the repo or extract the downloaded zip) and on Mac (untracked files).
precomposeunicode (AFAIK) only works on Mac, so what about Windows?
This continues to be a problem, as it is preventing people from contributing. Two people doing translations (Polish lessons and Latin Spanish lessons) contacted me this past week saying they'd like to contribute, but didn't know how to commit their changes without fucking it up.
@JackuB What version of git are you using?
The docs note that Git for Windows 1.7.10 or higher, or Git 1.7 under Cygwin, is needed to support Unicode file names.
For OS X, it's Git 1.8.2, plus running git config --global core.precomposeunicode true before cloning.
git version 1.8.3.msysgit.0 on Win
and git version 1.7.12.4 (Apple Git-37) on Mac
Updated to git 1.8.4.1, ran git config --global core.precomposeunicode true, recloned the repo, and I'm still getting untracked files. FML.
If it's any help, the clone works fine in Linux.
The offending file names appear to be stored in the repo as decomposed unicode (either that or my Linux machine is decomposing them for display, which sounds unlikely?):
ls ./botão-amarelo.gif | xxd
0000000: 2e2f 626f 7461 cc83 6f2d 616d 6172 656c ./bota..o-amarel
(0xcc 0x83 is the combining tilde)
Note that this is not the same as what git status is telling you, which is the composed character. argh:
pt-BR/Curso 1/08 Ferramenta de Desenho/Recursos/bot\303\243o-amarelo.gif
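To make the two byte sequences above concrete, here is a small Python sketch (not from the thread) using the standard unicodedata module. It checks that the decomposed name carries 0xcc 0x83 after the "a", that the composed name carries 0xc3 0xa3, and that the \303\243 printed by git status is just the octal spelling of the composed bytes. The helper unquote_git_path is a hypothetical name, and it only handles octal escapes, not the other escapes git can emit.

```python
import re
import unicodedata

# The file name from the hexdump, written with an explicit escape for "ã".
name = "bot\u00e3o-amarelo.gif"

nfc = unicodedata.normalize("NFC", name)  # composed: one code point, U+00E3
nfd = unicodedata.normalize("NFD", name)  # decomposed: "a" + combining tilde U+0303

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def unquote_git_path(quoted: str) -> str:
    """Hypothetical helper: turn git's \\NNN octal escapes back into text.

    Only three-digit octal escapes are handled; this is a sketch, not a
    full parser for git's quoted path format.
    """
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        quoted.encode("ascii"),
    )
    return raw.decode("utf-8")


# The name exactly as git status printed it.
from_status = unquote_git_path(r"bot\303\243o-amarelo.gif")

print("NFD bytes:", nfd_bytes)
print("NFC bytes:", nfc_bytes)
print("git status name is NFC:", from_status == nfc)
print("git status name is NFD:", from_status == nfd)
```

So the name xxd shows and the name git status prints differ at the byte level even though they render identically, which fits the untracked-file symptom.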
My suspicion is that the core.precomposeunicode setting only composes the unicode characters when you add the files to the repository, and existing decomposed code points in the repo will still cause problems.
(This also makes sense because there are other files in that repo with names containing accented characters which aren't causing problems.)
Try core.ignorecase; maybe it will ignore the normal form as well.
Note: Linux uses NFC, OS X uses NFD. HFS+ is case-insensitive but case-preserving.
This procedure will "fix" it:
1) Check out the repo on a Mac with core.precomposeunicode enabled
2) Check out the repo on a non-Mac
3) On the non-Mac, git rm all the affected files
4) On the Mac, git add . and commit
5) Make sure all Macs adding files to the repository in the future use core.precomposeunicode.
6) Despair
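Step 3 needs a list of the "affected files". One rough way to build it, assuming the affected files are exactly those whose names are not already in NFC, is sketched below in Python. The function names non_nfc_paths and parse_ls_files_z are made up for illustration, and the sample bytes are stand-in data in the shape of git ls-files -z output (NUL-separated, unquoted paths), not a listing from the real repo.

```python
import unicodedata


def non_nfc_paths(paths):
    """Return the paths whose text changes when normalized to NFC."""
    return [p for p in paths if unicodedata.normalize("NFC", p) != p]


def parse_ls_files_z(raw: bytes):
    """Split NUL-separated `git ls-files -z` output into str paths."""
    return [chunk.decode("utf-8") for chunk in raw.split(b"\0") if chunk]


# Stand-in data (an assumption, not taken from the repository).
sample = (
    b"Recursos/bota\xcc\x83o-amarelo.gif\0"   # decomposed (NFD)
    b"Recursos/bot\xc3\xa3o-vermelho.gif\0"   # composed (NFC)
    b"README.md\0"                            # plain ASCII
)

affected = non_nfc_paths(parse_ls_files_z(sample))
for path in affected:
    print(repr(path))
```

On a real checkout, the raw bytes could come from subprocess.run(["git", "ls-files", "-z"], capture_output=True, check=True).stdout, run on the non-Mac machine so the file system does not alter the names.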
Finally:
russ@zvezda:~/test/ca/Trimestre 1/01 Felix & Herbert $ ls
Fèlix-i-Herbert.sb fèlix_i_herbert.md notes per als caps de club.md
russ@zvezda:~/test/ca/Trimestre 1/01 Felix & Herbert $ git ls-files
"Fe\314\200lix-i-Herbert.sb"
"F\303\250lix-i-Herbert.sb"
"fe\314\200lix_i_herbert.md"
"f\303\250lix_i_herbert.md"
notes per als caps de club.md
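The listing above holds each accented name twice: once decomposed (e\314\200) and once composed (\303\250). A sketch of how such pairs could be detected, assuming the names have already been unquoted into Python strings, is to group the paths by their NFC form and report any group with more than one spelling. normalization_collisions is a hypothetical name.

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(paths):
    """Group paths by NFC form; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for p in paths:
        groups[unicodedata.normalize("NFC", p)].append(p)
    return {key: names for key, names in groups.items() if len(names) > 1}


# The five entries from the git ls-files listing, with explicit escapes.
listing = [
    "Fe\u0300lix-i-Herbert.sb",   # decomposed: e + combining grave (NFD)
    "F\u00e8lix-i-Herbert.sb",    # composed: single code point (NFC)
    "fe\u0300lix_i_herbert.md",   # decomposed (NFD)
    "f\u00e8lix_i_herbert.md",    # composed (NFC)
    "notes per als caps de club.md",
]

dupes = normalization_collisions(listing)
for composed, spellings in dupes.items():
    print(composed, "has", len(spellings), "spellings in the index")
```

Any path reported here exists in the index under two byte-distinct names that a Mac file system treats as one file.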
@russss yeah I was just on #computer saying I need a non-mac to fix this, maior is going to help me out later
@russss this will move all the NFC files to NFD, I think, which then might break other platforms.
My search leads me to believe it's OK to move to NFD names, though, as Windows and Linux allegedly treat filenames as opaque strings.
No, because core.precomposeunicode does actually convert the filenames from NFD to NFC when adding to git. I agree that everything in the repo needs to be NFC.
The thing I find weird is that git rm'ing the files shouldn't be necessary: if the filenames are in NFC and precompose is on, checking out should handle this.
And if the filenames are in NFD, well, it should work on OS X.
core.precomposeunicode
"This option is only used by Mac OS implementation of Git. When core.precomposeunicode=true, Git reverts the unicode decomposition of filenames done by Mac OS."
Also, why does git ls-files in the last example have both NFC and NFD names? This is confusing.
not necessary to rm the files? well let's see
My suspicion is that pretty much all the documentation of core.precomposeunicode is wrong. I don't think it has any on-clone features. What it does do is make sure that any new files added to the repository have NFC filenames.
What I still can't explain is why git on OS X is failing to work with filenames which are NFD.
The reason my git ls-files example has both is to show that Git will happily accept both types and it'll confuse you. I added those by just doing git add . on OS X.
oh hai
works for me
No, now you've got two copies of each file in the repo! Check out git ls-files in one of those directories :(
NOW YOU HAVE TWO PROBLEMS
No untracked files show up now when cloning the repo. I have removed the duplicate files created by https://github.com/CodeClub/scratch-curriculum/commit/b17b82462f353c2e8def50db3b6eee0b77361462
If you're on a Mac you need to either clone a fresh repo or MAKE SURE YOU DON'T COMMIT DELETING FILES, because the Mac thinks the duplicate files I deleted are the same as the ones that remain, and we don't want to delete the remaining ones. Also FML.
Possibly due to codepage issues with the naming of some files and folders, git will show untracked changes for a cleanly cloned repo.
Repro steps: