RaspberryPiFoundation / scratch-curriculum

Term 1 and 2 of Code Club, learning Scratch

Untracked Changes Seen When Cloning Repo Due #38

Closed martinpeck closed 10 years ago

martinpeck commented 11 years ago

Possibly due to codepage issues with the naming of some files and folders, git will show untracked changes for a cleanly cloned repo.

Repro steps:

  1. fork scratch-curriculum
  2. clone to Mac (may also cause problems on Windows, but not checked)
  3. git status will now show the following...
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   "Tu\314\210rkc\314\247e/"
#   "es-ES/Trimestre 1/01 Fe\314\201lix y Herbert/"
#   "es-ES/Trimestre 1/04 Ma\314\201quina de frutas/"
#   "es-ES/Trimestre 1/07 Que\314\201 es eso/"
#   "es-ES/recursos para voluntarios/Gui\314\201aDeInicioScratch1.3.pdf"
#   "es-ES/recursos para voluntarios/Gui\314\201aDeReferenciaScratch1.4.pdf"
#   "es-ES/recursos para voluntarios/tarjetas de Scratch/XX_crono\314\201metro.EN.pdf"

tef commented 11 years ago

Unicode normalization, or the lack of it, seems to be the issue. Mac OS X uses NFD; git names are in NFC, I think.

There is a git flag to fix this, but it might be easier to switch to NFD.
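
To make the NFC/NFD distinction concrete, here is a minimal Python sketch using only the standard library. The name "Félix" is taken from the es-ES folder names in the git status output above; the variable names are illustrative only.

```python
import unicodedata

# "Félix", written with an explicit escape so the source file's own
# normalization cannot affect the result.
name = "F\u00e9lix"

nfc = unicodedata.normalize("NFC", name)  # composed: one code point for é
nfd = unicodedata.normalize("NFD", name)  # decomposed: e + combining acute

print(nfc.encode("utf-8"))  # b'F\xc3\xa9lix'
print(nfd.encode("utf-8"))  # b'Fe\xcc\x81lix'
print(nfc == nfd)           # False: same text on screen, different bytes
```

The bytes 0xcc 0x81 are what git status prints as \314\201 in the listing above, so the untracked names there are the decomposed (NFD) spellings.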

tef commented 11 years ago

Thanks to Jamie for spotting this :-)

We can either set core.precomposeunicode in modern versions of git to fix this, or just fix the repo.

JackuB commented 11 years ago

Hey, I'm having the same issues on Windows (can't even clone the repo or extract the downloaded zip) and Mac (untracked files).

precomposeunicode (afaik) will only work on Mac, but what about Windows?

drtortoise commented 10 years ago

This continues to be a problem, as it is preventing people from being able to contribute. Two people doing translations have contacted me this past week saying they'd like to contribute (Polish lessons, and Latin Spanish lessons), but didn't know how to commit their changes without fucking it up.

tef commented 10 years ago

@JackuB What version of git are you using?

tef commented 10 years ago

The documents note that Git for Windows 1.7.10 or higher is needed, or Git under cygwin 1.7, to support unicode file names.

For OS X, it's Git 1.8.2, and git config --global core.precomposeunicode true before cloning.

JackuB commented 10 years ago

git version 1.8.3.msysgit.0 on Win and git version 1.7.12.4 (Apple Git-37) on Mac
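
For comparing version strings like these against the minimums quoted earlier in the thread, here is a rough Python sketch. The thresholds are copied from the comment above and have not been independently verified; treating the OS X figure as a minimum is an assumption, and the function names and the MIN_VERSION table are hypothetical.

```python
import re

# Minimum versions as quoted in this thread (assumption: not verified
# against git's release notes).
MIN_VERSION = {
    "windows": (1, 7, 10),
    "osx": (1, 8, 2),
}

def parse_git_version(text):
    """Extract the leading numeric components from `git --version` output."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def meets_minimum(text, platform):
    """Compare the parsed version against the thread's quoted minimum."""
    return parse_git_version(text) >= MIN_VERSION[platform]

print(meets_minimum("git version 1.8.3.msysgit.0", "windows"))
print(meets_minimum("git version 1.7.12.4 (Apple Git-37)", "osx"))
```

Tuples compare element by element, so a vendor suffix such as .msysgit.0 is simply dropped and 1.7.12.4 sorts below 1.8.2. If the quoted figures are right, the Mac install reported here would be below the stated minimum.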

drtortoise commented 10 years ago

Updated to git 1.8.4.1, ran git config --global core.precomposeunicode true, and recloned the repo. Still getting untracked files. FML.

edent commented 10 years ago

If it's any help, the clone works fine in Linux.

russss commented 10 years ago

The offending file names appear to be stored in the repo as decomposed unicode (either that or my Linux machine is decomposing them for display, which sounds unlikely?):

ls ./botão-amarelo.gif | xxd
0000000: 2e2f 626f 7461 cc83 6f2d 616d 6172 656c  ./bota..o-amarel

(0xcc 0x83 is the combining tilde)

Note that this is not the same as what git status is telling you, which is the composed character. argh:

pt-BR/Curso 1/08 Ferramenta de Desenho/Recursos/bot\303\243o-amarelo.gif

My suspicion is that the core.precomposeunicode setting only composes the unicode characters when you add the files to the repository, and existing decomposed code points in the repo will still cause problems.

(This also makes sense because there are other files in that repo with names containing accented characters which aren't causing problems.)
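
The octal escapes in git's quoted output can be decoded to check which form a given name is in. Below is a small Python sketch, assuming the usual C-style quoting (double quotes, three-digit octal escapes for non-ASCII bytes). The helper names unquote_git_path and normal_forms are hypothetical, and only the common escapes are handled.

```python
import unicodedata

# Single-character escapes used in C-style quoted paths.
_SIMPLE_ESCAPES = {"a": 7, "b": 8, "f": 12, "n": 10, "r": 13,
                   "t": 9, "v": 11, '"': 34, "\\": 92}

def unquote_git_path(quoted):
    """Turn a path as printed by git status / git ls-files back into text."""
    if not (len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"'):
        return quoted  # unquoted paths are printed as-is
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.extend(ch.encode("utf-8"))
            i += 1
        elif body[i + 1] in "01234567":
            out.append(int(body[i + 1:i + 4], 8))  # three octal digits
            i += 4
        else:
            out.append(_SIMPLE_ESCAPES[body[i + 1]])
            i += 2
    return out.decode("utf-8")

def normal_forms(name):
    """Return (is_nfc, is_nfd) for a decoded file name."""
    return (unicodedata.normalize("NFC", name) == name,
            unicodedata.normalize("NFD", name) == name)

composed = unquote_git_path(r'"bot\303\243o-amarelo.gif"')
decomposed = unquote_git_path(r'"Tu\314\210rkc\314\247e/"')
print(normal_forms(composed))    # (True, False)
print(normal_forms(decomposed))  # (False, True)
```

The first name is the composed form that git status reports for the pt-BR file; the second is one of the untracked names from the original report, which decodes to the decomposed form.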

SteveJones commented 10 years ago

Try core.ignorecase, maybe it will ignore normal form as well.

tef commented 10 years ago

Note: Linux uses NFC, OS X uses NFD. HFS+ is case-insensitive but case-preserving too.

russss commented 10 years ago

This procedure will "fix" it:

1) Check out the repo on a Mac with core.precomposeunicode enabled.
2) Check out the repo on a non-Mac.
3) On the non-Mac, git rm all the affected files.
4) On the Mac, git add . and commit.
5) Make sure all Macs adding files to the repository in the future use core.precomposeunicode.
6) Despair.

Finally:

russ@zvezda:~/test/ca/Trimestre 1/01 Felix & Herbert $ ls
Fèlix-i-Herbert.sb            fèlix_i_herbert.md            notes per als caps de club.md
russ@zvezda:~/test/ca/Trimestre 1/01 Felix & Herbert $ git ls-files
"Fe\314\200lix-i-Herbert.sb"
"F\303\250lix-i-Herbert.sb"
"fe\314\200lix_i_herbert.md"
"f\303\250lix_i_herbert.md"
notes per als caps de club.md
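
Duplicates like the ones in this listing can be spotted by grouping tracked paths by their NFC form. This is a minimal Python sketch: the function name find_normalization_collisions is hypothetical, and the listing is typed in by hand from the git ls-files output above rather than read from a repository.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(paths):
    """Group paths that differ only in Unicode normalization."""
    groups = defaultdict(list)
    for path in paths:
        groups[unicodedata.normalize("NFC", path)].append(path)
    # Keep only the groups where more than one spelling is tracked.
    return {key: names for key, names in groups.items() if len(names) > 1}

# The five entries from the listing above, with escapes spelled out.
listing = [
    "Fe\u0300lix-i-Herbert.sb",   # NFD: e + combining grave
    "F\u00e8lix-i-Herbert.sb",    # NFC: precomposed è
    "fe\u0300lix_i_herbert.md",
    "f\u00e8lix_i_herbert.md",
    "notes per als caps de club.md",
]

collisions = find_normalization_collisions(listing)
for nfc_name, spellings in collisions.items():
    print(nfc_name, len(spellings))
```

On a real checkout, the input could come from git ls-files -z, which prints unquoted, NUL-separated paths. Any group with two entries is a pair that a Mac would see as a single file.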

drtortoise commented 10 years ago

@russss yeah I was just on #computer saying I need a non-mac to fix this, maior is going to help me out later

tef commented 10 years ago

@russss this will move all the NFC files to NFD, I think, which then might break other platforms.

tef commented 10 years ago

My search leads me to believe it's OK to move to NFD names though, as allegedly Windows and Linux use opaque strings as filenames.

russss commented 10 years ago

No, because core.precomposeunicode does actually convert the filenames from NFD to NFC when adding to git. I agree that everything in the repo needs to be NFC.

tef commented 10 years ago

The thing I find weird is that git rm'ing the files shouldn't be necessary: if precompose is on and the filenames are in NFC, shouldn't checking out handle this?

If the filenames are in NFD, well, it should work on OS X.

core.precomposeunicode

"This option is only used by Mac OS implementation of Git. When core.precomposeunicode=true, Git reverts the unicode decomposition of filenames done by Mac OS."

tef commented 10 years ago

Also, why does git ls-files in the last example have both NFC and NFD names? This is confusing.

drtortoise commented 10 years ago

not necessary to rm the files? well let's see

russss commented 10 years ago

My suspicion is that pretty much all the documentation of core.precomposeunicode is wrong. I don't think it has any on-clone features. What it does do is make sure that any new files added to the repository have NFC filenames.

What I still can't explain is why git on OS X is failing to work with filenames which are NFD.

The reason my git ls-files example has both is to show that Git will happily accept both types and it'll confuse you. I added those by just doing git add . on OS X.

drtortoise commented 10 years ago

oh hai

works for me

russss commented 10 years ago

No, now you've got two copies of each file in the repo! Check out git ls-files in one of those directories :(

tef commented 10 years ago

NOW YOU HAVE TWO PROBLEMS

drtortoise commented 10 years ago

There are now no untracked files seen when cloning the repo. I have removed the duplicate files created by https://github.com/CodeClub/scratch-curriculum/commit/b17b82462f353c2e8def50db3b6eee0b77361462

If you're on a Mac you need to either clone a fresh repo or MAKE SURE YOU DON'T COMMIT DELETING FILES, because the Mac thinks the duplicate files I deleted are the same as the ones that are remaining, and we don't want to delete the remaining ones. Also FML.