RaspberryPiFoundation / lesson_format

Lesson formatter

Strange characters in es-ES Scratch projects #173

Closed: rikcross closed this issue 8 years ago

rikcross commented 8 years ago

I'm not sure whether this belongs here or in the scratch-curriculum repo.

The scratch/es-ES/Scratch 1 projects aren't building. I get the following error:

python build.py  world lessons/scratch lessons/python lessons/webdev output/codeclubworld
Traceback (most recent call last):
  File "build.py", line 1362, in <module>
    build(p.pdf_generator, p.lesson_dirs, p.region, p.output_dir, p.verbose)
  File "build.py", line 955, in build
    project = parse_project_meta(p, theme)
  File "build.py", line 1157, in parse_project_meta
    header = yaml.safe_load("".join(header_lines))
  File "/Library/Python/2.7/site-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/Library/Python/2.7/site-packages/yaml/__init__.py", line 69, in load
    loader = Loader(stream)
  File "/Library/Python/2.7/site-packages/yaml/loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "/Library/Python/2.7/site-packages/yaml/reader.py", line 79, in __init__
    self.determine_encoding()
  File "/Library/Python/2.7/site-packages/yaml/reader.py", line 135, in determine_encoding
    self.update(1)
  File "/Library/Python/2.7/site-packages/yaml/reader.py", line 165, in update
    exc.encoding, exc.reason)
yaml.reader.ReaderError: 'utf8' codec can't decode byte #x92: invalid start byte
  in "<string>", position 124
make: *** [pages_world] Error 1
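The last line of the traceback is the useful clue. PyYAML wraps Python's own UnicodeDecodeError, and the byte it names, 0x92 (binary 10010010), has the 10xxxxxx bit pattern that UTF-8 reserves for continuation bytes, so it can never start a character. A small Python 3 sketch that reproduces the same failure with only the standard library (`utf8_error` is a hypothetical helper name, not part of build.py):

```python
def utf8_error(data):
    """Return (byte offset, reason) for the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# A lone 0x92 byte is rejected, just as in the traceback.
print(utf8_error(b"aqu\x92"))  # (3, 'invalid start byte')
# The same word saved as UTF-8 ("í" becomes the two bytes 0xC3 0xAD) is fine.
print(utf8_error("aquí".encode("utf-8")))  # None
```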

In Sublime Text, accented characters look odd. E.g. Boat Race:

[Screenshot, 2015-10-12 09:42:52: Boat Race opened in Sublime Text, showing the mis-rendered accented characters]

I've added .ignore to the end of the manifest, so that you can see the files causing the problems.

I've tried replacing the accented characters, but this doesn't seem to solve the problem.

andylolz commented 8 years ago

These files should be saved with the UTF-8 encoding. Through a bit of trial and error, I gather they are probably saved with “Western (Mac Roman)” encoding. However, Sublime is guessing “Western (Windows 1252)”, which is why they look weird in Sublime.

In Sublime, you can do View -> Show Console, then type view.encoding() to see the encoding Sublime has opened the current file with. Then you can do File -> Reopen with Encoding to try a different encoding.

The rule is pretty much: If it’s not saved and opened with UTF-8, then something is wrong.
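Why the text looks garbled in Sublime instead of failing outright: a single-byte encoding gives every byte some character, so guessing the wrong one silently produces the wrong letters. A short Python 3 illustration using the standard `mac_roman` and `cp1252` codecs; the sample text is made up, and Mac Roman is only the guess from the comment above:

```python
# Sample Spanish text as a Mac Roman editor would save it:
# "í" becomes byte 0x92 and "á" becomes byte 0x87.
raw = "aquí está".encode("mac_roman")

# Decoding with the right guess restores the text...
as_mac_roman = raw.decode("mac_roman")  # "aquí está"
# ...while the Windows 1252 guess maps 0x92 to a right single quote (U+2019)
# and 0x87 to a double dagger (U+2021).
as_windows_1252 = raw.decode("cp1252")  # "aqu’ est‡"

# Neither byte can start a UTF-8 character, hence the YAML error in the build.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```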

rikcross commented 8 years ago

Thanks Andy, I'll try that -- thanks for the speedy help! (p.s. your boat is AWESOME!)

andylolz commented 8 years ago

haha thanks! I have sent a pull request with all the broken ES Scratch 1 files fixed (all of the notes were fine) and included some instructions.
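The thread doesn't say how the files in the pull request were converted. If they really are Mac Roman, as guessed above, one way to convert them is this minimal Python 3 sketch (`mac_roman_to_utf8` and `fix_file` are hypothetical names). Mac Roman assigns a character to every byte value, so the decode step cannot fail even when the guess is wrong; it's worth reopening the result in an editor set to UTF-8 to check the accents.

```python
from pathlib import Path


def mac_roman_to_utf8(data):
    """Return `data` as UTF-8 bytes, assuming any non-UTF-8 input is Mac Roman."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        # Assumption: the source encoding is Mac Roman. This decode never
        # raises, because Mac Roman maps all 256 byte values.
        return data.decode("mac_roman").encode("utf-8")


def fix_file(path):
    """Rewrite a file in place as UTF-8; return True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    fixed = mac_roman_to_utf8(original)
    if fixed != original:
        p.write_bytes(fixed)
    return fixed != original
```

Files that already decode as UTF-8 are left byte-for-byte unchanged, so running `fix_file` over a whole folder of .md files should only touch the broken ones.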

rikcross commented 8 years ago

That's great, cheers! :)

freyjaodds commented 8 years ago

Andy, do we tell you often enough what a star you are? I've been struggling with these annoying characters for over a week now!

Freyja Oddsdóttir, Operations Executive | www.codeclubworld.org | @CodeClubWorld (http://twitter.com/codeclubworld) | www.facebook.com/codeclubworld


andylolz commented 8 years ago

It looks like Sublime Text has the following setting that may well come in handy:

"default_encoding": "UTF-8"

Btw thanks for the extremely clear bug report, @CodeClubRik! Circling things on the screenshot is vv helpful!

martinpeck commented 8 years ago

@freyjaodds I suspect the issue originated from the files you were supplied for these translations. The default encoding that @andylolz mentions is something that Sublime Text has set up by default. Do you know what editor(s) the translator was using? Is it worth us being a bit more prescriptive about this sort of thing?

freyjaodds commented 8 years ago

You are probably right, because they were .txt files as opposed to .md files.

This is the only time we have come across this issue, so I think it is circumstantial: usually people open the English .md file to translate (and we have had loads of projects in languages with non-English characters with no issues). Therefore, I am inclined to think that it's likely a one-off and we don't need to be prescriptive (and complicate the instructions).

martinpeck commented 8 years ago

I suspect it's less likely if someone takes the .md files and modifies them, but it's still possible to open the .md files, change the text, and then save them with a different encoding scheme (though you'd probably have to go out of your way to do this).

If we're not going to be prescriptive about the editor that people should use, we should at least be prescriptive about the approach to translation (i.e. say that they should take the en-GB .md files, modify the text in place, and then give them to us). The same goes for some of the other things we need to be a bit clearer about (not localising front-matter keys, not moving the location of images, not localising the folder names).

freyjaodds commented 8 years ago

Agreed - that's what the translation guide currently says, and it gives them a link to the en-GB files to be modified. A lot of these things you mention are already specified in the translation guide, but I'll go through it again to make sure it's crystal clear.

andylolz commented 8 years ago

“The default encoding that @andylolz mentions is something that Sublime Text has set up by default”

You’re quite right! My mistake.

I can’t remember if I mentioned this elsewhere, but I think you can add a pre-receive hook to the curriculum repos that rejects anything that’s not UTF-8 encoded. That would prevent anything bad sneaking in. (If I already suggested this before, please ignore me! I feel like I may have…)
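A minimal sketch of such a hook in Python 3.7+, assuming a self-hosted git server where custom server-side hooks can be installed. Only a pre-receive (or update) hook can reject a push; git runs post-receive after the refs have already been updated. The extension list and function names are hypothetical choices, not anything the curriculum repos actually use.

```python
"""Sketch of a git pre-receive hook rejecting text files that are not UTF-8."""
import subprocess
import sys

TEXT_EXTENSIONS = (".md", ".txt", ".yml", ".yaml", ".json")  # assumption
ZERO_SHA = "0" * 40  # git's placeholder for "no commit" in hook input


def non_utf8_paths(files):
    """Return, sorted, the paths whose bytes fail strict UTF-8 decoding."""
    bad = []
    for path, data in files.items():
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return sorted(bad)


def git(*args):
    """Run a git command and return its raw stdout bytes."""
    return subprocess.run(["git", *args], check=True,
                          stdout=subprocess.PIPE).stdout


def pushed_text_files(old, new):
    """Map each added or modified text file in commit `new` to its contents."""
    if old == ZERO_SHA:
        # New branch: nothing to diff against, so check every file.
        listing = git("ls-tree", "-r", "--name-only", "-z", new)
    else:
        listing = git("diff", "--name-only", "-z", "--diff-filter=AM", old, new)
    paths = [p.decode("utf-8", "surrogateescape")
             for p in listing.split(b"\0") if p]
    return {p: git("show", f"{new}:{p}")
            for p in paths if p.lower().endswith(TEXT_EXTENSIONS)}


def main(lines):
    """Check each '<old> <new> <ref>' line git passes to pre-receive."""
    bad = []
    for line in lines:
        old, new, _ref = line.split()
        if new == ZERO_SHA:  # branch deletion: nothing to check
            continue
        bad += non_utf8_paths(pushed_text_files(old, new))
    for path in bad:
        print(f"rejected, not UTF-8: {path}", file=sys.stderr)
    return 1 if bad else 0

# Installed as hooks/pre-receive (executable), end the file with:
#     sys.exit(main(sys.stdin))
```

A non-zero exit from pre-receive makes git refuse the whole push, and the stderr lines are shown to the person pushing. Hosted services may not run custom server-side hooks, in which case the same `non_utf8_paths` check could run in CI instead.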

martinpeck commented 8 years ago

@rikcross can we close this issue? I'm guessing yes but wanted to double check with you.

rikcross commented 8 years ago

Yep, this can be closed now.