bouweandela closed 2 years ago
I think it's the dash on this line: https://github.com/ESMValGroup/ESMValTool/blob/d75b479d42154676ce39fa8e06b97081aeb40c1f/esmvaltool/recipes/recipe_eady_growth_rate.yml#L10
It's a bit puzzling that I do not get this error in other environments or on other machines.
but seriously now, it's how YAML reads encoded stuff: only the printable chars from UTF-8 are allowed, see this Stackoverflow post. It is interesting that that's not picked up anywhere else. Are you using an older pyyaml?
No, pyyaml 6.0, the same version works fine on my own computer. I suspect it's again something to do with character encoding and how that's set up using environment variables.
gah! We should really make a complete move to ruamel
:+1:
It's not the dash per se, but rather the invisible PAD character. Here's a hex dump of that line:
0000000 J o u r n a l o f t
2020 2020 6f4a 7275 616e 206c 666f 7420
0000020 h e a t m o s p h e r i c s
6568 6120 6d74 736f 6870 7265 6369 7320
0000040 c i e n c e s , 4 7 ( 1 5 ) :
6963 6e65 6563 2c73 3420 2837 3531 3a29
0000060 1 8 5 4 342 200 223 1 8 6 4 , 1 9 9
3831 3435 80e2 3193 3638 2c34 3120 3939
0000100 0 . ) . \n
2e30 2e29 000a
such high values of 2.e+30 will not stand, man! :laughing: What's a PAD character, Klaus? Nevermind, it really is a padding character :man_facepalming:
It is something to do with our code assuming that the files are encoded in 'utf-8' instead of saying so everywhere explicitly. When I run
import locale
locale.getpreferredencoding(False)
I get 'ISO-8859-1' on the Levante notebook server instead of the usual 'UTF-8'. This results in the wrong interpretation of the file. This code
from pathlib import Path
file = Path("/home/k/k206100/.conda/envs/esmvaltool/lib/python3.10/site-packages/esmvaltool/recipes/recipe_eady_growth_rate.yml")
txt = file.read_text()
for i, char in enumerate(txt[350:355]):
print(350+i, char, hex(ord(char)))
produces
350 5 0x35
351 4 0x34
352 â 0xe2
353 0x80
354 0x93
while after running
import locale
locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
this results in
350 5 0x35
351 4 0x34
352 – 0x2013
353 1 0x31
354 8 0x38
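The two outputs above can be reproduced without touching the locale. This is a minimal sketch that decodes the bytes from the hex dump (e2 80 93 between "1854" and "1864") with each codec; the byte string is hand-copied from the dump, not read from the actual recipe file.

```python
# UTF-8 bytes for "1854–1864" as seen in the hex dump (e2 80 93 is the en dash).
raw = b"1854\xe2\x80\x931864"

as_utf8 = raw.decode("utf-8")         # en dash becomes one character, U+2013
as_latin1 = raw.decode("iso-8859-1")  # each byte becomes its own character


def describe(text):
    """Return (is_printable, code point) for the three characters after '1854'."""
    return [(char.isprintable(), hex(ord(char))) for char in text[4:7]]


print(describe(as_utf8))    # en dash followed by '1' and '8'
print(describe(as_latin1))  # 0xe2, then the non-printable C1 controls 0x80 and 0x93
```

Decoded as ISO-8859-1, the byte 0x80 turns into a C1 control character (the PAD mentioned earlier), which is the kind of non-printable character the YAML reader rejects.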
@valeriupredoi Remember #973?
Maybe it is time we fix this problem. Apparently open() takes an encoding='utf-8' argument, so if we specify that everywhere we open a text file in ESMValCore, the problem should go away.
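A minimal sketch of that fix, assuming nothing about how ESMValCore actually opens its files: the file name and contents below are made up for illustration, and only the standard-library `open` and `Path.read_text` calls are used.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a recipe file containing an en dash.
    recipe = Path(tmp) / "recipe_example.yml"
    recipe.write_bytes("pages: 1854\u20131864\n".encode("utf-8"))

    # Explicit encoding: the result no longer depends on
    # locale.getpreferredencoding().
    text_from_path = recipe.read_text(encoding="utf-8")
    with open(recipe, encoding="utf-8") as file:
        text_from_open = file.read()

print(hex(ord(text_from_open[11])))  # code point of the character after "1854"
```

With the encoding spelled out, the en dash should come back as a single U+2013 character even on a machine whose preferred encoding is ISO-8859-1.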
yes, that thing again! But I cordially protest we should be using en_GB.UTF-8, please, mate :gb: :beer:
Thanks for the help gents!
I helped by posting a Big Lebowski meme :laughing:
With esmvaltool v2.5 and this conda environment on the Levante Jupyterhub, I get the following error message when I run