emmt / EasyFITS.jl

Using FITS files made easier for Julia

erroneous french writing of scientific float keywords #7

Open buthanoid opened 1 year ago

buthanoid commented 1 year ago

Problem

When PyPlot is loaded in the session and the system locale is French, floating-point keywords in scientific notation are written incorrectly.

The value 3.333e60 is written as 3,3E+60 in the FITS file.

Reproduce

Here is the language-related part of my environment:

LANGUAGE=fr_FR
GDM_LANG=fr_FR
LANG=fr_FR.UTF-8

In a shell, start julia, then:

pkg> activate --temp
pkg> add EasyFITS
pkg> add PyPlot
julia> using EasyFITS
julia> writefits!("/tmp/test-pyplot.fits", FitsHeader("ERROR" => 3.333e60), [1;;])
shell> cat /tmp/test-pyplot.fits

You will see that the ERROR keyword has value 3.333E+60 which is correct. Now:

julia> using PyPlot
julia> writefits!("/tmp/test-pyplot.fits", FitsHeader("ERROR" => 3.333e60), [1;;])
shell> cat /tmp/test-pyplot.fits

You will see that the ERROR keyword has value 3,3E+60 which is wrong and problematic:

julia> read(FitsHeader, "/tmp/test-pyplot.fits")
ERROR: ArgumentError: failed to parse value of float FITS card
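To illustrate why a value written as 3,3E+60 cannot be read back as a float, here is a minimal C sketch of a strict parser. It is not the EasyFITS parser; the function name `parses_as_float` is made up for the example, and the only assumption is that the process runs in the default "C" numeric locale, where the decimal separator must be a dot.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `text` is consumed entirely as a
 * floating-point number, 0 otherwise. strtod follows the current
 * LC_NUMERIC; in the "C" locale only '.' is a valid decimal separator. */
int parses_as_float(const char *text, double *out)
{
    char *end = NULL;
    double value = strtod(text, &end);

    /* Reject the input if nothing was parsed or if characters remain. */
    if (end == text || *end != '\0')
        return 0;

    if (out != NULL)
        *out = value;
    return 1;
}
```

In the "C" locale, `parses_as_float("3.333E+60", NULL)` succeeds, while `parses_as_float("3,3E+60", NULL)` fails because parsing stops at the comma.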

Solutions (none)

You can work around the problem by launching Julia with the command LC_ALL="en_US.UTF-8" julia. Of course, that is not a solution for EasyFITS itself.

The CFITSIO code that writes float keywords has a bug. I reported it to the CFITSIO developers, and my fix does resolve the present bug, but it will only be available in a future version (we do not know when).

It is also possible to avoid the problem by setting LC_ALL in Julia's ENV, but only if you do it before loading PyPlot. Once PyPlot is loaded, modifying ENV no longer helps.

I tried to understand what PyPlot does when it loads, but could not find it.

emmt commented 1 year ago

When I read the CFITSIO code (version 4.2.0) some time ago, my understanding was that CFITSIO rewrites literal floating-point constants to replace commas (,) with dots (.).
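As a rough illustration of that idea (this is not CFITSIO's actual code), the following C sketch formats a value with the locale-dependent `snprintf` and then replaces the locale's decimal separator with a dot. The function name `format_float_card` and its signature are assumptions made for the example.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `value` in scientific notation with
 * `decimals` digits into `buf`, then forces the decimal separator to '.'.
 * Returns 0 on success, -1 if the buffer is too small. */
int format_float_card(double value, int decimals, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%.*E", decimals, value);
    if (n < 0 || (size_t)n >= size)
        return -1;

    /* The separator printf just used, e.g. "," in a French locale. */
    const char *sep = localeconv()->decimal_point;
    size_t sep_len = strlen(sep);

    if (sep_len > 0 && strcmp(sep, ".") != 0) {
        char *pos = strstr(buf, sep);
        if (pos != NULL) {
            /* Overwrite the separator with '.' and close any gap left
             * by a multi-byte separator. */
            *pos = '.';
            memmove(pos + 1, pos + sep_len, strlen(pos + sep_len) + 1);
        }
    }
    return 0;
}
```

The replacement is applied to the whole formatted string, mantissa and exponent included, so the result does not depend on which notation printf produced.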

buthanoid commented 1 year ago

Indeed it does, but there is a mistake in that code, which is why the bug affects only floats in scientific notation rather than every float. I sent them the fix, but the last CFITSIO update dates from 2022 and I do not know when, or whether, they will include it.

My idea was to print a warning when a French locale is in use, but since I don't understand why the bug occurs only when PyPlot is loaded, I don't know which condition should trigger the warning.

My last idea was that the problem might come from PyCall, so I tried to inspect its "startup" script, on the clue that the problem appears even if you do not call any PyPlot function. But I gave up.

buthanoid commented 1 year ago

The interesting part is that:

emmt commented 1 year ago

Another possible fix could be to rewrite the header in EasyFITS. I don't know if that would be easy or not.

emmt commented 1 year ago

If there is no such bug with FITSIO.jl, it may be worth investigating how they solve it.

buthanoid commented 1 year ago

FITSIO has the same issue.

Good case (ERROR = 3.333E+60):

pkg> activate --temp
pkg> add FITSIO
pkg> add PyPlot
julia> import FITSIO: fitswrite, FITSHeader
julia> fitswrite("/tmp/test-fitsio.fits", [1;;]; header=FITSHeader(["ERROR"], [3.333e60], [""]))
shell> cat /tmp/test-fitsio.fits

Bad case (ERROR = 3,3E+60):

julia> import PyPlot
julia> fitswrite("/tmp/test-fitsio.fits", [1;;]; header=FITSHeader(["ERROR"], [3.333e60], [""]))
shell> cat /tmp/test-fitsio.fits

However it parses it as a String:

julia> import FITSIO.read_header
julia> read_header("/tmp/test-fitsio.fits")
SIMPLE  =                    T / file does conform to FITS standard
[...]
ERROR   = '3,3E+60 '

buthanoid commented 1 year ago

Maybe one day I will investigate which locales CFITSIO sees, where they come from, when they are initialised, and how they can change. If we could simply force CFITSIO alone to use the US locale, without changing any ENV variable, that would fix the bug.

buthanoid commented 1 year ago

Investigating the locale.

setting the locale

I launch julia so that the French locale is forced regardless of my ENV variables:

LC_ALL="fr_FR.UTF-8" julia

In the following, the category numbers are those of glibc on Linux: 1 is LC_NUMERIC and 6 is LC_ALL.

julia> unsafe_string( @ccall setlocale(1::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"C"
julia> unsafe_string( @ccall setlocale(6::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=fr_FR.UTF-8;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=fr_FR.UTF-8;LC_ADDRESS=fr_FR.UTF-8;LC_TELEPHONE=fr_FR.UTF-8;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=fr_FR.UTF-8"

The first call queries the LC_NUMERIC locale, which determines whether floats are printed with commas or dots. The second call queries LC_ALL, which is supposed to take precedence over every other category. I guess the output here means that LC_ALL is not set yet, which is why it lists every category separately. The "C" value means that floats will be printed with a dot. See https://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do

I read that programs should generally call setlocale(LC_ALL, "") which sets the locale to the ENV variable (basically):

julia> unsafe_string( @ccall setlocale(6::Int, ""::Cstring)::Cstring )
"fr_FR.UTF-8"

julia> unsafe_string( @ccall setlocale(1::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"fr_FR.UTF-8"

julia> unsafe_string( @ccall setlocale(6::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"fr_FR.UTF-8"

We can see that it did use the LC_ALL ENV variable, and that it takes precedence over LC_NUMERIC, which now answers French too.

Note that Julia itself must change some locale categories at startup, because a C program only uses the ENV variables if it calls the C function setlocale. For example, if I compile and run the following C program:

#include <stdio.h>
#include <locale.h>

int main()
{
    printf("%s\n", setlocale(LC_NUMERIC, NULL));
    printf("%s\n", setlocale(LC_MESSAGES, NULL));
    return 0;
}

I get the output:

C
C

Julia, on the other hand, sets LC_MESSAGES to French but keeps LC_NUMERIC at C. I guess it was a deliberate choice, and it seems a good one.
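Besides the locale name, the decimal separator itself can be queried. The following C sketch uses the standard `localeconv` function for that; the helper names `numeric_locale_uses_comma` and `try_numeric_locale` are made up for the example, and whether a locale such as fr_FR.UTF-8 can be selected depends on it being installed on the system.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the active LC_NUMERIC category
 * uses a comma as decimal separator, 0 otherwise. */
int numeric_locale_uses_comma(void)
{
    return strcmp(localeconv()->decimal_point, ",") == 0;
}

/* Hypothetical helper: tries to switch LC_NUMERIC to `name`.
 * Returns 1 on success, 0 if the locale is not available, in which
 * case the current locale is left unchanged. */
int try_numeric_locale(const char *name)
{
    return setlocale(LC_NUMERIC, name) != NULL;
}
```

A program that never calls setlocale starts in the "C" locale, so `numeric_locale_uses_comma()` returns 0 until something in the process selects a locale such as French.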

compare locale float printing

LC_ALL="fr_FR.UTF-8" julia

LC_NUMERIC is at "C":

julia> unsafe_string( @ccall setlocale(1::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"C"

Floats are printed with dots:

julia> @ccall printf("%f\n"::Cstring ; 3.333::Cfloat)::Cint
3.333000
9

After setting LC_NUMERIC to French, floats are written with commas:

julia> unsafe_string( @ccall setlocale(1::Int, "fr_FR.UTF-8"::Cstring)::Cstring )
"fr_FR.UTF-8"

julia> @ccall printf("%f\n"::Cstring ; 3.333::Cfloat)::Cint
3,333000
9

Note that Julia's own printf does not use LC_NUMERIC; I think it always uses the English (C) notation.

PyPlot problem

I guess PyPlot calls something like setlocale(LC_ALL, "").

LC_ALL="fr_FR.UTF-8" julia

Compare these initial locale values (and float printing):

julia> unsafe_string( @ccall setlocale(1::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"C"

julia> unsafe_string( @ccall setlocale(6::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"LC_CTYPE=fr_FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr_FR.UTF-8;LC_COLLATE=fr_FR.UTF-8;LC_MONETARY=fr_FR.UTF-8;LC_MESSAGES=fr_FR.UTF-8;LC_PAPER=fr_FR.UTF-8;LC_NAME=fr_FR.UTF-8;LC_ADDRESS=fr_FR.UTF-8;LC_TELEPHONE=fr_FR.UTF-8;LC_MEASUREMENT=fr_FR.UTF-8;LC_IDENTIFICATION=fr_FR.UTF-8"

julia> @ccall printf("%f\n"::Cstring ; 3.333::Cfloat)::Cint
3.333000
9

with the ones after loading PyPlot in the session:

julia> using PyPlot

julia> unsafe_string( @ccall setlocale(1::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"fr_FR.UTF-8"

julia> unsafe_string( @ccall setlocale(6::Int, Cstring(Ptr{Int8}())::Cstring)::Cstring )
"fr_FR.UTF-8"

julia> @ccall printf("%f\n"::Cstring ; 3.333::Cfloat)::Cint
3,333000
9

Idea?

Maybe, by being a bit clever, we can change the locale just before writing the FITS file and restore it to exactly what it was before the call, in a finally clause?
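A minimal C sketch of that save, switch and restore pattern could look like the following. It is only an illustration of the idea, not code from EasyFITS or CFITSIO: `with_c_numeric_locale`, `write_fn`, `struct out_buf` and `format_sample` are hypothetical names, and the callback stands in for the real FITS writing call.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type standing in for "write the FITS file". */
typedef int (*write_fn)(void *ctx);

/* Runs `fn(ctx)` with LC_NUMERIC set to "C", then restores the previous
 * LC_NUMERIC value. Returns fn's status, or -1 if the switch failed. */
int with_c_numeric_locale(write_fn fn, void *ctx)
{
    const char *current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name: later setlocale calls may overwrite the string. */
    size_t len = strlen(current) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, len);

    if (setlocale(LC_NUMERIC, "C") == NULL) {
        free(saved);
        return -1;
    }

    int status = fn(ctx);

    /* The equivalent of the "finally" clause: always restore. */
    setlocale(LC_NUMERIC, saved);
    free(saved);
    return status;
}

/* Example callback: formats a float the way a header writer might. */
struct out_buf { char text[32]; };

int format_sample(void *ctx)
{
    struct out_buf *out = ctx;
    int n = snprintf(out->text, sizeof out->text, "%.3E", 3.333e60);
    return (n < 0 || (size_t)n >= sizeof out->text) ? -1 : 0;
}
```

Note that setlocale changes the locale for the whole process, so this sketch is only reasonable when no other thread formats or parses numbers at the same time.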

EDIT: there is a problem. If the locale is shared by C libraries and they use threads, it is "not safe" to change the locale to C, because maybe PyPlot had a good reason to set it to French.
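One possible way around the shared-locale concern, on systems that provide the POSIX.1-2008 per-thread locale functions (`newlocale`, `duplocale`, `uselocale`, `freelocale`), is to change the locale of the calling thread only. The sketch below assumes such a system (for example Linux with glibc) and has not been tried with CFITSIO; the function name `format_in_c_numeric_locale` is made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: formats `value` with LC_NUMERIC forced to "C"
 * for the calling thread only. The global locale and the locales of
 * other threads are left untouched. Returns 0 on success, -1 on error. */
int format_in_c_numeric_locale(double value, char *buf, size_t size)
{
    /* Start from a copy of the thread's current locale so that every
     * category except LC_NUMERIC keeps its present setting. */
    locale_t base = duplocale(uselocale((locale_t)0));
    if (base == (locale_t)0)
        return -1;

    /* On success newlocale takes ownership of `base`. */
    locale_t c_numeric = newlocale(LC_NUMERIC_MASK, "C", base);
    if (c_numeric == (locale_t)0) {
        freelocale(base);
        return -1;
    }

    locale_t previous = uselocale(c_numeric);
    int n = snprintf(buf, size, "%.3E", value);

    /* Restore the thread's previous locale before freeing ours. */
    uselocale(previous);
    freelocale(c_numeric);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The design choice here is to avoid setlocale entirely, so that whatever PyPlot selected for the rest of the process stays in effect.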

emmt commented 1 year ago

Impressive tests!

buthanoid commented 7 months ago

CFITSIO version 4.3.0 (Jul 2023) added the patch. I have not tested it yet, but it is supposed to fix the bug :-)