JuliaClimate / CDSAPI.jl

Julia API to the Climate Data Store (a.k.a. CDS)
https://cds.climate.copernicus.eu
MIT License
23 stars 6 forks source link

JSON.parse in py2ju is not able to parse strings containing single quotes #17

Closed LakshyaKhatri closed 4 years ago

LakshyaKhatri commented 4 years ago

This works fine:

julia> str = """{
               "format": "zip",
               "variable": "surface_air_temperature",
               "product_type": "climatology",
               "month": "08",
               "origin": "era_interim"
           }""";

julia> py2ju(str)
Dict{String,Any} with 5 entries:
  "format"       => "zip"
  "month"        => "08"
  "product_type" => "climatology"
  "variable"     => "surface_air_temperature"
  "origin"       => "era_interim"

But the CDS website gives python dictionary with single quotes and on passing that to py2ju() it produces the following error:

julia> str = """{
               'format': 'zip',
               'variable': 'surface_air_temperature',
               'product_type': 'climatology',
               'month': '08',
               'origin': 'era_interim'
           }""";

julia> py2ju(str)
ERROR: Invalid object key
Line: 1
Around: ...{     'format': 'zip',     ...
                 ^

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] _error(::String, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:142
 [3] parse_object(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:227
 [4] parse_value(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:168
 [5] parse(::String; dicttype::Type{T} where T, inttype::Type{Int64}, allownan::Bool, null::Nothing) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:463
 [6] parse at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:461 [inlined]
 [7] py2ju(::String) at /home/aries/.julia/dev/CDSAPI/src/CDSAPI.jl:41
 [8] top-level scope at REPL[16]:1
 [9] eval(::Module, ::Any) at ./boot.jl:331
 [10] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [11] run_backend(::REPL.REPLBackend) at /home/aries/.julia/packages/Revise/BqeJF/src/Revise.jl:1184
juliohm commented 4 years ago

Thank you for checking. Do you mind submitting a PR with a fix ?

On Tue, Jun 30, 2020, 08:02 Lakshya Khatri notifications@github.com wrote:

This works fine:

julia> str = """{ "format": "zip", "variable": "surface_air_temperature", "product_type": "climatology", "month": "08", "origin": "era_interim" }""";

julia> py2ju(str) Dict{String,Any} with 5 entries: "format" => "zip" "month" => "08" "product_type" => "climatology" "variable" => "surface_air_temperature" "origin" => "era_interim"

But the CDS website gives python dictionary with single quotes and on passing that to py2ju() it produces the following error:

julia> str = """{ 'format': 'zip', 'variable': 'surface_air_temperature', 'product_type': 'climatology', 'month': '08', 'origin': 'era_interim' }""";

julia> py2ju(str) ERROR: Invalid object key Line: 1 Around: ...{ 'format': 'zip', ... ^

Stacktrace: [1] error(::String) at ./error.jl:33 [2] _error(::String, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:142 [3] parse_object(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:227 [4] parse_value(::JSON.Parser.ParserContext{Dict{String,Any},Int64,true,nothing}, ::JSON.Parser.MemoryParserState) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:168 [5] parse(::String; dicttype::Type{T} where T, inttype::Type{Int64}, allownan::Bool, null::Nothing) at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:463 [6] parse at /home/aries/.julia/packages/JSON/d89fA/src/Parser.jl:461 [inlined] [7] py2ju(::String) at /home/aries/.julia/dev/CDSAPI/src/CDSAPI.jl:41

[9] eval(::Module, ::Any) at ./boot.jl:331 [10] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86 [11] run_backend(::REPL.REPLBackend) at /home/aries/.julia/packages/Revise/BqeJF/src/Revise.jl:1184

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JuliaClimate/CDSAPI.jl/issues/17, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZQW3M2EKHCJL3U23CJXRDRZHA2ZANCNFSM4OMDUDEQ .

LakshyaKhatri commented 4 years ago

Yes. I will fix this and #16 by today (and possibly #12 too).

michiboo commented 4 years ago

@LakshyaKhatri Should we move back to implement our parse function again? I start to think maybe regex is good for this case.

LakshyaKhatri commented 4 years ago

@michiboo, I think we should give a try to macros: Python dict/JSON to Julia dict. What do you say?

juliohm commented 4 years ago

I think we should implement the parse function again, perhaps without regex. Also please avoid macros as much as possible. They are overused in languages out there.

LakshyaKhatri commented 4 years ago

This version works, (have to handle edge cases):

function py2ju(dictstr::AbstractString)
    try
        return JSON.parse(dictstr)
    catch
        dictstr = replace(dictstr, "'" => "\"")
        return JSON.parse(dictstr)
    end
end

I am working on other implementations but they are almost 10 to 20 times slower.

juliohm commented 4 years ago

I don't think we need to super speed here as this operations will happen once, and most of the time will be spent on data transfer. I would just keep it simple and always replace the ' as you did before calling JSON.parse.