TacHawkes / HITRAN.jl

Spectrum calculation using the HITRAN database for Julia
MIT License

No data fetched for CH4? #17

Closed jtravs closed 2 years ago

jtravs commented 2 years ago

If I run:

fetch!("test", [32], 0, 12000, :standard);

(or variations, such as using iso_id("CH4")), no data appears to be stored in the database table "test". I checked this both by trying to calculate the absorption coefficient and by inspecting the file with an SQLite browser. If I run the same with O2, e.g.

fetch!("test", [36], 0, 12000, :standard);

Then data is saved to the database.

What is odd is that if I download the data directly using the URL returned by

HITRAN.build_request_url!(32, 0, 12000, HITRAN.merge_groups(:standard))

then data is returned.

To try and understand what is happening, I also checked the temporary data files you use to store the data from the URL before loading it into the database, and these remain empty for the CH4 parameters. So the data is being lost somewhere?

TacHawkes commented 2 years ago

I am having a look at the moment and I can confirm that something is strange here. Actually the issue seems to be during download.

jtravs commented 2 years ago

That seems correct. If I set throw=true on line 259 of database.jl

    response = Downloads.request(
        url;
        output=tmp_file,
        progress=verbose ? print_progress : nothing,
        throw=true
    )

I get:

julia> fetch!("test", [32], 0, 12000, :standard)
[ Info: No custom HITRAN database specified, opening 'HITRAN.sqlite' (default)
ERROR: HTTP/1.1 200 OK (transfer closed with 309863 bytes remaining to read) while requesting https://hitran.org/lbl/api?iso_ids_list=32&numin=0.00&numax=12000.00&fixwidth=0&sep=[comma]&request_params=global_iso_id,trans_id,molec_id,local_iso_id,nu,sw,a,elower,gamma_air,delta_air,gamma_self,n_air,n_self,gp,gpp
Stacktrace:
  [1] (::Downloads.var"#9#18"{IOStream, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool})(easy::Downloads.Curl.Easy)
    @ Downloads C:\Users\jt52\AppData\Local\Programs\Julia-1.7.0\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:369
  [2] with_handle(f::Downloads.var"#9#18"{IOStream, Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool}, handle::Downloads.Curl.Easy)
    @ Downloads.Curl C:\Users\jt52\AppData\Local\Programs\Julia-1.7.0\share\julia\stdlib\v1.7\Downloads\src\Curl\Curl.jl:64
  [3] #8
    @ C:\Users\jt52\AppData\Local\Programs\Julia-1.7.0\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:311 [inlined]
  [4] open(f::Downloads.var"#8#17"{Base.DevNull, Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool}, args::String; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:write,), Tuple{Bool}}})
    @ Base .\io.jl:330
...

TacHawkes commented 2 years ago

I have narrowed it down. The problem seems to be that the HITRAN server announces more data than it actually sends (you can also see this in your error report). At least that is how it looks; it could also be a problem with Downloads.jl. I will take a more in-depth look at the problem. The ugly fix would be to ignore the length mismatch...

TacHawkes commented 2 years ago

Could you try the current master branch? I had a look at the details:

Therefore I have used the easy hook construct for Downloads.jl to modify the Curl options to ignore the invalid Content Length. It works for me now for your example cases. Could you confirm this?
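
For readers following along, here is a minimal sketch of what such an easy hook could look like. It assumes the relevant libcurl option is CURLOPT_IGNORE_CONTENT_LENGTH, which the comment above does not name explicitly, and `lenient_downloader` is a hypothetical helper name rather than anything in HITRAN.jl.

```julia
using Downloads

# Hypothetical helper (not part of HITRAN.jl): returns a Downloader whose
# libcurl handles are told to ignore the Content-Length header.
function lenient_downloader()
    downloader = Downloads.Downloader()
    # Downloads.jl calls this hook on each libcurl easy handle after it has
    # applied its own options, so settings made here take precedence.
    downloader.easy_hook = (easy, info) -> begin
        # Assumption: this is the option meant by "ignore the invalid Content Length".
        Downloads.Curl.setopt(easy, Downloads.Curl.CURLOPT_IGNORE_CONTENT_LENGTH, 1)
    end
    return downloader
end
```

The returned object would then be passed as the `downloader` keyword of `Downloads.request`, for example `Downloads.request(url; output=tmp_file, downloader=lenient_downloader())`.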

You can use the force option, to force a re-download even if the data has already been downloaded.

fetch!("test", [32], 0, 12000, :standard; force=true)

jtravs commented 2 years ago

That appears to work better, in that it can download data.

However, for some requests, such as

fetch!("test3", [32], 0, 12000, :standard)

I get this error:

ERROR: HTTP/1.1 200 OK (Operation too slow. Less than 1 bytes/sec transferred the last 20 seconds) while requesting https://hitran.org/lbl/api?iso_ids_list=32&numin=0.00&numax=12000.00&fixwidth=0&sep=[comma]&request_params=global_iso_id,trans_id,molec_id,local_iso_id,nu,sw,a,elower,gamma_air,delta_air,gamma_self,n_air,n_self,gp,gpp
Stacktrace:
  [1] (::Downloads.var"#9#18"{IOStream, Base.DevNull, Nothing, Vector{Pair{String, String}}, Int64, Downloads.var"#24#27"{typeof(HITRAN.print_progress)}, Bool, Nothing, Bool, String, Bool, Bool})(easy::Downloads.Curl.Easy)
    @ Downloads C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:387
  [2] with_handle(f::Downloads.var"#9#18"{IOStream, Base.DevNull, Nothing, Vector{Pair{String, String}}, Int64, Downloads.var"#24#27"{typeof(HITRAN.print_progress)}, Bool, Nothing, Bool, String, Bool, Bool}, handle::Downloads.Curl.Easy)
    @ Downloads.Curl C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Downloads\src\Curl\Curl.jl:88
  [3] #8
    @ C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:328 [inlined]
  [4] open(f::Downloads.var"#8#17"{Base.DevNull, Nothing, Vector{Pair{String, String}}, Int64, Downloads.var"#24#27"{typeof(HITRAN.print_progress)}, Bool, Nothing, Bool, String, Bool, Bool}, args::String; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:write,), Tuple{Bool}}})
    @ Base .\io.jl:330
  [5] arg_write(f::Function, arg::String)
    @ ArgTools C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\ArgTools\src\ArgTools.jl:86
  [6] #7
    @ C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:327 [inlined]
  [7] arg_read
    @ C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\ArgTools\src\ArgTools.jl:61 [inlined]
  [8] request(url::String; input::Nothing, output::String, method::Nothing, headers::Vector{Pair{String, String}}, timeout::Int64, progress::typeof(HITRAN.print_progress), verbose::Bool, debug::Nothing, throw::Bool, downloader::Downloads.Downloader)
    @ Downloads C:\Users\jt52\AppData\Local\Programs\Julia-1.7.3\share\julia\stdlib\v1.7\Downloads\src\Downloads.jl:326
  [9] download_HITRAN(url::String, parameters::Vector{String}; verbose::Bool)
    @ HITRAN C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:264
 [10] download_HITRAN
    @ C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:253 [inlined]
 [11] fetch!(db::SQLite.DB, name::String, global_ids::Vector{Int64}, ν_min::Int64, ν_max::Int64, parameters::Vector{String}; force::Bool)
    @ HITRAN C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:74
 [12] fetch!(db::SQLite.DB, name::String, global_ids::Vector{Int64}, ν_min::Int64, ν_max::Int64, parameters::Symbol; force::Bool)
    @ HITRAN C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:124
 [13] fetch!(name::String, global_ids::Vector{Int64}, ν_min::Int64, ν_max::Int64, parameters::Symbol; force::Bool)
    @ HITRAN C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:133
 [14] fetch!(name::String, global_ids::Vector{Int64}, ν_min::Int64, ν_max::Int64, parameters::Symbol)
    @ HITRAN C:\Users\jt52\.julia\dev\HITRAN\src\database.jl:133
 [15] top-level scope
    @ REPL[8]:1

In Firefox, however, I can download the file without difficulty. Maybe there is some kind of timeout option we need to change?

jtravs commented 2 years ago

Adding

Downloads.Curl.setopt(easy, Downloads.Curl.CURLOPT_LOW_SPEED_TIME, 100)

on line 262 of database.jl fixed this, and it now downloads just fine after a short pause. I guess the HITRAN server takes longer to respond when there are more lines to download.

I also checked that the downloaded file was the same size as the one downloaded directly using the URL, and they agreed.

jtravs commented 2 years ago

Note that I also needed to increase the timeout to 100 s for some requests, but then everything works well.
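
Put together, the two changes described above might look roughly like the following sketch. `patient_download` and its `wait_seconds` keyword are hypothetical names used only for illustration; the package's actual function is `download_HITRAN`, and its real code may differ.

```julia
using Downloads

# Hypothetical helper (not part of HITRAN.jl): downloads `url` to `output`,
# tolerating a slow or stalled server for up to `wait_seconds`.
function patient_download(url::AbstractString, output::AbstractString;
                          wait_seconds::Integer = 100)
    downloader = Downloads.Downloader()
    downloader.easy_hook = (easy, info) -> begin
        # Allow a stalled transfer for `wait_seconds` before libcurl aborts it;
        # the error quoted earlier in this thread reports a 20-second window.
        Downloads.Curl.setopt(easy, Downloads.Curl.CURLOPT_LOW_SPEED_TIME, wait_seconds)
    end
    # `timeout` is the Downloads.jl limit for the request, in seconds.
    return Downloads.request(
        url;
        output = output,
        timeout = wait_seconds,
        throw = true,
        downloader = downloader,
    )
end
```

Using the same value for both settings is just a simplification for the sketch; they are independent and could be tuned separately.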

TacHawkes commented 2 years ago

Thanks! I will have a look and integrate that. I hope HITRAN comes out with their APIv2 soon... However, API keys will be mandatory for that version...

TacHawkes commented 2 years ago

I will mark this as fixed for now and tag a new release. Let me know if there are other issues...