TheoGuyard / LIBSVMdata.jl

A simple tool to fetch LIBSVM datasets to Julia
MIT License

Impossible to download the example dataset #7

Closed Vanadjy closed 1 month ago

Vanadjy commented 2 months ago

Hello, I ran into a small problem while trying to download the a1a dataset used in the docstring example. I ran:

julia> using LIBSVMdata

julia> using LIBSVM

julia> using Printf

julia> using Statistics

julia> Atrain, ytrain = load_dataset("a1a")

and I get the following output:

Downloading the dataset a1a...
* Couldn't find host www.csie.ntu.edu.tw in the (nil) file; using defaults
*   Trying 140.112.30.26:443...
* Connected to www.csie.ntu.edu.tw (140.112.30.26) port 443 (#0)
* schannel: disabled automatic use of client certificate
> GET /~cjlin/libsvmtools/datasets/binary\a1a HTTP/1.1
Host: www.csie.ntu.edu.tw
Accept: */*
User-Agent: curl/7.84.0 julia/1.8

* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Fri, 09 Aug 2024 22:53:59 GMT
< Server: Apache/2.4.57 (Debian)
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Last-Modified: Mon, 25 Sep 2023 06:17:52 GMT
< ETag: "142-60628eb0a94cf"
< Accept-Ranges: bytes
< Content-Length: 322
< Content-Type: text/html
<
* Connection #0 to host www.csie.ntu.edu.tw left intact
ERROR: HTTP/1.1 404 Not Found while requesting https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary\a1a
Stacktrace:
 [1] #3
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Downloads\src\Downloads.jl:243 [inlined]    
 [2] open(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Nothing, String}, args::String; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:write, :lock), Tuple{Bool, Bool}}})
   @ Base .\io.jl:384
 [3] #open_nolock#1
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\ArgTools\src\ArgTools.jl:35 [inlined]       
 [4] arg_write(f::Function, arg::String)
   @ ArgTools C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\ArgTools\src\ArgTools.jl:103       
 [5] #download#2
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Downloads\src\Downloads.jl:230 [inlined]
 [6] load_dataset(dataset::String; dense::Bool, replace::Bool, verbose::Bool)
   @ LIBSVMdata C:\Users\valen\.julia\packages\LIBSVMdata\EROan\src\LIBSVMdata.jl:116
 [7] load_dataset(dataset::String)
   @ LIBSVMdata C:\Users\valen\.julia\packages\LIBSVMdata\EROan\src\LIBSVMdata.jl:84
 [8] top-level scope
   @ REPL[36]:1

I suspect the issue comes from joinpath, which builds the path to the dataset improperly (with \ instead of /), but I don't know how to fix this. Any idea how I can solve it? Any help would be really appreciated!
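The suspicion above can be illustrated with a minimal sketch. The helper `dataset_url` and the constant `BASE_URL` are hypothetical names introduced here for illustration; they are not taken from the package source. The base URL itself is the one shown in the log. The idea is that URLs always use `/`, so they can be assembled with plain string concatenation instead of `joinpath`, which follows the conventions of the local file system.

```julia
# Hypothetical names for illustration only, not the package's actual code.
const BASE_URL = "https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary"

# Join a base URL and a dataset name with "/", independently of the OS.
# rstrip avoids a doubled slash if the base already ends with one.
dataset_url(base::AbstractString, name::AbstractString) =
    string(rstrip(base, '/'), "/", name)

url = dataset_url(BASE_URL, "a1a")
println(url)

# A URL built this way never contains the Windows path separator.
println(occursin('\\', url))
```

By contrast, `joinpath(BASE_URL, "a1a")` uses the separator of the operating system, which is consistent with the `binary\a1a` request in the log above.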

TheoGuyard commented 2 months ago

Hey @Vanadjy,

I think that the error comes from this line

mkpath(joinpath(homedir(), "data/libsvm"))

which puts a / in the path and fails on Windows. I'm on it, and I'll notify you when the patch is done!
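As a minimal sketch of a separator-free way to build such a local directory, each path component can be passed to `joinpath` separately, so that Julia picks the separator for the current OS. The helper `cache_dir` is a hypothetical name, and a temporary directory stands in for `homedir()` so that the example does not touch the real home folder; this is not necessarily what the patch does.

```julia
# Hypothetical helper for illustration; `root` stands in for homedir().
# No "/" or "\" is hard-coded: joinpath inserts the OS-specific separator.
cache_dir(root::AbstractString) = joinpath(root, "data", "libsvm")

root = mktempdir()      # temporary stand-in for the home directory
dir = cache_dir(root)
mkpath(dir)             # creates intermediate directories as needed

println(isdir(dir))
```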

Thanks for opening this issue 😁

TheoGuyard commented 2 months ago

@Vanadjy your problem should be fixed in the new version v0.1.2 of LIBSVMdata. Could you update the package and check that everything works on your side?

Vanadjy commented 2 months ago

Thank you for the quick answer @TheoGuyard! However, even with version 0.1.2 of the package, it still doesn't work and I get the same error:

julia> Atrain, ytrain = load_dataset("a1a")
Downloading the dataset a1a...
* Couldn't find host www.csie.ntu.edu.tw in the (nil) file; using defaults
*   Trying 140.112.30.26:443...
* Connected to www.csie.ntu.edu.tw (140.112.30.26) port 443 (#0)
* schannel: disabled automatic use of client certificate
> GET /~cjlin/libsvmtools/datasets/binary\a1a HTTP/1.1
Host: www.csie.ntu.edu.tw
Accept: */*
User-Agent: curl/7.84.0 julia/1.8

* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Mon, 12 Aug 2024 02:14:45 GMT
< Server: Apache/2.4.57 (Debian)
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Last-Modified: Mon, 25 Sep 2023 06:17:52 GMT
< ETag: "142-60628eb0a94cf"
< Accept-Ranges: bytes
< Content-Length: 322
< Content-Type: text/html
<
* Connection #0 to host www.csie.ntu.edu.tw left intact
ERROR: HTTP/1.1 404 Not Found while requesting https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary\a1a
Stacktrace:
 [1] #3
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Downloads\src\Downloads.jl:243 [inlined]
 [2] open(f::Downloads.var"#3#4"{Nothing, Vector{Pair{String, String}}, Float64, Nothing, Bool, Nothing, Nothing, String}, args::String; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:write, :lock), Tuple{Bool, Bool}}})
   @ Base .\io.jl:384
 [3] #open_nolock#1
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\ArgTools\src\ArgTools.jl:35 [inlined]
 [4] arg_write(f::Function, arg::String)
   @ ArgTools C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\ArgTools\src\ArgTools.jl:103
 [5] #download#2
   @ C:\Users\valen\AppData\Local\Programs\Julia-1.8.3\share\julia\stdlib\v1.8\Downloads\src\Downloads.jl:230 [inlined]
 [6] load_dataset(dataset::String; dense::Bool, replace::Bool, verbose::Bool)
   @ LIBSVMdata C:\Users\valen\.julia\packages\LIBSVMdata\HPB53\src\LIBSVMdata.jl:116
 [7] load_dataset(dataset::String)
   @ LIBSVMdata C:\Users\valen\.julia\packages\LIBSVMdata\HPB53\src\LIBSVMdata.jl:84
 [8] top-level scope
   @ REPL[38]:1

TheoGuyard commented 2 months ago

OK, I see now that the URL is broken. I've fixed it, but the LIBSVM website is currently down. When it is back up, I will run tests on Windows to check that everything works, and I'll notify you!

TheoGuyard commented 2 months ago

@Vanadjy I've fixed the URL issue. All the CI tests pass on both Linux and Windows, so your problem should be resolved with v0.1.3 of the package.

TheoGuyard commented 1 month ago

LIBSVMdata.jl is now working on Windows.