JuliaML / MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia
https://juliaml.github.io/MLDatasets.jl/stable
MIT License

OrganicMaterialsDB fails to download. #206

Open jmmshn opened 1 year ago

jmmshn commented 1 year ago

Hi, I'm interested in using OrganicMaterialsDB, but I'm running into the following error when I try to download the data. I tried digging into this a bit myself; please let me know if I'm doing something silly.

The exception seems to be triggered here: https://github.com/JuliaLang/julia/blob/3b76b25b648cc1fa6187b118c685819a4111a5d2/base/env.jl#L161-L166

Does this mean I'm supposed to set some ENV variable that I forgot about? Any help would be greatly appreciated! Thanks in advance!

julia> MLDatasets.OrganicMaterialsDB()
This program has requested access to the data dependency OrganicMaterialsDB.
which is not currently installed. It can be installed automatically, and you will not see this message again.

Dataset : The Organic Materials Database (OMDB)
Website : https://omdb.mathub.io/dataset

Do you want to download the dataset from Any[] to "/Users/shen9/.julia/datadeps/OrganicMaterialsDB"?
[y/n]
y
ERROR: MethodError: no method matching xor()
Closest candidates are:
  xor(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at int.jl:333
  xor(::Bool, ::Bool) at bool.jl:71
  xor(::Bool, ::Missing) at missing.jl:171
  ...
Stacktrace:
  [1] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [2] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [3] getindex
    @ ./broadcast.jl:575 [inlined]
  [4] copy
    @ ./broadcast.jl:898 [inlined]
  [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(xor), Tuple{}})
    @ Base.Broadcast ./broadcast.jl:883
  [6] checksum(hasher::Function, filenames::Vector{Any})
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/verification.jl:90
  [7] hexchecksum(hasher::Function, filename::Vector{Any})
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/verification.jl:92
  [8] run_checksum(hasher::Function, path::Vector{Any})
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/verification.jl:43
  [9] run_checksum
    @ ~/.julia/packages/DataDeps/ae6dT/src/verification.jl:65 [inlined]
 [10] checksum_pass(hash::Nothing, fetched_path::Vector{Any})
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/resolution_automatic.jl:142
 [11] download(datadep::DataDeps.DataDep{Nothing, Vector{Any}, typeof(DataDeps.fetch_default), typeof(identity)}, localdir::String; remotepath::Vector{Any}, i_accept_the_terms_of_use::Nothing, skip_checksum::Bool)
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/resolution_automatic.jl:79
 [12] download
    @ ~/.julia/packages/DataDeps/ae6dT/src/resolution_automatic.jl:70 [inlined]
 [13] handle_missing
    @ ~/.julia/packages/DataDeps/ae6dT/src/resolution_automatic.jl:10 [inlined]
 [14] _resolve(datadep::DataDeps.DataDep{Nothing, Vector{Any}, typeof(DataDeps.fetch_default), typeof(identity)}, calling_filepath::String)
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/resolution.jl:83
 [15] resolve(datadep::DataDeps.DataDep{Nothing, Vector{Any}, typeof(DataDeps.fetch_default), typeof(identity)}, inner_filepath::String, calling_filepath::String)
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/resolution.jl:29
 [16] resolve(datadep_name::String, inner_filepath::String, calling_filepath::String)
    @ DataDeps ~/.julia/packages/DataDeps/ae6dT/src/resolution.jl:54
 [17] resolve
    @ ~/.julia/packages/DataDeps/ae6dT/src/resolution.jl:73 [inlined]
 [18] #13
    @ ~/.julia/packages/MLDatasets/bg0uc/src/download.jl:14 [inlined]
 [19] withenv(f::MLDatasets.var"#13#14"{String, Nothing}, keyvals::Pair{String, String})
    @ Base ./env.jl:161
 [20] with_accept
    @ ~/.julia/packages/MLDatasets/bg0uc/src/download.jl:7 [inlined]
 [21] #datadir#12
    @ ~/.julia/packages/MLDatasets/bg0uc/src/download.jl:11 [inlined]
 [22] datadir
    @ ~/.julia/packages/MLDatasets/bg0uc/src/download.jl:11 [inlined]
 [23] process_data_if_needed(::Type{OrganicMaterialsDB}; dir::Nothing)
    @ MLDatasets ~/.julia/packages/MLDatasets/bg0uc/src/datasets/graphs/organicmaterialsdb.jl:55
 [24] OrganicMaterialsDB(; split::Symbol, dir::Nothing)
    @ MLDatasets ~/.julia/packages/MLDatasets/bg0uc/src/datasets/graphs/organicmaterialsdb.jl:42
 [25] OrganicMaterialsDB()
    @ MLDatasets ~/.julia/packages/MLDatasets/bg0uc/src/datasets/graphs/organicmaterialsdb.jl:41
 [26] top-level scope
    @ REPL[8]:1
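The env.jl line linked in the report is frame [19], `withenv`, and it is probably not where the error originates. `withenv` sets the given variables, calls the function it was handed, and restores the environment in a `finally` block, so any exception thrown inside that function simply passes through its frame. The sketch below illustrates this behaviour. It assumes, as frames [19] and [20] suggest, that MLDatasets' `with_accept` wraps the DataDeps resolution in `withenv`. The variable name `DATADEPS_ALWAYS_ACCEPT` and the helper `fake_download` are illustrative assumptions, not taken from the trace.

```julia
# Hypothetical stand-in for the DataDeps download step that actually fails.
function fake_download()
    # While inside withenv, the variable is set to the requested value...
    @assert ENV["DATADEPS_ALWAYS_ACCEPT"] == "true"
    # ...and the failure is raised here, inside the wrapped function.
    error("failure raised inside the wrapped function")
end

# Remember the previous value (or nothing) to check that it is restored afterwards.
previous = get(ENV, "DATADEPS_ALWAYS_ACCEPT", nothing)

caught = try
    withenv(fake_download, "DATADEPS_ALWAYS_ACCEPT" => "true")
    nothing
catch err
    err
end

# The exception comes from fake_download, not from withenv itself...
println(caught isa ErrorException)  # true
println(caught.msg)
# ...and withenv has put the environment back the way it was.
println(get(ENV, "DATADEPS_ALWAYS_ACCEPT", nothing) == previous)  # true
```

Read this way, the trace does not point to a missing environment variable. The real failure is in the DataDeps frames above frame [19].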
Dsantra92 commented 1 year ago

@CarloLucibello should we use ManualDataDep instead of DataDep here? It looks like the empty list of download URLs (the Any[] in the download prompt above) may be the cause of this issue.
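The innermost frames fit this explanation. In frame [6], `checksum(hasher, filenames::Vector{Any})` receives the empty remote-path list. Frame [5] is then a broadcast of `xor` over an empty argument tuple (`Tuple{}`), which ends up calling `xor()` with no arguments, and Base defines no such method. Below is a minimal reproduction in plain Julia. It assumes the per-file hashes are combined roughly by splatting them into a broadcast `xor`. This is a guess from the trace, not DataDeps' actual source, and `file_hashes` and `combine_hashes` are hypothetical names.

```julia
# Hypothetical per-file hashes; with no remote paths to fetch, this vector is empty.
file_hashes = UInt64[]

# Splatting an empty collection into a broadcast call gives xor zero arguments,
# so materializing the 0-dimensional broadcast calls xor(), which has no method.
err = try
    xor.(file_hashes...)
    nothing
catch e
    e
end
println(err isa MethodError)  # true

# Hypothetical guarded variant: reduce with an explicit init is defined for zero files.
combine_hashes(hashes) = reduce(xor, hashes; init = zero(UInt64))
println(combine_hashes(UInt64[]) == 0)             # true
println(combine_hashes(UInt64[0x0f, 0xf0]) == 0xff)  # true
```

A guard like this would only turn the crash into a meaningless checksum of zero files. Since there is no URL to fetch the OMDB data from, switching to a ManualDataDep, as suggested above, addresses the actual cause.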