cjdoris / ARFFFiles.jl

Load and save ARFF files
MIT License

Unhandled `relational` attributes #17

Closed ferdiu closed 2 years ago

ferdiu commented 2 years ago

The load function should be able to load ARFF files containing relational attributes, as specified here.

cjdoris commented 2 years ago

Do you have an example file?

ferdiu commented 2 years ago

I ran into the problem while trying to load the file JapaneseVowels_TRAIN.arff from this dataset (https://timeseriesclassification.com/description.php?Dataset=JapaneseVowels). Sorry, I may be misremembering the exact file name.

On Sat, 4 Jun 2022, 19:30, Christopher Rowley @.***> wrote:

Do you have an example file?

cjdoris commented 2 years ago

Ok thanks, that looks doable.

cjdoris commented 2 years ago

Right, this works on the main branch now (pkg> add ARFFFiles#main). Each element of a relational column is read as a table (an ARFFTable specifically). Let me know if you find any problems with it, and if not I'll make a release.
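For illustration, a minimal sketch of what loading a relational attribute might look like. The toy file below is hypothetical (it is not the JapaneseVowels file), and the sketch assumes a version of ARFFFiles that includes the relational support described above. It only relies on each element of the relational column being a Tables.jl-compatible table, so it does not depend on the concrete inner table type.

```julia
using ARFFFiles, DataFrames, Tables

# Hypothetical toy file: one relational attribute `bag` with two numeric
# sub-attributes, followed by a nominal class attribute.
path = joinpath(mktempdir(), "toy_relational.arff")
write(path, """
@relation toy
@attribute bag relational
@attribute x numeric
@attribute y numeric
@end bag
@attribute class {a,b}
@data
'1.0,2.0\\n3.0,4.0',a
'5.0,6.0',b
""")

df = ARFFFiles.load(DataFrame, path)

# Each entry of the relational column is expected to be a table of its own.
for (i, inner) in enumerate(df.bag)
    cols = Tables.columns(inner)
    println("row ", i, ": columns ", Tables.columnnames(cols))
end
```

Each data row stores its inner table as a single quoted string, with inner rows separated by an escaped `\n`.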

ferdiu commented 2 years ago

Thank you for implementing this so quickly.

It seems that load is doing its job handling relational attributes but the save function is complaining about not being able to handle Tables.DictColumnTable as eltype for a column:

julia> ARFFFiles.save("test_save.arff", df)
ERROR: ARFF does not support data of type Tables.DictColumnTable in column cepstrum_coefficient
Stacktrace:
 [1] save(io::IOStream, df::DataFrame; relation::String, comment::String)
   @ ARFFFiles ~/.julia/packages/ARFFFiles/alRHW/src/ARFFFiles.jl:1138
 [2] save
   @ ~/.julia/packages/ARFFFiles/alRHW/src/ARFFFiles.jl:1091 [inlined]
 [3] #25
   @ ~/.julia/packages/ARFFFiles/alRHW/src/ARFFFiles.jl:1147 [inlined]
 [4] open(::ARFFFiles.var"#25#26"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, DataFrame}, ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./io.jl:330
 [5] open
   @ ./io.jl:328 [inlined]
 [6] #save#24
   @ ~/.julia/packages/ARFFFiles/alRHW/src/ARFFFiles.jl:1147 [inlined]
 [7] save(filename::String, df::DataFrame)
   @ ARFFFiles ~/.julia/packages/ARFFFiles/alRHW/src/ARFFFiles.jl:1147
 [8] top-level scope
   @ REPL[13]:1

This is the result of trying to save the same file mentioned earlier after loading it into memory.

As a bonus, I would probably convert the inner tables to the same type as the outer one. For instance, calling ARFFFiles.load(DataFrame, "JapaneseVowels_TRAIN.arff") now returns a two-column DataFrame whose first column has elements of type DictColumnTable (I think they should be DataFrame for consistency) and whose second column has elements of type CategoricalValue{String, UInt32}. But I guess this could be just my preference rather than the way to go; it is up to you.

cjdoris commented 2 years ago

Yeah, I haven't implemented saving columns of tables as relational attributes yet. It's tricky. I'd strongly recommend not saving data as ARFF anyway; use a more standard format.

I agree that it would be nice to recursively convert the inner tables too, but it breaks the existing API a bit. I'll think about it.
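Since saving relational columns is not implemented, one possible workaround is to flatten the nested tables into a single long table before writing it out in another format. The sketch below is only an illustration: `flatten_relational` and the `instance` column are hypothetical names, not part of ARFFFiles, and the input DataFrame is a hand-built stand-in for a loaded file.

```julia
using DataFrames

# Hypothetical stand-in for a loaded table with one relational column.
df = DataFrame(
    bag = [DataFrame(x = [1.0, 3.0], y = [2.0, 4.0]),
           DataFrame(x = [5.0], y = [6.0])],
    class = ["a", "b"],
)

# Produce one output row per inner row. The `instance` column records
# which outer row each inner row came from; the remaining outer columns
# are repeated alongside it.
function flatten_relational(df::AbstractDataFrame, col::Symbol)
    parts = map(enumerate(df[!, col])) do (i, inner)
        part = DataFrame(inner)        # copies; accepts any Tables.jl table
        part[!, :instance] .= i
        for name in names(df, Not(col))
            part[!, name] .= df[i, name]
        end
        part
    end
    return reduce(vcat, parts)
end

flat = flatten_relational(df, :bag)
```

The resulting `flat` table contains only scalar columns, so it can be handed to any ordinary tabular writer.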

cjdoris commented 2 years ago

OK load(DataFrame, "some/file.arff") now recursively converts relational columns.

ferdiu commented 2 years ago

Nice job. Thank you.

I agree with you that data is better saved in a more standard format, but I would probably open a separate issue for saving relational attributes, since someone may try to do it in the future.