davidavdav / NamedArrays.jl

Julia type that implements a drop-in replacement of Array with named dimensions

Added an improved CSV writer. #99

Closed dietercastel closed 3 years ago

dietercastel commented 4 years ago

I noticed that the DelimitedFiles.writedlm writer strips all the name information. With #96 added (for conversion to DataFrame) and by using CSV, it became pretty easy to improve on this.

Tests included.

using NamedArrays
m4d = NamedArray(reshape(1:16,2,2,2,2))
setnames!(m4d,["A","B"],1)
setnames!(m4d,["x","y"],2)
setnames!(m4d,["11","22"],3)
setnames!(m4d,["DO","RE"],4)

NamedArrays.write("testm4d.csv",m4d)

Writes this csv file:

A,B,C,D,Values
A,x,11,DO,1
B,x,11,DO,2
A,y,11,DO,3
B,y,11,DO,4
A,x,22,DO,5
B,x,22,DO,6
A,y,22,DO,7
B,y,22,DO,8
A,x,11,RE,9
B,x,11,RE,10
A,y,11,RE,11
B,y,11,RE,12
A,x,22,RE,13
B,x,22,RE,14
A,y,22,RE,15
B,y,22,RE,16
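
The row order above follows Julia's column-major layout: the first dimension varies fastest. A minimal Base-only sketch of that ordering, not the PR's implementation, where `dimvalues` is a hypothetical stand-in for the names set with `setnames!`:

```julia
# Illustration only: reproduce the long-format rows without NamedArrays.
data = reshape(1:16, 2, 2, 2, 2)
# Hypothetical stand-in for the per-dimension names of m4d.
dimvalues = (["A", "B"], ["x", "y"], ["11", "22"], ["DO", "RE"])

rows = String[]
# CartesianIndices iterates in column-major order (first index fastest).
for I in CartesianIndices(data)
    labels = [dimvalues[d][I[d]] for d in 1:ndims(data)]
    push!(rows, join(labels, ",") * "," * string(data[I]))
end

foreach(println, rows)
```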

bkamins commented 4 years ago

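I have discussed with @nalimilan a bit about this issue. I think the cleanest way is to provide a conversion:

* from Tables.jl compliant type to `NamedArray` (probably columns to use for conversion should be passed and a sentinel for filling missing intersections)
* from `NamedArray` to `NamedTuple` of vectors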

This will ensure minimal dependencies.

A more advanced strategy would be to make NamedArray implement the Tables.jl interface directly, without requiring any conversion (so that it would work without copying any data). Here, one would just have to check whether a plain NamedArray would be OK to use or whether some thin wrapper around it would be needed.

dietercastel commented 4 years ago

Thanks for your addition/discussion.

I still haven't familiarized myself with the Tables.jl API so any pointers on that would be nice.

> I have discussed with @nalimilan a bit about this issue. I think the cleanest way is to provide a conversion:
>
> * from Tables.jl compliant type to `NamedArray` (probably columns to use for conversion should be passed and a sentinel for filling missing intersections)

1) I assume it must be a concrete type that implements the Tables API. My understanding was that DataFrame implements it, but that it was deemed too heavy a dependency. What alternative (lighter?) concrete type do you propose?

2) Besides `convert`, what would be a better name?

> * from `NamedArray` to `NamedTuple` of vectors

3) You mean a conversion mechanism specifically for NamedVectors to the (pretty novel?) Julia type NamedTuple? Isn't that another issue? Or how does this fit with the above discussion? If NamedTuple does implement the Tables interface, then that would be a candidate for the above (1).

Finally, once a Tables conversion is done, any implemented table writer would solve the above. Should we test such conversion & writing in this package then?
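
For reference, the first conversion direction quoted above (table-like input to `NamedArray`, with the index columns passed in and a sentinel for missing intersections) might be sketched as follows. `from_columns` and its arguments are hypothetical names, not part of NamedArrays.jl, and the input is assumed to be a NamedTuple of equal-length vectors:

```julia
using NamedArrays

# Hypothetical helper, for illustration only.
# dimcols:  tuple of column names to use as array dimensions
# valuecol: column holding the values
# sentinel: filler for combinations that do not occur in the table
function from_columns(cols::NamedTuple, dimcols::Tuple, valuecol::Symbol;
                      sentinel = missing)
    # Distinct labels per dimension, in order of first appearance.
    levels = [unique(cols[c]) for c in dimcols]
    lookup = [Dict(v => i for (i, v) in enumerate(l)) for l in levels]
    vals = cols[valuecol]
    T = Union{typeof(sentinel), eltype(vals)}
    data = Array{T}(undef, length.(levels)...)
    fill!(data, sentinel)
    for r in eachindex(vals)
        I = ntuple(d -> lookup[d][cols[dimcols[d]][r]], length(dimcols))
        data[I...] = vals[r]
    end
    return NamedArray(data, Tuple(levels), dimcols)
end

cols = (A = ["A", "B", "A"], B = ["x", "x", "y"], Values = [1, 2, 3])
n = from_columns(cols, (:A, :B), :Values)
# The combination ("B", "y") is absent from cols, so it holds the sentinel.
```

Accepting any Tables.jl-compliant source instead of a NamedTuple would need the Tables.jl accessors and is left out of this sketch.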

bkamins commented 4 years ago

> concrete type do you propose?

The thing is that it does not need to be concrete. If something is Tables.jl compliant, it should work.

> If NamedTuple does implement the Table interface than that would be a candidate for the above.

Yes, a NamedTuple of vectors is a Tables.jl-compliant table type. Since it is in Base, you do not have to import anything to have it as an output.

> Should we test such conversion & writing in this package then?

Conversion to a NamedTuple of vectors should be tested (if this path is chosen). Writing does not have to be tested.
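
A rough sketch of what such a conversion and its test could look like. `to_columns` is a hypothetical name, not an existing NamedArrays.jl function, and the sketch assumes `dimnames(n)`, `names(n, d)` and the `array` field behave as in current NamedArrays.jl:

```julia
using NamedArrays

# Hypothetical helper: flatten a NamedArray into a NamedTuple of vectors,
# one column per dimension plus one value column.
function to_columns(n::NamedArray; valuecol::Symbol = :Values)
    dims = Symbol.(dimnames(n))
    # Column-major list of all index combinations.
    idx = vec(collect(CartesianIndices(n.array)))
    cols = [[names(n, d)[I[d]] for I in idx] for d in 1:ndims(n)]
    values = vec(collect(n.array))
    return NamedTuple{(dims..., valuecol)}((cols..., values))
end

m4d = NamedArray(reshape(1:16, 2, 2, 2, 2))
setnames!(m4d, ["A", "B"], 1)
setnames!(m4d, ["x", "y"], 2)
setnames!(m4d, ["11", "22"], 3)
setnames!(m4d, ["DO", "RE"], 4)

cols = to_columns(m4d)
@assert cols.A == repeat(["A", "B"], 8)
@assert cols.Values == 1:16
```

Since the result is a plain NamedTuple of vectors, a Tables.jl-aware writer such as `CSV.write("testm4d.csv", cols)` should accept it without further code in this package.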

davidavdav commented 4 years ago

What is the status of this work now?

I've looked quickly at Tables.jl, and to me it looks like it is specifically for 2D data, and I think for Symbol (or String) indices. NamedArrays is more general. I think there is a case for treating Symbol or String indices as a special situation in NamedArrays; this would also allow for more type stability in some of the harder functions like cat.

bkamins commented 4 years ago

Actually, other AbstractArrays have started to support the Tables.jl API by using a key-value mapping. This would allow easier serialization.