Closed dietercastel closed 3 years ago
I have discussed with @nalimilan a bit about this issue. I think the cleanest way is to provide a conversion:
* from Tables.jl compliant type to `NamedArray` (probably columns to use for conversion should be passed and a sentinel for filling missing intersections)
* from `NamedArray` to `NamedTuple` of vectors

This will ensure minimal dependencies.
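As a rough illustration of the second direction, here is a minimal sketch that turns a 2-D `NamedArray` into a `NamedTuple` of vectors. The helper name `to_columntable` and its `rowcol` keyword are hypothetical and not part of NamedArrays.jl; the sketch assumes the second-dimension names can be converted to `Symbol`s and that keeping the row names as an extra column is acceptable.

```julia
using NamedArrays

# Hypothetical helper, not part of NamedArrays.jl: converts a 2-D NamedArray
# into a NamedTuple of vectors, one vector per column plus one vector holding
# the row names (stored under the key given by `rowcol`).
function to_columntable(n::NamedArray{T,2}; rowcol::Symbol = :rowname) where {T}
    colnames = Symbol.(names(n, 2))            # names of the second dimension
    ks = (rowcol, colnames...)                 # keys of the resulting NamedTuple
    vals = (collect(names(n, 1)),              # row names become a column
            (n.array[:, j] for j in axes(n.array, 2))...)  # copies each column
    return NamedTuple{ks}(vals)
end

n = NamedArray([1 2; 3 4], (["a", "b"], ["x", "y"]), ("row", "col"))
nt = to_columntable(n)
```

Only Base and NamedArrays are needed here, which matches the "minimal dependencies" argument; the price is that every column is copied.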
A more advanced strategy would be to make `NamedArray` implement the Tables.jl interface without even requiring the conversions (so that it would be achievable without copying any data). Here, one would just have to check if a plain `NamedArray` would be OK to use or if some thin wrapper around it would be needed.
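The thin-wrapper variant could look roughly like the sketch below. The type `NamedMatrixTable` is hypothetical and not part of NamedArrays.jl; it assumes a 2-D `NamedArray` whose second-dimension names convert to `Symbol`s, and it ignores the row names entirely. The `Tables.*` functions being extended are the ones Tables.jl documents for column-access sources.

```julia
using NamedArrays, Tables

# Hypothetical thin wrapper (not part of NamedArrays.jl): exposes the columns
# of a 2-D NamedArray as a column-access Tables.jl source.
struct NamedMatrixTable{A<:NamedArray{<:Any,2}}
    data::A
end

Tables.istable(::Type{<:NamedMatrixTable}) = true
Tables.columnaccess(::Type{<:NamedMatrixTable}) = true
Tables.columns(t::NamedMatrixTable) = t

# Column names are the names of the second dimension, converted to Symbols.
Tables.columnnames(t::NamedMatrixTable) = Symbol.(names(t.data, 2))

# Columns are views into the underlying array, so no data is copied.
Tables.getcolumn(t::NamedMatrixTable, i::Int) = view(t.data.array, :, i)
function Tables.getcolumn(t::NamedMatrixTable, nm::Symbol)
    i = findfirst(==(nm), Tables.columnnames(t))
    i === nothing && throw(ArgumentError("no column named $nm"))
    return Tables.getcolumn(t, i)
end

n = NamedArray([1 2; 3 4], (["a", "b"], ["x", "y"]), ("row", "col"))
t = NamedMatrixTable(n)
```

With these methods in place, any Tables.jl consumer should be able to read `t`; for example, `Tables.columntable(t)` materializes it as a `NamedTuple` of columns. Whether the same methods could be defined on `NamedArray` directly is the open question from the comment above.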
Thanks for your addition/discussion.
I still haven't familiarized myself with the Tables.jl API so any pointers on that would be nice.
> I have discussed with @nalimilan a bit about this issue. I think the cleanest way is to provide a conversion:
> * from Tables.jl compliant type to `NamedArray` (probably columns to use for conversion should be passed and a sentinel for filling missing intersections)

1) I assume it must be a concrete type that implements the Tables API. My understanding was that DataFrame implements it but was deemed too heavy a dependency. What alternative (lighter?) concrete type do you propose?
2) Besides `convert`, what would be a better name?
> * from `NamedArray` to `NamedTuple` of vectors

3) You mean a conversion mechanism specifically for NamedVectors to the (pretty novel?) Julia type NamedTuple? Isn't that another issue? Or how does this fit with the above discussion? If NamedTuple does implement the Tables interface, then that would be a candidate for the above (1).

Finally, when a Tables conversion is done, any implemented table writer would then solve the above. Should we test such conversion & writing in this package then?
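For the first direction quoted above (a Tables.jl source to a `NamedArray`, with the columns to use passed in and a sentinel for missing intersections), one possible reading is a long-format table with one row per cell. The sketch below follows that reading; the function name `to_namedarray`, its argument order, and the `sentinel` keyword are all assumptions made for illustration, not an existing API.

```julia
using NamedArrays, Tables

# Hypothetical helper, not part of NamedArrays.jl: builds a 2-D NamedArray from
# a long-format Tables.jl source. `rowcol` and `colcol` name the columns that
# supply the row and column names, `valcol` the column that supplies the values.
# Intersections that never occur in the table are filled with `sentinel`.
function to_namedarray(tbl, rowcol::Symbol, colcol::Symbol, valcol::Symbol;
                       sentinel = missing)
    cols = Tables.columns(tbl)
    r = Tables.getcolumn(cols, rowcol)
    c = Tables.getcolumn(cols, colcol)
    v = Tables.getcolumn(cols, valcol)

    rnames, cnames = unique(r), unique(c)
    ri = Dict(name => i for (i, name) in enumerate(rnames))
    ci = Dict(name => j for (j, name) in enumerate(cnames))

    # Element type must be able to hold both the values and the sentinel.
    T = Union{eltype(v), typeof(sentinel)}
    data = Array{T}(undef, length(rnames), length(cnames))
    fill!(data, sentinel)
    for (a, b, x) in zip(r, c, v)
        data[ri[a], ci[b]] = x
    end
    return NamedArray(data, (rnames, cnames), (String(rowcol), String(colcol)))
end

# A NamedTuple of vectors serves as the Tables.jl source; ("b", "y") is absent.
tbl = (row = ["a", "a", "b"], col = ["x", "y", "x"], val = [1, 2, 3])
na = to_namedarray(tbl, :row, :col, :val)
```

Because the input only goes through `Tables.columns` and `Tables.getcolumn`, the same function would accept any other Tables.jl compliant source without NamedArrays depending on it.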
> concrete type do you propose?

The thing is that it does not need to be concrete. If something is Tables.jl compliant it should work.
> If NamedTuple does implement the Tables interface, then that would be a candidate for the above.

Yes, a `NamedTuple` of vectors is a Tables.jl compliant table type. As it is in Base, you do not have to import anything to have it as an output.
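A quick way to see this, sketched with made-up data: the `NamedTuple` is built with Base alone, and Tables.jl is loaded only to inspect it.

```julia
using Tables

# A plain NamedTuple of equal-length vectors; constructing it needs only Base.
nt = (rowname = ["a", "b"], x = [1, 3], y = [2, 4])

is_table = Tables.istable(nt)                        # true
colnames = Tables.columnnames(Tables.columns(nt))    # (:rowname, :x, :y)

# Row iteration also works through the generic Tables.jl API.
xs = [Tables.getcolumn(row, :x) for row in Tables.rows(nt)]
```

The producing package therefore does not need Tables.jl as a dependency; only the consumer (a writer, for instance) does.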
> Should we test such conversion & writing in this package then?

Conversion to a `NamedTuple` of vectors should be tested (if this path is chosen). Writing does not have to be tested.
What is the status of this work now?
I've looked quickly at Tables.jl, and to me it looks like this is specifically for 2D data, and I think for Symbol (or String) indices. NamedArrays is more general. I think there is a case for treating Symbol or String indices as a special situation in NamedArrays; this would also allow for more type stability in some of the harder functions like `cat`.
Actually, other `AbstractArray`s have started to support the Tables.jl API by using a key-value mapping. This would allow easier serialization.
I noticed the `DelimitedFiles.writedlm` writer strips all the name information. With #96 added (for conversion to DataFrame) and by `using CSV`, it became pretty easy to improve on this. Tests included.
Writes this csv file: