davidavdav / NamedArrays.jl

Julia type that implements a drop-in replacement of Array with named dimensions

Reading from file to NamedArray #109

Open Mastomaki opened 3 years ago

Mastomaki commented 3 years ago

Documentation should be added on the best way to read a text file into a NamedArray. My current plan is to first read the file into a DataFrame via CSV.jl and then convert it using the function provided by dietercastel.
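
As a rough sketch of the first step of that plan (reading the file with CSV.jl), something like the following should work. The table contents and column names (`Region`, `Year`, `Values`) are made up for illustration, and an in-memory buffer stands in for a real file path. The `stringtype = String` option is my own choice here, to get plain `String` label columns; it is not something the plan above requires.

```julia
using CSV, DataFrames

# Illustrative long-format table (assumed data). In practice, pass a file path
# to CSV.read instead of wrapping a string in an IOBuffer.
text = """
Region,Year,Values
north,y2020,1.5
north,y2021,
south,y2020,3.5
"""

# Empty fields are parsed as `missing` by default.
# `stringtype = String` requests plain String columns instead of inline strings.
df = CSV.read(IOBuffer(text), DataFrame; stringtype = String)
```

The resulting `df` has one row per cell, with the label columns first and the value column last, which is the layout the conversion function expects.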

Mastomaki commented 3 years ago

To handle missing values, I edited dietercastel's original function as follows:

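using DataFrames, NamedArrays

# Convert a long-format DataFrame to a NamedArray: every column except
# `valueCol` becomes a dimension; cells without a matching row stay `missing`.
# Extends Base.convert so that `convert(NamedArray, df)` dispatches here.
function Base.convert(::Type{NamedArray}, df::DataFrame; valueCol = :Values)
    # Dimension names are all column names except the value column.
    newdimnames = filter(!=(valueCol), propertynames(df))
    # Unique labels along each dimension, in order of first appearance.
    dimlabels = map(dn -> unique(df[!, dn]), newdimnames)
    lengths = map(length, dimlabels)

    # Start from an all-missing array and fill in one cell per row.
    newna = NamedArray(Array{Union{Missing, Float64}}(missing, lengths...), tuple(dimlabels...), tuple(newdimnames...))
    for row in eachrow(df)
        a = [row[col] for col in newdimnames]
        newna[a...] = row[valueCol]
    end
    return newna
end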

However, the element type of the NamedArray should be taken from the value column of the original DataFrame instead of being fixed to Float64.
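
One possible way to do that is to build the element type from `eltype` of the value column. This is only a sketch: the helper name `namedarray_from_long` and the example data are hypothetical, not part of NamedArrays.jl. It also assumes that the label columns hold non-integer values such as strings, since NamedArrays treats plain integer indices as positions rather than names.

```julia
using DataFrames, NamedArrays

# Hypothetical helper (not part of NamedArrays.jl): same idea as the function
# above, but the element type follows the value column.
function namedarray_from_long(df::DataFrame; valueCol::Symbol = :Values)
    dimcols = filter(!=(valueCol), propertynames(df))
    labels = map(c -> unique(df[!, c]), dimcols)
    # Allow `missing` for cells that have no row in `df`.
    T = Union{Missing, eltype(df[!, valueCol])}
    data = Array{T}(missing, map(length, labels)...)
    na = NamedArray(data, tuple(labels...), tuple(dimcols...))
    for row in eachrow(df)
        key = [row[c] for c in dimcols]   # labels of this row, one per dimension
        na[key...] = row[valueCol]        # index by name (assumes non-integer labels)
    end
    return na
end

# Example with made-up data: one (Region, Year) combination is absent.
df = DataFrame(Region = ["north", "north", "south"],
               Year   = ["y2020", "y2021", "y2020"],
               Values = [1.5, 2.5, 3.5])
na = namedarray_from_long(df)
```

With an integer value column the same call would give an array with element type `Union{Missing, Int64}` rather than forcing a conversion to Float64.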

davidavdav commented 3 years ago

Yes, documentation. I have to study how that works. Do you know of a recommended and hosted platform for that?

Mastomaki commented 3 years ago

Not really. I believe the documentation of registered packages appears on https://juliapackages.com/ if it is present in the GitHub repository. Documenter.jl can be used to build the documentation.

sciencepeak commented 3 years ago

> Yes, documentation. I have to study how that works. Do you know of a recommended and hosted platform for that?

I don't think it is necessary to master Documenter.jl in order to write formal, polished documentation. If the usage of the conversion between NamedArray and DataFrame is added to the README file of this repository, that is good enough for now for people to learn it.

I think your package is very important for Julia to attract data science users from Python pandas and R, where data frames and matrices can easily be converted to each other and transposed without losing row names or column names. Thanks a lot for your work.