JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Is it possible to append a dictionary to feather file #114

Closed dnk8n closed 5 years ago

dnk8n commented 5 years ago

Something along these lines:

using Feather

open("$(tmp_dir)/test.feather", "w") do feather_file
    for pool in pool_list
        open("$(tmp_dir)/$(pool).fakeq", "r") do fakeq_file
            lines = readlines(fakeq_file)
            for line in lines
                identifier, sample_name, sequence = split(line, "_")
                sequence_length = length(sequence)

                d = Dict(
                    :identifier => identifier,
                    :sample_name => sample_name,
                    :sequence => sequence,
                    :sequence_length => sequence_length
                )
                Feather.write(d, feather_file, true)
            end
        end
    end
end

I would like the values of the Dict in the code above to be variable-length strings and variable-length arrays of mixed types.

Is there a file format better suited to my needs? I looked at HDF5 before this (In Julia, how can one stream line by line of contents to an HDF5 file?) but also ran into difficulties.

All of these formats seem to want you to dump an entire dataframe at once and do not provide tools to stream data line by line (I might be mistaken, but I have not found an appropriate tool). For now I am stuck using multiple tab-delimited files.

ExpandingMan commented 5 years ago

The Feather format simply does not support that. You should consider using JLD.jl for serialization of arbitrary Julia objects and compatibility across versions. You can also use HDF5.jl directly or the built-in serializer (though be warned, objects serialized that way are not guaranteed to be compatible across Julia versions).
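As a rough illustration of the built-in serializer option, here is a minimal sketch that streams one Dict at a time to a file and reads the records back. It uses only the `Serialization` standard library; the sample records, the `:tags` key, and the `records.jls` file name are made up for the example, and the caveat about compatibility across Julia versions still applies.

```julia
using Serialization

# Hypothetical sample records standing in for the parsed lines of the input files.
records = [
    Dict(:identifier => "id1", :sample_name => "s1",
         :sequence => "ACGT", :sequence_length => 4),
    Dict(:identifier => "id2", :sample_name => "s2",
         :sequence => "ACGTAC", :sequence_length => 6,
         :tags => Any["x", 1, 2.5]),  # variable-length array of mixed types
]

# Hypothetical output path in a fresh temporary directory.
path = joinpath(mktempdir(), "records.jls")

# Write one serialized object per record to the same stream.
open(path, "w") do io
    for record in records
        serialize(io, record)
    end
end

# Read the objects back one at a time until the stream is exhausted.
restored = Dict{Symbol,Any}[]
open(path, "r") do io
    while !eof(io)
        push!(restored, deserialize(io))
    end
end
```

Each `serialize` call writes a complete object, so the whole dataset never has to be held in memory at once when writing or reading.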

A trick that works with Feather (although I only recommend it if your data is mostly tabular) is serializing values to JSON and storing the resulting strings in a Feather table. This is inefficient but useful, and I occasionally use it for arrays of integers. If you have floats, higher-rank arrays, or complicated nested dicts, I recommend one of the other formats I mentioned.
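A minimal sketch of the JSON-in-a-string-column trick, assuming the DataFrames.jl, Feather.jl, and JSON.jl packages are installed. The column names, sample data, and file path are hypothetical and chosen only for illustration.

```julia
using DataFrames, Feather, JSON

# Hypothetical example data: one variable-length integer array per row.
reads = [[1, 2, 3], [4], Int[]]

# Store each array as a JSON string so the column is a plain string column.
df = DataFrame(
    identifier = ["a", "b", "c"],
    reads_json = [JSON.json(r) for r in reads],
)

# Hypothetical output path in a fresh temporary directory.
path = joinpath(mktempdir(), "test.feather")
Feather.write(path, df)

# Read the table back and decode each JSON string into a Julia array.
loaded = Feather.read(path)
decoded = [JSON.parse(s) for s in loaded.reads_json]
```

Note that the whole table is still written in a single `Feather.write` call; this works around the lack of nested types, not the lack of row-by-row appending.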

ExpandingMan commented 5 years ago

I'm also going to close this, since we don't define the Feather format, so implementing dicts is not really something we can do here. If you're looking for more advice on serializing your data, consider posting on the Julia Discourse.