Closed athityakumar closed 7 years ago
No idea either :) Never used Avro myself. Let's investigate in parallel and write here what we find?
Sure. From the examples mentioned here and here, I understand that an avro contains Schema (ie, class of datatype like String / Integer / ...) of columns.
A df.use_avro method would make more sense than from_avro. We can use this method to convert the values (their Class) in an existing DataFrame.
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: %w[35 30 40])
df[:age].to_a
#=> ["35", "30", "40"]
df.use_avro('path/to/avro/file') #! Avro schema contains name: String, age: Integer
df[:age].to_a
#=> [35, 30, 40]
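To make the casting idea concrete, here is a minimal sketch in plain Ruby. It is not Daru or avro gem code: `coerce_columns` and `CASTERS` are hypothetical names, a hash of arrays stands in for the DataFrame, and a hash of Avro-style type names stands in for the schema.

```ruby
# Hypothetical casters, keyed by Avro-style primitive type names.
CASTERS = {
  'string' => ->(v) { v.to_s },
  'int'    => ->(v) { Integer(v) },
  'double' => ->(v) { Float(v) }
}.freeze

# Hypothetical helper: cast every value of each column to the type that the
# schema names for that column. Raises KeyError for a column or type that
# the schema / CASTERS table does not know about.
def coerce_columns(columns, schema)
  columns.each_with_object({}) do |(name, values), out|
    caster = CASTERS.fetch(schema.fetch(name))
    out[name] = values.map { |v| caster.call(v) }
  end
end

columns = { name: %w[Dany Jon Tyrion], age: %w[35 30 40] }
schema  = { name: 'string', age: 'int' }

coerced = coerce_columns(columns, schema)
p coerced[:age] # prints [35, 30, 40]
```

`Integer(v)` is used rather than `to_i` so that a value like "abc" raises instead of silently becoming 0.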
name: String, age: Integer. However, what if some columns have more than one type of values - raise TypeError? Like:
df = Daru::DataFrame.new(name: %w[Dany Jon Tyrion], age: [35, nil, 40]) #! nil, because data isn't available (say)
df.to_avro('path/to/avro/file')
#=> TypeError: Column 'age' contains values of different classes - Fixnum & NilClass.
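A check like the one imagined in that error could look roughly like this. It is only a sketch: `assert_uniform_columns!` is a hypothetical name, and a hash of arrays stands in for the DataFrame.

```ruby
# Hypothetical check: collect the distinct classes found in each column and
# raise when a column mixes them.
def assert_uniform_columns!(columns)
  columns.each do |name, values|
    classes = values.map(&:class).uniq
    next if classes.size <= 1

    raise TypeError,
          "Column '#{name}' contains values of different classes - " \
          "#{classes.join(' & ')}."
  end
  true
end

begin
  assert_uniform_columns!({ name: %w[Dany Jon Tyrion], age: [35, nil, 40] })
rescue TypeError => e
  puts e.message
end
```

On Ruby 2.4 and later the integer class is reported as Integer rather than Fixnum. Raising is also not the only option: Avro schemas can declare a union such as ["null", "int"], so a column with missing values could be exported as nullable instead.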
From the examples mentioned here and here, I understand that an avro contains Schema (ie, class of datatype like String / Integer / ...) of columns.
I am not sure this is right. Judging from the definition and examples on Wikipedia, I believe .avro files contain schema AND data.
And this page contains some multi-megabyte example datasets, I doubt it is just a schema ;)
My bad, really sorry. I went through the above links and YES - avro does indeed contain both Schema & Data. I was unable to find any examples that contain data (previously). But I recently had a look at this gem of a link, and you're quite right. Thanks a lot! I'll soon start working on this. 😄
P.S - It wasn't about not finding fixture files that contain data. In fact, all avro files do contain data. It was just the methods that would reveal the data that I wasn't able to find in the avro gem until just recently.
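Once the rows are readable, the importer mostly has to pivot them into columns. The sketch below assumes each record arrives as a Hash keyed by field name (my understanding of what the avro gem's data file reader yields, not something confirmed in this thread). A literal array stands in for the file, and `records_to_columns` is a hypothetical helper name.

```ruby
# Assumption: iterating an Avro data file yields one Hash per row, keyed by
# field name. This literal array stands in for the file contents.
records = [
  { 'name' => 'Dany',   'age' => 35 },
  { 'name' => 'Jon',    'age' => 30 },
  { 'name' => 'Tyrion', 'age' => 40 }
]

# Hypothetical helper: pivot row hashes into a hash of column arrays.
# Every row is expected to carry every field; a row that omits a field
# would leave that column shorter than the others.
def records_to_columns(records)
  records.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, cols|
    row.each { |field, value| cols[field] << value }
  end
end

columns = records_to_columns(records)
p columns['age'] # prints [35, 30, 40]
```

A hash of arrays is the same shape that Daru::DataFrame.new takes in the examples earlier in this thread.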
Avro Importer is quite sorted out now. 😄
Regarding Avro Exporter, I think that the schema details should be provided by the user. But can we attempt (or maybe for later?) to 'guess' the schema details (like :type, :name and :fields) from the Daru::DataFrame? Or would this be too unreliable / unnecessarily hacky?
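For what the guessing could look like, here is a rough sketch in plain Ruby. `avro_type_for`, `guess_field_type` and `guess_schema` are hypothetical names, a hash of arrays stands in for the Daru::DataFrame, and the Ruby-to-Avro type mapping is one possible choice rather than anything the avro gem prescribes.

```ruby
require 'json'

# Hypothetical mapping from a Ruby value to an Avro primitive type name.
def avro_type_for(value)
  case value
  when String      then 'string'
  when Integer     then 'long'
  when Float       then 'double'
  when true, false then 'boolean'
  else raise TypeError, "no Avro type guessed for #{value.class}"
  end
end

# Guess one field's type: a single primitive, or a ["null", type] union
# when the column also contains nil. Mixed non-nil types raise.
def guess_field_type(name, values)
  types = values.compact.map { |v| avro_type_for(v) }.uniq
  raise TypeError, "Column '#{name}' mixes types: #{types.join(', ')}" if types.size > 1

  type = types.first || 'null'
  values.include?(nil) && type != 'null' ? ['null', type] : type
end

# Build an Avro record schema (:type, :name, :fields) from column data.
def guess_schema(record_name, columns)
  {
    type: 'record',
    name: record_name,
    fields: columns.map do |name, values|
      { name: name.to_s, type: guess_field_type(name, values) }
    end
  }
end

schema = guess_schema('Person', { name: %w[Dany Jon Tyrion], age: [35, nil, 40] })
puts JSON.pretty_generate(schema)
```

The guess stays unreliable in the ways the question anticipates: an all-nil column can only be typed as "null", and the choice of 'long' over 'int' is arbitrary, so letting the user pass an explicit schema that overrides the guess seems the safer default.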
@zverok - I'm planning to work on the Avro Importer (and Exporter) next week and would like to have some clarity regarding both - Avro Importer & Avro Exporter. (I haven't used much of Avro, so please pardon the n00b questions 😉 )
Avro Importer : What exactly is intended to be imported? As far as I know from googling, .avro files contain only schemas and not data, right? Then, should a DataFrame of schemas be created or should data be given separately?
Avro Exporter : Similarly, should just the DataFrame vectors be exported to an .avro file?