SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

Empty DataFrame creation #392

Closed parthm closed 6 years ago

parthm commented 6 years ago

Is there a way to create an empty DataFrame? My understanding of DataFrame may be limited here. If there isn't, it would be good to allow creating an empty DataFrame with DataFrame.new. The use case is creating an empty DataFrame and then adding columns from multiple data sources, something like the following:

df = DataFrame.new
sources.each do |col, csv|
  df[col] = Daru::DataFrame.from_csv(csv)['source_col']
end

Currently, DataFrame.new fails with the following error:

irb(main):003:0> df = Daru::DataFrame.new
ArgumentError: wrong number of arguments (given 0, expected 1..2)
        from /home/parth/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daru-0.1.5/lib/daru/dataframe.rb:242:in `initialize'

At the moment I am creating a DataFrame with a dummy column and then adding the new columns. The dummy column is deleted afterwards. I'm not sure if there is a better way.

The Python pandas equivalent would be:

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>>

zverok commented 6 years ago

TBH, this example:

df = DataFrame.new
sources.each do |col, csv|
  df[col] = Daru::DataFrame.from_csv(csv)['source_col']
end

...doesn't look like the prettiest possible Ruby code to me; and I am not sure that creating a row-less and column-less DataFrame is really semantically meaningful... Though creating empty dataframes should definitely be simplified.
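
For illustration, one way to sidestep the empty starting frame is to collect the columns into a plain Hash first and build the frame in a single call. This is only a sketch under assumptions: `build_columns` and the sample `sources` data are hypothetical, and in-memory arrays stand in for the values that would come from Daru::DataFrame.from_csv(csv)['source_col']. The final daru call is left as a comment so the snippet runs without the gem.

```ruby
# Hypothetical helper: gather every column into one Hash up front
# instead of assigning columns onto an initially empty frame.
def build_columns(sources)
  sources.each_with_object({}) do |(col, values), acc|
    acc[col] = values.dup # copy so the source arrays are not shared
  end
end

# Hypothetical sample data standing in for the per-CSV columns.
sources = {
  price:  [10, 20, 30],
  volume: [100, 200, 300]
}

columns = build_columns(sources)

# With daru loaded, the frame could then be created in one step:
#   df = Daru::DataFrame.new(columns)
```

Whether this reads better than column-by-column assignment is a matter of taste; it mainly avoids needing a zero-column frame at all.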

parthm commented 6 years ago

@zverok, yeah, the code could probably be better. I was merely using it as a use case for why we might need an empty DataFrame.

Looking into the source code, I realized that an empty DataFrame is already supported. The approach is:

irb(main):003:0> df = Daru::DataFrame.new({})
=> limited output
 #<Daru::DataFrame(0x0)>

irb(main):004:0> df[:foo] = [1, 2, 3]
=> limited output
 [1, 2, 3]
irb(main):005:0> df
=> limited output
 #<Daru::DataFrame(3x1)>
     foo
   0   1
   1   2
   2   3

It would be a nice enhancement to have Daru::DataFrame.new be equivalent to Daru::DataFrame.new({}). Seems more discoverable. Thoughts?
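
To make the proposal concrete, here is a rough sketch of the default-argument idea. `TinyFrame` is a hypothetical stand-in class written only for this illustration; it is not daru's implementation and says nothing about how Daru::DataFrame#initialize is actually structured.

```ruby
# Hypothetical stand-in class, NOT daru's DataFrame.
class TinyFrame
  attr_reader :columns

  # Defaulting `source` to an empty Hash makes TinyFrame.new behave
  # the same as TinyFrame.new({}).
  def initialize(source = {})
    @columns = source.transform_values(&:dup)
  end

  # Add or replace a column by name.
  def []=(name, values)
    @columns[name] = values.dup
  end

  # [rows, columns], similar in spirit to the "(3x1)" in the irb output.
  def shape
    rows = @columns.values.map(&:size).max || 0
    [rows, @columns.size]
  end
end

df = TinyFrame.new   # no argument needed
df[:foo] = [1, 2, 3]
df.shape
```

The only change that matters here is the `source = {}` default; everything else exists just to make the sketch runnable.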

zverok commented 6 years ago

Acknowledged.