SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

Is it expected that dataframe.row behaves differently for single and multiple index values? #376

Closed baarkerlounger closed 7 years ago

baarkerlounger commented 7 years ago

DataFrame#row returns the result as a new vector if you specify a single row index, but as a dataframe if you provide multiple row indexes. Is that expected? It seems inconsistent.


```
[32] pry(Daru::Core::Query)> data_frame.row[0]
=> #<Daru::Vector(22)>
                                         0
                   ID              8517337
          Case Number             HV194652
                 Date 03/12/2012 02:00:00 
                Block   027XX S HAMLIN AVE
                 IUCR                 1152
         Primary Type   DECEPTIVE PRACTICE
          Description ILLEGAL USE CASH CAR
 Location Description ATM (AUTOMATIC TELLE
               Arrest                false
             Domestic                 true
                 Beat                 1031
             District                  010
                 Ward                   22
       Community Area                   30
             FBI Code                   11
                  ...                  ...
[33] pry(Daru::Core::Query)> data_frame.row[0,1]
=> #<Daru::DataFrame(2x22)>
                    ID Case Numbe       Date      Block       IUCR Primary Ty Descriptio Location D     Arrest   Domestic       Beat   District       Ward Community    FBI Code X Coordina Y Coordina       Year Updated On   Latitude  Longitude   Location
          0    8517337   HV194652 03/12/2012 027XX S HA       1152 DECEPTIVE  ILLEGAL US ATM (AUTOM      false       true       1031        010         22         30         11    1151482    1885517       2012 02/04/2016 41.8417380 -87.719605 (41.841738
          1    8517338   HV194241 03/06/2012 102XX S VE       0917 MOTOR VEHI CYCLE, SCO     STREET      false      false       0511        005          9         49         07    1181052    1837191       2012 02/04/2016 41.7084956 -87.612580 (41.708495
```

zverok commented 7 years ago

Yes, it is a deliberate inconsistency. Ruby's Array works exactly the same way:

```ruby
a = (1..5).to_a
# => [1, 2, 3, 4, 5]
a[1]
# => 2
a[1,2]
# => [2, 3]
```

For basic functionality, we always try to follow the behavior of Ruby's core collections. For Rubyists, this produces the least surprise, despite the inconsistency.
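
To make the comparison concrete, here is a small sketch in plain Ruby (no daru required) of which Array indexing forms return a bare element and which return an Array. The helper name `indexing_forms` is made up for this illustration; only `Array#[]` and `Array#values_at` are real Ruby methods.

```ruby
# Hypothetical helper: collects what each Array indexing form
# returns for position i of the given array.
def indexing_forms(array, i)
  {
    single: array[i],              # one index: the element itself
    start_length: array[i, 2],     # start and length: an Array
    range: array[i..i],            # a range: an Array, even for one element
    values_at: array.values_at(i)  # values_at: always an Array
  }
end

forms = indexing_forms((1..5).to_a, 1)
p forms[:single]        # => 2
p forms[:start_length]  # => [2, 3]
p forms[:range]         # => [2]
p forms[:values_at]     # => [2]
```

So in plain Ruby, too, only the single-index form hands back the element rather than a collection.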

gnilrets commented 7 years ago

@db579 - Daru has a #to_df method you can use if you want to ensure that the result is always a dataframe. df[:a] will return a vector, but df[:a].to_df will return a dataframe with a single vector. Likewise df.to_df returns df, so if you are dynamically selecting vectors and there's a chance you may only select one, it's safe to add #to_df.
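
The same "normalize to the container type" idiom can be sketched in plain Ruby as an analogy, with `Kernel#Array` playing the role that #to_df plays above. This is not daru code, and `elements_at` is a hypothetical helper invented for this example. It assumes the elements are not themselves arrays, and note that `Array(nil)` is `[]`, so an out-of-range single index yields an empty Array.

```ruby
# Hypothetical helper: always returns an Array, whether the lookup
# produced a single element or a sub-array.
def elements_at(array, *args)
  # Array(x) returns x unchanged if it is already an Array,
  # and wraps a non-array, non-nil value in a one-element Array.
  Array(array[*args])
end

a = (1..5).to_a
p elements_at(a, 1)     # => [2]
p elements_at(a, 1, 2)  # => [2, 3]
```

As with adding #to_df, the caller no longer needs to know whether one or several items were selected.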