SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.04k stars 139 forks source link

fix require stucture #304

Closed rickhull closed 7 years ago

rickhull commented 7 years ago
zverok commented 7 years ago

goal: files like daru/dataframe should be funtional on their own

Why? What's the idea behind it? Is it common for other libraries to have arbitrary parts of them to be requirable independently? What are downsides of just require "daru"?

Despite those questions, I like the idea of moving some "platform-specific" code to separate file, yet approach in this PR seems too "mechanic" for me. Daru::MONTH_DAYS and MISSING_VALUES and SPLIT_TOKEN aren't looking like "platform-specific" data for me.

rickhull commented 7 years ago

@zverok Thanks for the quick review and feedback!

Why? What's the idea behind it? Is it common for other libraries to have arbitrary parts of them to be requirable independently? What are downsides of just require "daru"?

For large gems, it is very much a feature for them to be "modular" at require-time. If all I need or want to use is a DataFrame, then I don't want to pull all of daru into my project. Yes, my project will still depend on "all" of daru (at the gem level), but there is benefit to minimizing the necessary requires:

  1. Ruby process starts quicker: spends less time doing unnecessary requires
  2. Smaller namespace: less chance for collisions or mutual incompatibilities, and things like pry or other IDE features work better with less searching and noise presented to user
  3. Hygiene: it's simply better engineering
  4. Documentation: a developer working on a file should have a clear understanding where all "symbols" (e.g. methods called like Daru.has_nyaplot?) are defined

On 3 & 4, why have any requires in daru/dataframe at all? Shouldn't this file require all necessary files that define the "symbols" it uses? Why would we require 'daru/io/io.rb' but not 'daru/platform' (if the "platform" functionality were already split from require-the-world in lib/daru.rb)?

Lastly, who wrote this comment and why?

I agree that some of the definitions in daru/platform don't obviously belong with others and could be split into a more specific file. I considered two files inside daru/core rather than daru/platform. My PR was made to be relatively simple and illustrative, its purpose more about communication than a serious candidate for merge. Also, while there are many gems that adhere to this philosophy of "modular" requires, there are many gems that ignore this. IMHO it's clearly a useful feature with no significant downsides and it signifies a well written gem.

I hope you like the idea and I'm glad to help more completely implement it, even if not via this particular PR as it stands now.

v0dro commented 7 years ago

I agree that gems should be modular and that one should be able to require only daru/dataframe if need be. This is a nice idea and one that should be merged into daru.

Just that as @zverok pointed out, maybe the daru.rb file can be split into files like daru/constants.rb that contains all the constants like DAYS_OF_WEEK and another file called daru/platform.rb that contains the rest of the code.

rickhull commented 7 years ago
vagrant@contrib-jessie:~/git/daru$ rake modular_require
   OK daru.rb
ERROR daru/date_time/index.rb
   OK daru/date_time/offsets.rb
   OK daru/io/io.rb
   OK daru/io/sql_data_source.rb
   OK daru/maths/arithmetic/dataframe.rb
   OK daru/maths/arithmetic/vector.rb
   OK daru/maths/statistics/dataframe.rb
   OK daru/maths/statistics/vector.rb
   OK daru/formatters/table.rb
   OK daru/iruby/helpers.rb
   OK daru/category.rb
   OK daru/monkeys.rb
   OK daru/accessors/mdarray_wrapper.rb
   OK daru/accessors/dataframe_by_row.rb
   OK daru/accessors/gsl_wrapper.rb
   OK daru/accessors/nmatrix_wrapper.rb
ERROR daru/accessors/array_wrapper.rb
   OK daru/extensions/rserve.rb
   OK daru/dataframe.rb
   OK daru/helpers/array.rb
   OK daru/platform.rb
   OK daru/index/index.rb
ERROR daru/index/multi_index.rb
ERROR daru/index/categorical_index.rb
   OK daru/vector.rb
   OK daru/plotting/nyaplot.rb
   OK daru/plotting/gruff/category.rb
   OK daru/plotting/gruff/dataframe.rb
   OK daru/plotting/gruff/vector.rb
   OK daru/plotting/nyaplot/category.rb
   OK daru/plotting/nyaplot/dataframe.rb
   OK daru/plotting/nyaplot/vector.rb
   OK daru/plotting/gruff.rb
   OK daru/exceptions.rb
   OK daru/version.rb
   OK daru/core/query.rb
   OK daru/core/merge.rb
   OK daru/core/group_by.rb

ERRORS
---
/home/vagrant/git/daru/lib/daru/date_time/index.rb:7:in `singleton class': uninitialized constant Daru::Offsets (NameError)
/home/vagrant/git/daru/lib/daru/accessors/array_wrapper.rb:6:in `<class:ArrayWrapper>': uninitialized constant Forwardable (NameError)
/home/vagrant/git/daru/lib/daru/index/multi_index.rb:2:in `<module:Daru>': uninitialized constant Daru::Index (NameError)
/home/vagrant/git/daru/lib/daru/index/categorical_index.rb:2:in `<module:Daru>': uninitialized constant Daru::Index (NameError)
rickhull commented 7 years ago

This current travis failure is also happening on master branch, so something is screwed up in the test infrastructure:

Failures:
  1) Daru::Vector#is_values single value to_a should eq [true, false, true, false, false]
     Failure/Error: its(:to_a) { is_expected.to eq [true, false, true, false, false] }

       expected: [true, false, true, false, false]
            got: [true, false, true, nil, nil]

       (compared using ==)

       Diff:
       @@ -1,2 +1,2 @@
       -[true, false, true, false, false]
       +[true, false, true, nil, nil]

     # ./spec/vector_spec.rb:1397:in `block (4 levels) in <top (required)>'
  2) Daru::Vector#is_values multiple values to_a should eq [true, false, true, true, true]
     Failure/Error: its(:to_a) { is_expected.to eq [true, false, true, true, true] }

       expected: [true, false, true, true, true]
            got: [true, false, true, nil, nil]

       (compared using ==)

       Diff:
       @@ -1,2 +1,2 @@
       -[true, false, true, true, true]
       +[true, false, true, nil, nil]

     # ./spec/vector_spec.rb:1403:in `block (4 levels) in <top (required)>'
rickhull commented 7 years ago
vagrant@contrib-jessie:~/git/daru$ rake modular_require
   OK daru.rb
   OK daru/date_time/index.rb
   OK daru/date_time/offsets.rb
   OK daru/io/io.rb
   OK daru/io/sql_data_source.rb
   OK daru/maths/arithmetic/dataframe.rb
   OK daru/maths/arithmetic/vector.rb
   OK daru/maths/statistics/dataframe.rb
   OK daru/maths/statistics/vector.rb
   OK daru/formatters/table.rb
   OK daru/iruby/helpers.rb
   OK daru/category.rb
   OK daru/monkeys.rb
   OK daru/accessors/mdarray_wrapper.rb
   OK daru/accessors/dataframe_by_row.rb
   OK daru/accessors/gsl_wrapper.rb
   OK daru/accessors/nmatrix_wrapper.rb
   OK daru/accessors/array_wrapper.rb
   OK daru/extensions/rserve.rb
   OK daru/dataframe.rb
   OK daru/helpers/array.rb
   OK daru/platform.rb
   OK daru/index/index.rb
   OK daru/index/multi_index.rb
   OK daru/index/categorical_index.rb
   OK daru/vector.rb
   OK daru/plotting/nyaplot.rb
   OK daru/plotting/gruff/category.rb
   OK daru/plotting/gruff/dataframe.rb
   OK daru/plotting/gruff/vector.rb
   OK daru/plotting/nyaplot/category.rb
   OK daru/plotting/nyaplot/dataframe.rb
   OK daru/plotting/nyaplot/vector.rb
   OK daru/plotting/gruff.rb
   OK daru/exceptions.rb
   OK daru/version.rb
   OK daru/core/query.rb
   OK daru/core/merge.rb
   OK daru/core/group_by.rb

vagrant@contrib-jessie:~/git/daru$ echo $?
0
rickhull commented 7 years ago
vagrant@contrib-jessie:~/git/daru$ bundle exec rspec spec --exclude spec/extensions/rserve_spec.rb
/home/vagrant/.gem/ruby/2.4.0/gems/gsl-2.1.0.2/lib/gsl/oper.rb:43: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/backports-3.6.8/lib/backports/1.8.7/fixnum/div.rb:1: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/backports-3.6.8/lib/backports/1.8.7/fixnum/fdiv.rb:1: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/backports-3.6.8/lib/backports/2.1.0/bignum/bit_length.rb:1: warning: constant ::Bignum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/backports-3.6.8/lib/backports/2.1.0/fixnum/bit_length.rb:1: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................*..............................*..................................................................................................................*.........................................................*..........................................................................................................................................................................................................................................................................................................................................*.........*....*...........................................*................../home/vagrant/.gem/ruby/2.4.0/gems/nmatrix-0.2.3/lib/nmatrix/monkeys.rb:51: warning: constant ::Fixnum is deprecated
.............................................................................................................*..............................................................**.........................................................................................................................................................................................................................................................................../home/vagrant/.gem/ruby/2.4.0/gems/ruby-ole-1.2.12/lib/ole/support.rb:201: warning: constant ::Fixnum is deprecated
./home/vagrant/.gem/ruby/2.4.0/gems/ruby-ole-1.2.12/lib/ole/support.rb:201: warning: constant ::Fixnum is deprecated
../home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
../home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
/home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
./home/vagrant/.gem/ruby/2.4.0/gems/thread_safe-0.3.5/lib/thread_safe/cache.rb:155: warning: constant ::Fixnum is deprecated
..*...........................................................................................................*...............................................................................................................................................................................................................................................................................................................................................................................................................*.........*..........................................................................................................................................................................................................................*.........*...........................................................................................................................................................................................................................*.........*..............................................................*..............................................................................................................

Pending: (Failures listed here are expected and do not affect your suite's status)

  1) Daru::Core::GroupBy#aggregate
     # Not yet implemented
     # ./spec/core/group_by_spec.rb:180

  2) Daru::Core::GroupBy#[]
     # Not yet implemented
     # ./spec/core/group_by_spec.rb:390

  3) Daru::DataFrame#initialize Daru::MultiIndex aligns MultiIndexes properly
     # No reason given
     Failure/Error: }, order: order_mi))

     NameError:
       undefined local variable or method `order_mi' for #<RSpec::ExampleGroups::DaruDataFrame_2::Initialize::DaruMultiIndex:0x007f3794e0c798>
       Did you mean?  @order_mi
     # ./spec/dataframe_spec.rb:418:in `block (4 levels) in <top (required)>'

  4) Daru::DataFrame#row[]= Daru::MultiIndex
     # Not yet implemented
     # ./spec/dataframe_spec.rb:778

  5) Daru::DataFrame#keep_row_if changing row from under the iterator trips this
     # Not yet implemented
     # ./spec/dataframe_spec.rb:2115

  6) Daru::DataFrame#to_a Daru::MultiIndex
     # Not yet implemented
     # ./spec/dataframe_spec.rb:2240

  7) Daru::DataFrame#sort Daru::MultiIndex
     # Not yet implemented
     # ./spec/dataframe_spec.rb:2297

  8) Daru::DataFrame#sort! Daru::MultiIndex
     # Not yet implemented
     # ./spec/dataframe_spec.rb:2515

  9) Daru::DateTimeIndex.initialize lets setting of string time format
     # No reason given
     Failure/Error: Daru::DateTimeIndex.format = 'some-date-time-format'

     NoMethodError:
       undefined method `format=' for Daru::DateTimeIndex:Class
     # ./spec/date_time/index_spec.rb:29:in `block (3 levels) in <top (required)>'

  10) Daru::DateTimeIndex#add single index to_a
     # No reason given
     # ./spec/date_time/index_spec.rb:490

  11) Daru::DateTimeIndex#add mulitple indexes to_a
     # No reason given
     # ./spec/date_time/index_spec.rb:497

  12) Daru::IO Daru::DataFrame.from_plaintext understands empty fields
     # Temporarily skipped with xit
     # ./spec/io/io_spec.rb:218

  13) Daru::Vector#+ appropriately adds vectors with numeric and non-numeric indexes
      # Need an alternate index implementation?
      Failure/Error: index = (@index.to_a | other.index.to_a).sort

      ArgumentError:
        comparison of Integer with :a failed
      # ./lib/daru/maths/arithmetic/vector.rb:67:in `sort'
      # ./lib/daru/maths/arithmetic/vector.rb:67:in `v2v_binary'
      # ./lib/daru/maths/arithmetic/vector.rb:54:in `binary_op'
      # ./lib/daru/maths/arithmetic/vector.rb:6:in `+'
      # ./spec/maths/arithmetic/vector_spec.rb:31:in `block (3 levels) in <top (required)>'

  14) Daru::Vector nmatrix #delete_at Daru::Index deletes element of specified integer index
      # No reason given
      Failure/Error: rows.map { |r| formatter % r }.join("\n")

      ArgumentError:
        too few arguments
      # ./lib/daru/formatters/table.rb:23:in `%'
      # ./lib/daru/formatters/table.rb:23:in `block in format'
      # ./lib/daru/formatters/table.rb:23:in `map'
      # ./lib/daru/formatters/table.rb:23:in `format'
      # ./lib/daru/formatters/table.rb:6:in `format'
      # ./lib/daru/vector.rb:929:in `inspect'
      # ./spec/vector_spec.rb:888:in `block (6 levels) in <top (required)>'

  15) Daru::Vector nmatrix #to_h Daru::MultiIndex
     # Not yet implemented
     # ./spec/vector_spec.rb:972

  16) Daru::Vector gsl #delete_at Daru::Index deletes element of specified integer index
      # No reason given
      Failure/Error: rows.map { |r| formatter % r }.join("\n")

      ArgumentError:
        too few arguments
      # ./lib/daru/formatters/table.rb:23:in `%'
      # ./lib/daru/formatters/table.rb:23:in `block in format'
      # ./lib/daru/formatters/table.rb:23:in `map'
      # ./lib/daru/formatters/table.rb:23:in `format'
      # ./lib/daru/formatters/table.rb:6:in `format'
      # ./lib/daru/vector.rb:929:in `inspect'
      # ./spec/vector_spec.rb:888:in `block (6 levels) in <top (required)>'

  17) Daru::Vector gsl #to_h Daru::MultiIndex
     # Not yet implemented
     # ./spec/vector_spec.rb:972

  18) Daru::Vector array #delete_at Daru::Index deletes element of specified integer index
      # No reason given
      Failure/Error: rows.map { |r| formatter % r }.join("\n")

      ArgumentError:
        too few arguments
      # ./lib/daru/formatters/table.rb:23:in `%'
      # ./lib/daru/formatters/table.rb:23:in `block in format'
      # ./lib/daru/formatters/table.rb:23:in `map'
      # ./lib/daru/formatters/table.rb:23:in `format'
      # ./lib/daru/formatters/table.rb:6:in `format'
      # ./lib/daru/vector.rb:929:in `inspect'
      # ./spec/vector_spec.rb:888:in `block (6 levels) in <top (required)>'

  19) Daru::Vector array #to_h Daru::MultiIndex
     # Not yet implemented
     # ./spec/vector_spec.rb:972

  20) Daru::Vector#clone_structure Daru::MultiIndex
     # Not yet implemented
     # ./spec/vector_spec.rb:1427

Finished in 4.31 seconds (files took 4.93 seconds to load)
2672 examples, 0 failures, 20 pending

Coverage report generated for RSpec to /home/vagrant/git/daru/coverage. 3686 / 3772 LOC (97.72%) covered.
rickhull commented 7 years ago

I'm having trouble understanding the current failure:

Failures:
  1) Daru::Vector#is_values single value to_a should eq [true, false, true, false, false]
     Failure/Error: its(:to_a) { is_expected.to eq [true, false, true, false, false] }

       expected: [true, false, true, false, false]
            got: [true, false, true, nil, nil]

       (compared using ==)

       Diff:
       @@ -1,2 +1,2 @@
       -[true, false, true, false, false]
       +[true, false, true, nil, nil]

     # ./spec/vector_spec.rb:1397:in `block (4 levels) in <top (required)>'
  2) Daru::Vector#is_values multiple values to_a should eq [true, false, true, true, true]
     Failure/Error: its(:to_a) { is_expected.to eq [true, false, true, true, true] }

       expected: [true, false, true, true, true]
            got: [true, false, true, nil, nil]

       (compared using ==)

       Diff:
       @@ -1,2 +1,2 @@
       -[true, false, true, true, true]
       +[true, false, true, nil, nil]

     # ./spec/vector_spec.rb:1403:in `block (4 levels) in <top (required)>'
Finished in 6.58 seconds (files took 3.51 seconds to load)
2677 examples, 2 failures, 20 pending
Failed examples:
[31mrspec ./spec/vector_spec.rb:1397 # Daru::Vector#is_values single value to_a should eq [true, false, true, false, false]
rspec ./spec/vector_spec.rb:1403 # Daru::Vector#is_values multiple values to_a should eq [true, false, true, true, true]

OK, so spec/vector_spec.rb lines 1397 and 1403 should be talking about "single values" and "multiple values" and #to_a. Let's look, nope:

# ...
    it "verifies missing data presence" do
      expect(@with_md.is_nil?)   .to eq(Daru::Vector.new([false,false,true,false,false,true]))
      expect(@without_md.is_nil?).to eq(Daru::Vector.new([false,false,false,false,false,false]))
    end
  end

  context "#not_nil?" do
    before(:each) do
      @with_md    = Daru::Vector.new([1,2,nil,3,4,nil])
      @without_md = Daru::Vector.new([1,2,3,4,5,6])
    end

    it "verifies missing data presence" do
      expect(@with_md.not_nil?)   .to eq(Daru::Vector.new([true,true,false,true,true,false]))
      expect(@without_md.not_nil?).to eq(Daru::Vector.new([true,true,true,true,true,true]))
    end
# ...

The error message complains about an array of bools length 5. The code in spec/vector_spec.rb expects an array of bools length 6. What is going on? So far as I can tell, this bizarre behavior is specific to Travis.

athityakumar commented 7 years ago

@rickhull : Nope, this build error is legit. It's just that commits 5868487aa60a4f3361a67768409240a890fe8826 and 8a4bb1cadc5d05600025f186d5c15aef74ebcdd1 were slightly incompatible. You are checking for the code in the correct file and branch, but in the wrong repository. You're looking in your fork, but the actual issue is in the main repository. (Your fork is behind the main master by 2 commits, check for the same file lines in main repo's master.)

Travis CI builds imagining that your PR has been merged into the main repository, and hence - you're getting build error as the main repository itself has build error. Wait for the main repo's build to succeed, and then you can check if your PR is giving any build error or not. :+1:

rickhull commented 7 years ago

Ah, thanks. I assumed Travis was testing my branch. Silly me. Makes sense, though it might be nice to see the result from the branch itself.

zverok commented 7 years ago

Hey guys so for missing a long discussion (I've been travelling to conference in Vilnius, and broken my notebook on the road), but now I have a LOT things to say :)

Let's start with this statement:

For large gems, it is very much a feature for them to be "modular" at require-time. If all I need or want to use is a DataFrame, then I don't want to pull all of daru into my project. Yes, my project will still depend on "all" of daru (at the gem level), but there is benefit to minimizing the necessary requires

I understand the theoretical idea, but how it applies to daru? To be honest, in my current perception, daru is not that big library. Trying to answer properly to this PR, I've analyzed how much of files/modules/classes we are loading, and it looks like the whole daru splits into:

In current state of the library, we have pretty reasonable approach (or trying to have it):

So, the changes I could find reasonable, would be:

  1. ability to require daru core (dataframe/vector/indices) without anything that is "not necessary" by require 'daru/core'
  2. ...while still having an ability to absentmindedly require 'daru' (alternative solution: like activesupport, require 'daru' requires only core, and require 'daru/all' requires everything);
  3. probably, an ability to specify which backends do you really need (e.g., even if GSL is installed, require 'daru/core' does not loads it, only require 'daru/gsl' does);
  4. the same with frontends.

WDYT?

rickhull commented 7 years ago

I agree with all of that, pretty much. It's very easily achievable if we're already abiding by the principle of modular requires, where any lib file may be required and expected to work.

Here is the ideal (IMHO) behavior, keeping in mind that a require target is always a file (presumed .rb) and never a directory:

# load the default system, if not "everything"
# composed of requires; nothing internal should require this
require 'daru' # lib/daru.rb

# load the dataframe class (Daru::DataFrame and anything it needs)
# composed of class and module definitions with requires at the top
require 'daru/dataframe' # lib/daru/dataframe.rb

# load the foo functionality (e.g. core, whatever)
# composed of class and module definitions with requires at the top
require 'daru/foo' # lib/daru/foo.rb

# load the gsl backend (?)
# composed of class and module definitions with requires at the top
# this probably loads much of daru, but only what it needs
# and this much has probably been loaded already, effectively NOOP
# beyond loading the gsl stuff
require 'daru/gsl' # lib/daru/gsl.rb
zverok commented 7 years ago

...the principle of modular requires, where any lib file may be required and expected to work.

I still don't see it as a useful principle in our case. This way, you'll have vector to require dataframe and index, and dataframe to require vector and index, and index to require vector and dataframe... Well, show me the library that works this way? E.g. any file can be required independently (and don't say activesupport, please).

Reasonable partition that I've described above is, well, reasonable. While making any daru/maths/arithmetic/vector independently requireable seems like an artificial goal to me.

zverok commented 7 years ago

(Sorry, I've responded to previous, shorter version of your comment... Though, changes don't change much for me.)

I'd suggest you trying to make a full map of "independent" parts of daru yourself, not limiting to theoretical discussion. E.g. what are some particular parts that should be requireable by themselves. All of 38 .rb files in lib, including something like lib/daru/accessors/dataframe_by_row.rb? What is the pragmatic reason to chase this goal?

rickhull commented 7 years ago

This way, you'll have vector to require dataframe and index, and dataframe to require vector and index, and index to require vector and dataframe...

It's perfectly fine for dataframe to require vector, assuming dataframes are composed of vectors. Vector probably shouldn't require dataframe, but if it needs to refer to dataframe definitions, then so be it. It's a smell, but they can each pull in the other without any problem.

Well, show me the library that works this way? E.g. any file can be required independently (and don't say activesupport, please).

Let me answer this from a slightly different angle. First, most ruby code has this property. You can require the file in e.g. irb and start playing with the class and/or module. It's only once projects start to get bigger that they can lose this property, whereby require $arbitrary_lib_file is allowed to fail (or otherwise misbehave). Now we're in the realm where things only "happen" to work and require order matters and mysterious load problems crop up from time to time. See the dataframe/nyaplot comment I deleted and referenced earlier for an example.

There are many gems which maintain this property, but I can't with certainty name any that do so, besides my own. Some from the chef universe come to mind but I'm not certain. I would venture that at least 10% and possibly more than 50% of the registered gems would pass the modular_require test

Reasonable partition that I've described above is, well, reasonable. While making any daru/maths/arithmetic/vector independently requireable seems like an artificial goal to me.

Perhaps. Think of me like an advocate of structured programming on a mission to eradicate GOTOs. While the use of GOTO is defensible in some cases, most of the time it's better not to. Likewise, there should be a justification for permitting a file to not require what it needs.

I'd suggest you trying to make a full map of "independent" parts of daru yourself, not limiting to theoretical discussion. E.g. what are some particular parts that should be requireable by themselves. All of 38 .rb files in lib, including something like lib/daru/accessors/dataframe_by_row.rb? What is the pragmatic reason to chase this goal?

Yep, this PR has already achieved the goal and rather easily. Daru was not far off. There were only 4 violations beyond dataframe, and I fixed them by adding require statements. Do you disagree with any of the code changes? All of this discussion is just to help explain why the changes are desirable.

I think we agree that dataframe had a problem that I identified and fixed. My larger point is that no part of Daru should be held to a lesser standard.

zverok commented 7 years ago

OK, let me me absolutely clear here.

First: currently, all the gems/libraries I've seen (and it was a lot), basically tend to follow the following principles:

  1. if we have some/path.rb and some/path/, requiring the former typically means requiring the latter (which contains all the classes and module some/path.rb needs to work);
  2. for most of the small and moderate-sized gems, you just do require 'gemname' and it is the only thing you typically want to do, and it requires all the gem's features, classes and modules;
  3. sometimes, (2) is replaced with autoloading of constants, e.g. "if user needs this constant, we will load this .rb file" (which AFAIK considered deprecated by ruby-core, yet has a huge impact in Rails land);
  4. sometimes, parts of the gem are not required by default when they are contextually-dependent, for example, when using webmock with rspec, you should do require 'webmock/rspec' explicitly (and it will also require 'webmock' implicitly, but also add some RSpec-specific features and configs).

Now, how this PR/discussions looks for me, please feel free to object:

  1. You made up your own "style rule", named "modular require", saying "each file should be requireable independently";
  2. You can't show any style guides/recommendations mentioning this rule;
  3. You can't show any gem/library following this rule, except for your own;
  4. ...Yet you consider it as obvious/mandatory as GOTO;
  5. ...And you want to make random OSS project (daru) to follow it immediately.

I am sorry, I am not sure things are working this way. ¯\_(ツ)_/¯

rickhull commented 7 years ago

I encourage you to examine whether the code changes are desirable, even if my rationale was not convincing. Also keep in mind my belief that most well written gems do adhere to this, even if unintentionally, though I have not taken the time to verify this.

zverok commented 7 years ago

I encourage you to examine whether the code changes are desirable, even if my rationale was not convincing.

We started the dialog about the theoretical principles exactly because they altered you way to made this PR in an undesirable (from my POV) manner. Let's go point-by-point:

Those are just small examples. All in all, as I've said from the very beginning, the proposed changes looks "mechanical" to me. They do NOT achieve any goal about "clearer structure", exactly the opposite: now any contributor will have a lot of questions about those "requires everywhere".

Also keep in mind my belief that most well written gems do adhere to this

Well, there are beliefs, and there are facts, okay? Do you mind to show some popular & "well-written" gem adhering to this rule? Or you'll say all popular gems are badly written? Maybe it is some kind of conspiracy?..

rickhull commented 7 years ago

As to the organization or coherence of daru/platform.rb, I feel it is best for a separate discussion. My intent was to avoid controversy by making the smallest possible change: simply moving all non-requires to a separate, arbitrarily named file. I figured platform was better than foo, and beyond that I have no opinion or investment as to further splitting and file naming.

As to MultiIndex requiring Index, that is completely appropriate in my eyes. I don't see the problem, and declaring the functionality that is depended on or referenced is the proper behavior. What is the benefit to omitting the require? To me, it is perfectly acceptable for a developer to do: pry -I lib -r daru/index/multi_index and start debugging or poking around. For this to work as expected, Multi Index must declare the functionality it depends on and references, via require.

Can you help me understand why it is desirable to omit the requires? Is it a rails idiom?

rickhull commented 7 years ago

Also, no conspiracies. I'm just trying to be careful and transparent about the strength of my claims. Either of us can spend the time to verify it with a small bit of scripting, as necessary. I'll work on a short list in the meantime.

Edit:

2 of the first 3 projects I reviewed appear to adhere:

I haven't checked them against modular_require yet

zverok commented 7 years ago

Can you help me understand why it is desirable to omit the requires? Is it a rails idiom?

I was never saying a word about "omitting the requires". I've said that independent requireability of each file is a phantom goal. The "normal" structure of the gem is either "require everything from the main file", or "tree require" (e.g. daru.rb requires daru/index.rb, which requires daru/index/multi.rb or something like that -- yet still it is NOT planned that somebody will require "daru/index", because it could also depend on Vector and DataFrame classes, yet do NOT tries to require them all).

The two main problem with your "everything should require everything it needs" are:

  1. Maintainability. It is like turning on some harsh rubocop rule: whether the code will became really better, or it would be just blindly following some rule for the sake of following the rule? Which leads us to:
  2. You are solving non-existent problem. The library is deliberately designed the way that you always have to use at least DataFrame, Vector and Index-es, there is absolutely no reason in the world to support imaginary scenarios "but what if library's user really, really wants to use only Index? and what if also he really, really hates that DataFrame class also will be available?"

The hard truth (at least in my eyes) is: your PR just made the code more complicated and messy, and for no real reason.

Also, no conspiracies.

Sorry, it was a (pretty distasteful) joke, I've should mind my language.

I'm just trying to be careful and transparent about the strength of my claims. Either of us can spend the time to verify it with a small bit of scripting, as necessary.

Well, I've asked you to show me some of the gems that follow your principle. Before that, I've checked several of well-known ones randomly (that I consider "good code"/popular gems): sequel, faraday, webmock, addressable, rspec. Of all those only addressable could be said to somehow being independently requireable, but it only consists of two files, having independent functionality.

Care to show examples of gems that adhere to your principle?

rickhull commented 7 years ago

I named these 2 in an edit: concurrent-ruby and functional-ruby

Just so we're clear, which of these do you disagree with?

  1. Making Data Frame independently requireable
  2. Adding the proper requires to array_wrapper and index files
  3. Else?
zverok commented 7 years ago

I named these 2 in an edit: concurrent-ruby and functional-ruby

Yes, and activesupport. And facets. All of them being not a "library", but rather a set of small libraries. ¯\_(ツ)_/¯

Just so we'really clear, which of these do you disagree with?

Neither. To be really clear. Neither of change in your PR I can see a problem solved, or how they make the library better.

rickhull commented 7 years ago

I can only assume you mean you disagree with both. If that's the case, then yeah, we're done. Cheers!

zverok commented 7 years ago

I can only assume you mean you disagree with both.

Well, I believe that I've spent like 7 really long and detailed comments to explain my view at the existing problems and their possible solutions (and why I think your PR's approach seems not the right direction for me). So, yeah, you can say I disagree.