exercism / ruby

Exercism exercises in Ruby.
https://exercism.org/tracks/ruby
MIT License
545 stars 512 forks source link

[Matrix] Thoughts: Do not encourage bloated `initialize` methods #962

Closed joshgoebel closed 5 years ago

joshgoebel commented 5 years ago

Could we have a discussion about this? I think typically this is bad behavior and it's much better design to do the work when methods are called. This pattern can lead to all sorts of performance problems latter (doing too much work). It's also a pretty serious violation of SRP.

Thoughts?

Originally filed against the matrix actual exercise: https://github.com/exercism/website-copy/pull/1051

[Edit maud] The current solution in the mentor notes is:

class Matrix
  attr_reader :rows, :columns
  def initialize(matrix_as_string)
    @rows = extract_rows(matrix_as_string)
    @columns = rows.transpose
  end

  private
  def extract_rows(string)
    string.each_line.map { |line| line.split.map(&:to_i) }
  end
end
emcoding commented 5 years ago

Regarding the Matrix exercise.

I wouldn't label it as bad behavior, but as an opinion. There has been discussions about this on several places for this exercise and here's my recap of those:

  1. Some people prefer to set the rows and colums in the initializer (like the current Mentor Notes example).
  2. Some people prefer to extract just the lines (@lines = matrix_as_string.lines), then rows and columns as methods.
  3. Some people prefer to set @matrix in the initializer, with methods for rows and columns, where the lines are extracted within the rows method.
  4. Some people prefer to set the @rows in the initializer, and then a column method.

My conclusion was that it also depends on how you define what a matrix is; if you consider rows and columns as an inherent characteristic of what a matrix is, then it does make sense to set them in the initializer. If you feel that the rows and columns of a matrix can live separate lives, then I can see that setting them together looks like a violation of SRP.

My impression(!) is that the Ruby community is moving towards doing more work in the initializer. I don't know where that's coming from, so if someone can confirm and/or enlighten us, that would be appreciated.

I'm not very into 'preformance' arguments, unless, of course, performance proves to be a problem. Which it isn't with the current tests.

I am very into readability, and that's why I do not like number 4. (In mentoring, I would explain why I do not like it, but not disapprove.)

The current Mentor Notes are not reflecting all the opinions listed above. If we find consensus on which are viable options, I volunteer to update the mentor notes accordingly.

@exercism/ruby

joshgoebel commented 5 years ago

My conclusion was that it also depends on how you define what a matrix is; if you consider rows and columns as an inherent characteristic of what a matrix is, then it does make sense to set them in the initializer. If you feel that the rows and columns of a matrix can live separate lives, then I can see that setting them together looks like a violation of SRP.

It sounds like you're only thinking of SRP at the class level. It also applies to individual methods/accessors - and making sure those also only have a single, clear responsibility. When you come at it from that perspective any init that does a ton of work just setting up accessors that are themselves calculated is a violation of SRP. You're putting a bunch of responsibilities into a single method.

I think memoizing everything in init is an anti-pattern... students learn this and then start to apply to it larger and large examples (scores, etc.) - in which case it only gets worse and worse.

I'm not very into 'preformance' arguments, unless, of course, performance proves to be a problem.

We shouldn't be teaching habits/patterns that far too easily lead to bad performance later though. Though performance is only half my argument here.

sshine commented 5 years ago

bloated initialize methods

I'll try to sell this: There's a general object-oriented idiom that teaches that constructors shouldn't do any computational work themselves. An example of this argument is found on Misko Hevery's Testability Explorer Blog: Flaw: Constructor does Real Work.

This conflicts with the fact that a matrix is given as a string, but the internal representation is column/row-based, which is also sensical. @yyyc514 argues that it's not the responsibility of the constructor to perform the conversion. On the other hand, it's inefficient if the object doesn't cache its representation of a matrix in a way that is efficient for its methods.

One solution is to have a static constructor (factory) called something like from_string that does the work of extract_... such that the real constructor only does assignment. I don't know what Ruby-specific naming idioms apply for static constructors / factories. They're quite common in Java and Haskell.

A solution might then look like:

class Matrix
  attr_reader :rows, :columns
  def initialize(rows, columns)
    @rows = rows
    @columns = columns
  end

  def self.from_string(matrix_as_string)
    rows = matrix_as_string.each_line.map { |line| line.split.map(&:to_i) }
    columns = rows.transpose
    Matrix.new(rows, columns)
  end
end

A secondary concern is whether to cache/memoize the column representation or compute it every time it's returned. And as @F3PiX voices, it depends. I'd elaborate that it depends not just on how you see the class, but how you expect to manipulate its objects. For example, if you expect the matrix to be mutable and that you want to change the values of cells, having the same cell in two places will become a problem. As the exercise is contrived in the sense that it only has rows and columns properties, and no actual use-cases, there's no right answer here.

It may be "preformance" optimization to memoize columns, or it may not.

So another solution might look like:

class Matrix
  attr_reader :rows
  def initialize(rows)
    @rows = rows
  end

  def columns
    rows.transpose
  end

  def self.from_string(matrix_as_string)
    rows = matrix_as_string.each_line.map { |line| line.split.map(&:to_i) }
    Matrix.new(rows)
  end
end

Either way, the concern of the constructor that does too much work is still valid.

If we keep the two concerns separate, perhaps we can reach a consensus faster. :-)

My impression(!) is that the Ruby community is moving towards doing more work in the initializer. I don't know where that's coming from, so if someone can confirm and/or enlighten us, that would be appreciated.

It's hard for me to confirm or deny this.

But we should have plenty of rubyists on the track available?

The current Mentor Notes are not reflecting all the opinions listed above. If we find consensus on which are viable options, I volunteer to update the mentor notes accordingly.

Since there's already a PR for this, exercism/website-copy#1051, perhaps this PR should just be updated with the changes around which this issue seeks to reach consensus?

joshgoebel commented 5 years ago

@yyyc514 argues that it's not the responsibility of the constructor to perform the conversion. On the other hand, it's inefficient if the object doesn't cache its representation of a matrix in a way that is efficient for its methods.

Memoize rows..

def rows
  @rows ||= calculate_rows
end

...or memoize matrix...

def rows
  internal_matrix
end

def internal_matrix
  @internal_matrix ||=  # parse input string
end

I tend to prefer the first, but both solve this issue.

joshgoebel commented 5 years ago

One solution is to have a static constructor (factory) called something like fromstring that does the work of extract... such that the real constructor only does assignment.

I don't think this is a great solution because it doesn't address the performance problem. One can easily imagine a matrix with a variety of methods and I just want to load a bunch of matrixes from a text file and then print out only the LARGE ones. Imagine we had a slightly more parseable format (I just realized it'd have to use newlines different, but you still get the idea). IE:

100x100;raw_matrix_data_here_in_previous_format_we_already_familiar_with"

Now imagine I have a 100 million matrixes on disk. I want to write code like this:

@matrixs = File.open("input")
  .each_line.map { |x| Matrix.new(x) }
  .select {|x| x.width == 100 && x.height == 100 }
  .map(&:print)

Or you could easily imagine writing it to use less RAM... the point is if there is only a single matrix that is 100x100... other than the disk IO and the cost of width and height (I presume I'd have to parse the first part of the raw string) I can mostly ignore the other 99 million matrixes. They don't need to be fully parsed, rows and columns don't need to be computed at all.

This is of course lazy loading (and not a new concept), but I think you get this for "free" when you keep init small and push responsibilities down into their own methods.

sshine commented 5 years ago

I wrote:

As the exercise is contrived in the sense that it only has rows and columns properties, and no actual use-cases, there's no right answer here.

@yyyc514 wrote:

[...] I think you get this for "free" when you [...] push responsibilities down into their own methods.

Your suggestion has the upside that parsing is delayed, but the downside that the internal representation of a matrix is not abstract and depends on a specific serialization format; cf. what Misko Hevery calls "an inflexible and prematurely coupled design". I won't expand on this argument, as his blog post does so amply.

My input was an offer to regard constructors that do real work, the title of your PR, as a separate concern than how to internally represent a matrix, which is a rabbit hole with no end, since the exercise does not specify any specific use-cases.

I just want to load [100 million] matrixes from a text file and then print out only the LARGE ones

This does not seem like an unusual thing one would want to do with 100 million matrices, but it does seem highly specific. How do you know that a matrix is LARGE without accessing it abstractly? If you know it by the size of the string, wouldn't you also know it by the file's size without even having to read its input? This, I bet you, would be even faster! This, of course, assumes that they're located in 100 million separate files with each their file size. But what if they're in the cloud in a NoSQL database? Then network becomes a bottleneck! And so on...

Principially, I do not believe you can find a convincing argument to support one hypothetical, highly specific use-case over another, unless you can either 1) apply a general idiom (like I attempted for the constructor), or 2) expand on the context or purpose of the exercise. Are you suggesting that the matrix exercise should cover principles of lazy evaluation in Ruby?

pgaspar commented 5 years ago

I'm enjoying this discussion so far - thank you!

My thinking around this is still a bit fuzzy and the result is that I end up not mentoring Matrix as much as other exercises.

I think memoizing everything in init is an anti-pattern... students learn this and then start to apply to it larger and large examples (scores, etc.) - in which case it only gets worse and worse.

In general I feel the same way @yyyc514 does here.

I'd elaborate that it depends not just on how you see the class, but how you expect to manipulate its objects. For example, if you expect the matrix to be mutable and that you want to change the values of cells, having the same cell in two places will become a problem.

@sshine's point here is one of the main reasons I feel this practice is potentially dangerous. It's hard to easily demonstrate the mutability issue on the core exercises where people do this often because the tests usually don't specify whether the objects should be mutable or not (thinking mainly of Matrix and HighScores).

I have a personal snippet attempting to explain this for HighScores that I use when I run into an all-in-initializer solution. I haven't used it often.

See it here. Do you see any potential problem of using this solution in the real world? My goal with this question is to make you think about why calculating everything in the constructor can be error prone for mutable objects. Consider this example: ```ruby high_scores = HighScores.new([3, 2, 1]) high_scores.scores << 100 p high_scores.scores # => [3, 2, 1, 100] p high_scores.personal_top_three # => 3 p high_scores.latest # => 1 ``` Do you see the problem? Both `personal_best` and `latest` should return 100. Now, it's true none of the tests alter the `scores` array after initialization, but it would very likely happen if this was a real world class, making this an important detail to be aware of 👍

@sshine says:

This conflicts with the fact that a matrix is given as a string, but the internal representation is column/row-based, which is also sensical. @yyyc514 argues that it's not the responsibility of the constructor to perform the conversion. [...]

One solution is to have a static constructor (factory) called something like from_string that does the work of extract_... such that the real constructor only does assignment.

I think this addresses my general uneasiness when mentoring this exercise 😌. Most solutions conflate the notion of representing a Matrix with the notion of parsing a specific serialization format, as @sshine offers below:

Your suggestion has the upside that parsing is delayed, but the downside that the internal representation of a matrix is not abstract and depends on a specific serialization format

Unfortunately (or not?) these exercises are just small enough that more sophisticated patterns can't easily emerge. Having a second serialization format could quickly lead to a very different architecture, clearly distinguishing deserialization from proper matrix representation.

joshgoebel commented 5 years ago

Your suggestion has the upside that parsing is delayed, but the downside that the internal representation of a matrix is not abstract and depends on a specific serialization format; cf. what Misko Hevery calls "an inflexible and prematurely coupled design". I won't expand on this argument, as his blog post does so amply.

I only skimmed, but not sure I follow this point. What I proposed doesn't require a "specific serialization format".... you could imagine many potential formats... and still delaying the parsing until the actual underlying data was actually requested - not just when the object was instantiated. I only gave a specific encoding format example to help explain how you could have fast access to some attributes of the matrix but not others.

My input was an offer to regard constructors that do real work, the title of your PR, as a separate concern than how to internally represent a matrix, which is a rabbit hole with no end, since the exercise does not specify any specific use-cases.

I didn't realize I was treating them as a single concern. The point is about avoiding work we don't need to do - regardless of what the internal representation looks like.

How do you know that a matrix is LARGE without accessing it abstractly? If you know it by the size of the string, wouldn't you also know it by the file's size without even having to read its input? This, I bet you, would be even faster! This, of course, assumes that they're located in 100 million separate files with each their file size.

These are also potential possibilities for rough idea of size, but I presume the best way to know the EXACT size of the matrix is to ask it itself - not try to guess based on other factors. So yes, you'd instantiate it and ask it directly. And you could imagine designs where that required parsing the whole encoded format - but you could also imagine designs where just fetching the metadata (width, height, etc) was a very fast operation... and in those cases filtering by meta-data would be very efficient as I suggested. UNLESS it got bogged down by an init that was doing way more than absolutely necessary.

I have a personal snippet attempting to explain this for HighScores that I use when I run into an all-in-initializer solution. I haven't used it often.

I'm going to borrow this or some variant. :-)

emcoding commented 5 years ago

I am listening to this discussion with a few questions in mind:

1) Are we discussing Better Programming practices, or Fluency in Ruby?
Because, Exercism's primary goal is about Fluency. That doesn't mean we can ignore Good Practices - but it could mean that an exercise is not suitable for the Fluency goal. Because, of course, we don't want to encourage Bad Practice. So, there is a limit to where we can focus on Fluency. My impression is that the concerns here are rather about Programming Best Practices. Am I correct, or do you feel this is specific Ruby?

2) As a mentor, I get more and more convinced that the solutions shouldn't introduce/handle/cover untested behaviour. And hence, that we shouldn't mentor about untested behaviour. As @sshine said: "As the exercise is contrived in the sense that it only has rows and columns properties, and no actual use-cases, there's no right answer here."
In that regard I don't agree with @pgaspar 's approach for HighScores: the mutability is deliberately left out of the tests (full disclosure: Pedro and me designed the exercise). And I'm convinced we should mentor for what we do test for. (It's hard enough to get that right.) This is a discussion that transcends this exercise. But it matters to me for trying to understand if the discussion is about something that is (deliberately) not meant to be discussed in this easy exercise. Or if the exercise is not suitable in the first place.

That said, the related exercise Ruby/Transpose, does cover use cases. See my next point.

3) Matrix is currently the first of the "mid-level exercises". (Series is the one before, let's consider that the closing of the 'first level'.) If we need to conclude that the current set up for the Matrix is inviting Bad Practices, my conclusion would be not to change the mentoring objectives. But to remove it from the core.

And, instead, add Ruby/Transpose to the core, but at a much later position on the track. WHere discussions about this kind of anti-patterns are more suitable. (Where the mid level exercises should focus on the libraries and the 'syntactical anti-patterns'. )

TL;DR My thinking is:

Thoughts?

sshine commented 5 years ago

@pgaspar wrote:

My thinking around this is still a bit fuzzy and the result is that I end up not mentoring Matrix as much as other exercises. [...] It's hard to easily demonstrate the mutability issue on the core exercises where people do this often because the tests usually don't specify whether the objects should be mutable or not (thinking mainly of Matrix and HighScores).

This seems like another concern tangentially related to the internal representation.

And it also seems like a very interesting point of concern to raise, since Matrix is a core exercise.

Both because you say it's not preferrable to mentor, and because it may simply be unclear.

@yyyc514 wrote:

Your suggestion has the upside [... lazy ...], but the downside [... not abstract ...]

I only skimmed, but not sure I follow this point. What I proposed doesn't require a "specific serialization format".... you could imagine many potential formats... and still delaying the parsing until the actual underlying data was actually requested [...]

OK, let me try and phrase it differently: There's a premature coupling between any serialization format and object initialization: in this case a string from a text file, which could be some other string format, or a maybe even a filename from which to open and read the matrix, or an open file descriptor to such a file, or a network file descriptor, or a callback function.

Any input to a constructor that does not correspond to the internal, abstract representation of the class creates a coupling between instantiating objects and deserializing them. If the deserialization function is separate from the constructor, you can construct objects abstractly.

This premature coupling violates, as you said yourself, the Single-Responsibility Principle because the constructor does not construct objects, but prepares them for construction, and the getters don't project objects, but deserializes them to some extent, and then projects them. It makes testing difficult, because you need to serialize a matrix before you can initialize it and test its properties.

I think your approach is ingenious in a highly specific use-case.

I didn't realize I was treating [real work in constructors and lazy initialization] as a single concern. The point is about avoiding work we don't need to do - regardless of what the internal representation looks like.

OK, then I misunderstood "bloated".

I understand now that you only speak of avoiding computation.

This is certainly a necessary technique sometimes. For example, when sending out mass emails at my day job, we cannot use our standard, abstract URL object, because its initialization and internal representation is simply too expensive for millions of URLs in a very short time period. So we (sadly) use string concatenation in those cases. In dynamic languages, abstraction does not come for free.

@F3PiX suggested:

  • remove Matrix as a core exercise, because it raises questions that are not appropriate at this point in the track
  • add Transpose to the core, but to the end of the track
  • Find one or more easy exercises that cover the concepts that are currently covered (or: were supposed to be covered) by Matrix with regard to the Track Anatomy.

OK; I can't tell what concepts are supposed to be covered, so I'll leave the conversation at this point and wish you luck. :-)

joshgoebel commented 5 years ago

Are we discussing Better Programming practices, or Fluency in Ruby? Because, Exercism's primary goal is about Fluency. ... My impression is that the concerns here are rather about Programming Best Practices. Am I correct, or do you feel this is specific Ruby?

No, I don't think it's specific to Ruby, but I mentor Ruby and it's very easy to properly handle this lazy style init stuff in Ruby so I think when there is an opportunity to teach it, it's valuable to. I have a hard time separating out fluency from best practices and my mentoring style probably focuses on both equally. I'm not completely sure they are so easy to separate either.

As a mentor, I get more and more convinced that the solutions shouldn't introduce/handle/cover untested behaviour. And hence, that we shouldn't mentor about untested behaviour.

Entirely disagree. A huge part of my mentoring (esp. on more complex exercises) are things that have nothing to do with tested/untested behavior:

Perhaps I've understood you too narrowly there though. There are often 100 terrible ways to "solve" an exercise if the ONLY goal is "tests are green" - solutions I'd never approve.

emcoding commented 5 years ago

I have a hard time separating out fluency

It's probably one of the harder parts of mentoring on Exercism, and also an important difference between mentoring on Exercism and mentoring/code reviews outside of Exercism. Katrina's post is an interesting read in this regard, https://github.com/exercism/docs/blob/master/about/goal-of-exercism.md . Of course, it's not always completely separatable. But, I still don't know if Matrix is at the right position in the track if we focus on the fluency part 🤷‍♂

@yyyc514 Do you happen to know a Ruby exercise where doing some work in the initializer is a good idea?
(Maybe it helps if we look at it from the opposite side.)

Entirely disagree. A huge part of my mentoring (esp. on more complex exercises) are things that have nothing to do with tested/untested behavior:

The things you name was not what I meant. I meant: introducing untested behaviour. Not things that are beyond testing. And I don't know where the 'only goal is tests are green' come from, but definitively not from me.

The relation to Matrix was the previous comment:

As @sshine said: "As the exercise is contrived in the sense that it only has rows and columns properties, and no actual use-cases, there's no right answer here."

If we don't know about further behaviour, my thinking start to go in the direction we shouldn't address 'overloading the initializer' at this point. If we all agree that it is a valuable point in itself, then it looks like it's better suited later in the track, AND with an exercise that really illustrates the problem and how to manage it.

joshgoebel commented 5 years ago

Do you happen to know a Ruby exercise where doing some work in the initializer is a good idea?

In the realm of Exercism exercises I can think of OTTOMH I think init should:

In general if 80% of your class's work is happening inside the initializer you're doing it all wrong. (ie, this Matrix example)

My background in Rails probably betrays me a little here, but I think even in Ruby proper these aren't bad principals. Objects should be fast to create because there is great utility in working with objects as abstractions... and if you create slow to initialize objects then what you do is end up pushing logic that really belongs inside the object OUTSIDE the object so you can "work with the object" BEFORE creating the object - because the cost of creation is too high - and that path is the path to the dark side.

emcoding commented 5 years ago

Let's try to wrap this conversation up.

My conclusion from the discussion is that the Matrix solution with everything defined in the initializer, is maybe not per se wrong in this particular exercise, but approving an approach that can have repercussions in other situations is misleading for the student.
Matrix is a Level 2 exercise, and the discussion that is provoked by computing the rows and the columns in the initializer, is not appropriate for this level. There are other problems with this exercise as well. For instance, Ruby already has a Matrix class, and we don't want to ask students to reimplement existing features (at least, not in core exercises).

All of the above leads to the conclusion that Matrix should be removed from the core progression. On the other hand, it offers some features that we badly need on Level 2:

At the moment, there are no other exercises (that I know of) that can fill the gap if we remove Matrix from the core. Also, Matrix is one of only 2 exercises on Level 2 (Series being the other one, following Matrix). Therefore, I propose to leave the things as they are. And try to find (or create) one or more other exercises that can fill the gap that is currently filled by Matrix. Albeit in a controversial way.

Given all this, can you, @yyyc514, or anyone else not live with this solution temporarily? If not, please explain why.

Also, suggestions for other exercises that can replace Matrix are very welcome.

TL;DR The current Matrix exercise is problematic in multiple ways. Let's find a replacement for the Matrix exercise as a core asap. In the meantime, approve both approaches: either calculating rows and columns in the initializer, or defining separate methods.

joshgoebel commented 5 years ago

I propose to leave the things as they are. And try to find (or create) one or more other exercises that can fill the gap that is currently filled by Matrix.

I'm ok with that. Can we go ahead and update the mentor notes to include the broken out solution as also valid/fine for this exercise as it currently stands then, ie:

class Matrix
  def initialize(matrix_as_string)
    @matrix_as_string = matrix_as_string
  end

  def rows
    @rows ||= matrix_as_string.each_line.map { |line| line.split.map(&:to_i) }
  end

  def columns
    @columns ||= rows.transpose
  end
end

If so I can make a PR with just that.

emcoding commented 5 years ago

If so I can make a PR with just that.

That would be great, thanks!

joshgoebel commented 5 years ago

Done: https://github.com/exercism/website-copy/pull/1149

emcoding commented 5 years ago

I'm closing this, as the PR is merged, and the follow up is recorded in #977

Thanks all for your input. And the PR 💃