Split big methods into smaller methods

c410-f3r commented 7 years ago

Aside from a good documentation, a function should also be split if it is too big (±30 lines). Smaller functions are easier to understand, maintain and are also more reusable and self-documented. This issue is related to #83 and I will illustrate some possible approaches. Any suggestions are welcome.

Source code example

impl Trait for Struct {
    fn example0(&self) { ... very long source code ... }
    fn example1(&self) { ... very long source code ... }
}

First approach

Create smaller static methods within a corresponding struct.

Pros: Simple Cons: Puts everything in the same file

impl Struct {
    /// Documentation
    fn do_this() { ... }

    /// Documentation
    fn do_that() { ... }

    /// Documentation
    fn fill_whatever() { ... }
}

impl Trait for Struct {
    fn example0(&self) {
        Struct::do_this();
        Struct::do_that();
    }

    fn example1(&self) {
        Struct::fill_whatever();
        Struct::do_that();
    }
}

Second approach

A method can have all its logic grouped in a separate module. Would be nice to have a common contract (Trait) that each separate module should implement.

Pros: More intuitive? Cons: Complex, files name must be sync.

Structure

module
├── module_helper
│      ├── mod.rs
│      └── example0.rs
│      └── example1.rs
└── mod.rs

module/mod.rs

pub mod module_helper;

import self::module_helper;
import self::module_helper::{example0, example1};

impl Trait for Struct {
    fn example0(&self) {
        example0::do_this();
        module_helper::do_that();
    }

    fn example1(&self) {
        example1::fill_whatever();
        module_helper::do_that();
    }
}

There is no silver bullet to this problem but I will be very grateful if we all come to a conclusion about this whole thing.

AtheMathmo commented 7 years ago

Thanks for writing this up - I agree that this is something that we should address in the library. That said, for me it isn't the biggest issue on the table right now. Do you have some examples within the library where you think this is a problem?

I think it is somewhat inevitable in a linear algebra library to have some large functions. And moreover separating the function out just to achieve this goal can make the problem worse as it becomes more difficult to follow the logic (you have to chase functions around). For me the solution is clean, self-documenting (and line-commented) code. Of course, there are some cases where we probably do have some quite bloated functions.

The bigger issue for me, which is definitely related to this, is that we have a few very large modules. I find it hard to find things in them and I wrote the majority of them - I imagine for new contributors it is hell. In particular I'm talking about the impl_ops and slice modules. I want to split these out in much the same way that @Andlon sorted out the decomposition module.

tafia commented 7 years ago

Like @AtheMathmo I believe this is not the biggest issue for now. On the other hand I do not think a linalg is more prone to bigger function than any other library.

I agree that having a concrete example could help.

Andlon commented 7 years ago

I haven't come across too many big functions in rulinalg so far. The exceptions are some of the numerical algorithms, but in these cases the code often mimics an algorithmic description, so it might not be desirable to split it up too much. However, as @AtheMathmo pointed out, some modules are very bloated, and this is something that should definitely be remedied.

Now, I think perhaps you're asking for advice on how to structure your work on #83 rather than a suggestion to make changes in the existing code base. In that case, I will try to give you some (hopefully useful) feedback on your suggestions.

The first approach

Remember that Rust allows you to have free functions. There's actually no reason to wrap functions in a Struct for this purpose. For example, you could do (horrible example, sorry. Obviously the pizza should be much more composable):

/// Feel free to document, but if it's a very obvious helper function,
/// this may not be necessary
fn make_dough() { unimplemented!() }
fn make_tomato_sauce() { unimplemented!() }
fn make_favorite_topping() { unimplemented!() }
fn assemble_pizza(crust: Crust, sauce: Sauce, topping: Topping) { unimplemented!() }

impl Dish for Pizza {
    fn prepare(oven: &Oven) -> Self {
         let dough = make_dough();
         let sauce = make_tomato_sauce();
         let crust = oven.prebake(dough);
         let topping = make_favorite_topping();
         let unbaked_pizza = assemble_pizza(crust, sauce, topping);
         oven.bake(unbaked_pizza)
    }
}

Second approach

Well, this is closer to what I would do, but it's perhaps a little over-the-top. Let's go back to the Pizza example, but let's assume that we want to accommodate combinations of all kinds of toppings. You could organize it like this:

pizza
├── pizza.rs
├── toppings.rs
└── mod.rs

In the above, mod.rs only imports the necessary types from pizza.rs and toppings.rs. The API of the pizza module is defined in mod.rs, but no actual implementation is done here. Rather, implementations are logically grouped depending on where they make the most sense, so that you should be able to find the appropriate types quickly just by looking at the filenames. You wouldn't expect to find anything about "dough" in topping.rs, would you? So it's probably in pizza.rs.

Ultimately, the two approaches are not mutually exclusive, and you should probably use both, depending on whatever is more appropriate.

My (personal) guidelines for dividing things into individual (internal) modules can be summarized as follows:

Separate concepts should be in separate files
For related things, start with a single file and break up as necessary
mod.rs should not contain implementations or definitions, but rather export the user-facing API. You can think of it as "transforming" the internal file organization into the API that the user sees.
If you find yourself naming modules or functions "helper", it might indicate that you're not setting the logical boundaries in the right places (however, there are many exceptions to this)

Especially the third point is not obeyed by many parts of rulinalg, and this is of course fair since I'm just stating my personal preferences here. That said, I do wish we could move more towards this pattern in the long run.

c410-f3r commented 7 years ago

Thanks everybody, especially @Andlon. I guess that I have enough information to "standardize" my next PRs. I look forward for formal code guidelines but this issue is too specific to such discuss, then @AtheMathmo can close it.

AtheMathmo commented 7 years ago

I agree - some additions to the contributions doc would be a good idea!

AtheMathmo / rulinalg