Refactoring Orphan & Co.

NikitaAvvakumov commented 10 years ago

I ignored my own advice and started refactoring the Orphan model. It is very likely that these early efforts will have to either wait for a long time before being integrated into the OSRA code base, or be completely discarded and started anew once the churn subsides. However, having spent just a little time analyzing the situation, I think that the sooner we start thinking about the issues in our project’s architecture, the smoother any future refactoring will go. We might even start writing better code right away :smiley:

I am sharing this for two reasons: First, I do not claim that my opinions are universally correct and would appreciate input from the rest of the team. Second, the refactoring of the Orphan model alone has the potential to turn into a huge job, and the possibilities to split it across multiple PRs may be limited due to many dependencies (that alone is a good sign that refactoring is overdue). So, getting everyone involved in the process will hopefully make future code reviews less painful.

Extract Father from Orphan

I began with the very obviously needed task of extracting orphan_father into its own model. After initially directly copying all father-related attributes from Orphan, I replaced father_alive & father_is_martyr with enums status & martyr_status. I shared an article on enums a while back, and here’s the API link - they provide a simplified state machine-like functionality to Rails models and are, in many/most cases, a much better fit than boolean attributes. For instance, they automatically provide predicate and bang methods as well as scopes:

class Father < ActiveRecord::Base
  enum status: %i(dead alive)
  enum martyr_status: %i(not_martyr martyr)
end

father = Father.create(attr_hash)

father.alive! # => sets status to alive
father.alive? # => true
father.dead? # => false
father.status # => “alive”

Father.alive # => ActiveRecord_Relation of all Father instances with status “alive”

Enums

Using enums got me thinking about the many other places in the code base they can be used. Particularly, I wonder if we can eliminate all Status models and replace them with enums in their current associations. Not only would this have performance benefits (less db access, for example), it also makes sense from the architectural perspective (to me, at least): these objects are created once (seeded) and are immutable; they have no business logic of their own; they are, essentially, strings: “Active”, “Sponsored” etc.; association of an object with a particular status represents the object's state.

Importer

Once the Orphan & Father models are separated, the importer will break. To restore its functionality, changes will be needed in at least 2 places: OrphanImporter.to_orphan & collection_action :import within app/admin/pending_orphan_list.rb. Again, this is a good indicator that refactoring is called for.

I propose that creating and persisting valid model instances is the job of neither OrphanImporter nor PendingOrphanList but of a separate class. This new class would only report success or failure to the others, and it would be then up to the importer to decide whether to proceed with the import, and up to the PendingOrphanList to either call a “persistor” or destroy itself. This separation of concerns not only seems more logical to me, it will also allow for modification of only a single class if the Orphan model undergoes further changes in the future.

Pending models

Thinking about the importer, I could not explain to myself why PendingOrphanList & PendingOrphan are database models. Under no circumstances do we want them to be persisted (as in ‘long-term’). They come into existence during the import process and are supposed to disappear at its end, whether it is successful or not. PendingOrphanList provides the import UI at the moment, but even if we want ActiveAdmin to handle things, we may be able to move away from ActiveRecord. PendingOrphan has no UI and no logic of its own. Could it not be a non-db class? A hash of its instances could be held in memory during the import process and either converted into Orphan objects on success or discarded on failure.

Save the best for last

I replaced array join with string interpolation in Orphan#full_name. String interpolation is more efficient - my tests show that after the method gets called 1,000,000 times, we will have saved ~0.3-0.4s over the original implementation. No need for applause.

PurityControl commented 9 years ago

I agree with most of the things here and don't want to give the impression this is a negative post. However, as is the nature of such a response I will only be really focusing on the things I don't necessarily agree with so forgive me in advance.

I am not sure I agree that the PendingOrphanList and PendingOrphan are not models. The data is stored in the database as an intermediate step because there is more than one request response cycle between uploading the data in the spreadsheet and the client accepting the import. Storing in the database saves parsing the spreadsheet more than once. Using the activerecord library for this is a natural fit for this purpose.

http://guides.rubyonrails.org/active_record_basics.html states

<<<<<< 1 What is Active Record? Active Record is the M in MVC - the model - which is the layer of the system responsible for representing business data and logic. Active Record facilitates the creation and use of business objects whose data requires persistent storage to a database. It is an implementation of the Active Record pattern which itself is a description of an Object Relational Mapping system. <<<<<<

This storage of data may be relatively short lived but they are important business entities that the client needs to complete a workflow.

In the (on hold) pr in progress we have used PendingOrphan to create the Orphan and to be honest if it is not PendingOrphan's business to create an Orphan I don't know whose it is.

You suggest using a persistor, which is used in other frameworks, but in rails our classes our responsible for saving their own data through the inheriting of ActiveRecord::Base. I think mixing these patterns within a single project would cause too much confusion.

Lastly, however difficult it is to refactor over multiple pr's, I think we should try very hard to do this. Even if it takes more effort in the long run. Taking time to refactor in small pieces and learning the skills required to do this is an essential part of making refactoring a continued effort rather than a one hit wonder.

NikitaAvvakumov commented 9 years ago

I agree fully about trying to have granular PRs, especially for refactoring. At the moment, I am struggling to find a balance between a PR that's small enough to be easily reviewed, but encompasses enough functionality not to break the application. Extracting the Father model breaks the importer, and I see no way around fixing it before merging any code. I'll submit a WIP PR in the hopes of getting feedback that can resolve this.

Pending<Class> are, indeed, models. My question was directed at the difference between databased-backed ActiveRecord models and non-persisted ActiveModels. They are the same with the exception of the db aspect.

I cannot work out sufficient separation of concerns to prevent a cascade of changes every time we alter e.g. Orphan. The best I can think of is that we can try to mitigate it by making the structure as granular as possible. Currently, both the importer and the Orphan model check for presence of mandatory attributes, ensure data type matches, and ‘Male’/‘Female’ in case of gender. PendingOrphan alone handles extraction of data for Orphan and Address, as well as the persisting of those objects. Now we are talking about adding Father into the mix. This is why I’m thinking about splitting data parsing, validating and persisting as much as possible.

AgileVentures / osra