goodybag / cater-api-server

hacking this together as fast as possible but with an eye for re-organization
1 stars 0 forks source link

consider db tables helper instead of models #15

Closed lalitkapoor closed 10 years ago

lalitkapoor commented 11 years ago

I'm wondering whether we really need models or just some kind of generic table helpers to do simple variations on select (find, findOne, findById, etc) and validation on inserts/updates.

With models I know we can implement things like: categoryModelInstance.getItems etc, but I'm just wondering whether we want that kind of functionality or will database table helpers address a majority of all queries.

I've used models in the past with django and it was great, but I think I've come to like not using models/orms and instead just using a table helper/query builder. I could probably reflect a little more on this.

I've just been thinking about what @brianc 's been telling me the past couple days about keeping everything stupid simple, as I've been working, but maybe I'm getting carried away with this one and I'm totally missing the awesomeness of models. I just think it's unnecessary that currently the usage is making a query using models, wrapping the returning json results as model objects, and then in the views toJSON is being called on the models to get the data back out of it.

Let me know what you think.

ProCynic commented 11 years ago

I would be opposed to changing the current models setup to a table helpers setup. I haven't really thought about it enough to be explicit about all my reasons, but here's what I've got so far.

My understanding of the proposal is that the substance, irrespective of renaming, would be pretty much identical to the models we have now, except that find would return an array of attribute hashes instead of model objects. The functions find, findOne, insert, etc. would take a table name and schema and then manipulate the database directly, returning the rows object from the db driver, or equivalent. The problem is that the models we already have offer us exactly this simplicity, with the negligible addition of calling toJSON in the controller. This may constitute a technically unnecessary method call, but I don't see how tables.Item.findOne(3, function(err, obj) { res.send(obj); }) is significantly more simple than models.Item.findOne(3, function(err, obj) { res.send(obj.toJSON()); }).

All that is to say that the table helpers approach doesn't really buy us much. But the Model approach also offers some advantages of its own. The principle one is that it gives us a centralized place to put any logic associated with a particular table. Validating an object against a schema could just as easily be a table helper function, but what about custom validation rules that aren't embodied in the schema? Mixed in to the controller? The same thing with fetching sub-resources, such as getting all the items of a restaurant. And probably most importantly in the long run, what about derived properties? Or maintaining internal consistency of an object? The database is not going to be so perfectly normalized that we won't have to worry about that.

It's true that all of those things could be done under the table helpers scheme, but they'd be scattered around the app in a mixture of redundant bits of controllers and global utility functions which would inevitably grow into a spaghettified mess over time, as opposed to being concentrated into a single object which maintains responsibility for all aspects of itself. And the longer we left it the worse a problem it would become. In fact the worst systemic problem magic has now is the way the lack of models adds unnecessary complexity to the controllers.

Doing the simplest possible thing which will get the job done now is a wonderful principle which protects against getting caught in a swamp of unnecessary, premature generalization. But the art of good software design is about finding ways to attain that simplicity without closing off the ability to maintain simplicity as you grow in the future. And I think models, and the OOP principles on which they're based, do a better job of that than the alternatives. There's a reason MVC became so popular. However, the most significant reason I favor models isn't anything I can name explicitly so much as the feeling that it keeps our options open for the future without really costing us anything in the present. Especially the extremely light weight version we have. I think it would be a mistake to abandon them.

lalitkapoor commented 11 years ago

Hey Geoff,

These are good points. I agree mostly with you, maybe I should've been a little more clear about what exactly I don't like about models, but there is definitely a lot of overlap, including some of the things that you mentioned such as having a centralized place to keep logic associated with a table, a way to do custom validation, embody a schema, etc. I'm all for those :+1: -- What I've found is that I generally haven't utilize much of the other things like fetching sub-resources, calculating derived properties, etc -- and these all cause us to wrap the results from the database as instances of the model and with this some inconveniences are introduced. I also know some of my concerns are biased from the early days when we were using mongoose, they weren't pretty days.

Having said all that, I think this discussion is more work than intended, and maybe I should've realized that sooner! It's better we just roll with something. I think we agree entirely on the fundamental concerns, and we mostly agree on a solution to the problem, and since you've already got a solution started out, lets roll with what you've already got :smile_cat:. If I start to feel like we may be running into some issues as things get more flushed out, I'll point em out, and then we can revisit this and explore creative solutions to the problems later. I'm confident we can work though them then anyway. So, if you're cool with it, lets cut this discussion short. I just really want to work on get this baby pumped so we can make some :moneybag: Btw, totally didn't expect you to write all that, it's was very nice of you, and i love you for it. Thank you for being patient with me! :heart:

jrf0110 commented 11 years ago

I started writing a lot about this, but then I realized I'd rather just talk about it in person.

I'm totally cool with Data Model Classes. We can probably get some neato behavior by making some assumptions about data access. However, in order for us to actually be able to make those assumptions, we have to be consistent about our use with them. That is, all data should have it. I don't want to be in a situation where it's ambiguous whether a piece of data is just a standard object or if it's an instance. That will suck.

As long as we don't obfuscate the way to actually get the value or meaning from an instance, we should be good. If I can now know that at any moment, the data in my possession can be accessed via a .toJSON() call, I'm cool with that. But is it ever ambiguous as to whether the object has all of its properties?

Like.. if in some part of the code, we pass a Product model around. And we're looking at this model and asking "do I need to call model.fetch() and get the properties first or does it have all of the latest info?" I don't know. When you're just dealing with values, it's less ambiguous. Like, if you were given a productId, you would know that you definitely need to go get the rest of the properties. But if you were given a Product model instance with the id on it, then you don't know immediately. You have to dig in and check for yourself. And maybe it does have the properties you're interested in. You don't know how up-to-date those properties are.

I'd be interested to try it out though :D

brianc commented 11 years ago

The tl;dr is there's no perfectly right or wrong answer.

I'd like to point out my biases quickly: I used to be an architecture astronaut in a way you can't truly understand unless & until you've used an enterprise language like Java or C# with multiple xml files, tons of reflective based inversion of control containers, and 5 layers of methods to do everything. Example: "why did you wrap the webService call behind IAccountService interface & then make an implementation of that interface as a decorator pattern with methods that do nothing but call the internal instance of the webSerivce?" My answer: "Because what if we want to change the webSerivce later?" Did we change it? No. Did I waste a day on that shit? Yes, absolutely. Was it more of a pain in the ass every time we added a method to our web service than it should have been because now I gotta go change 3 other files? Yes. I've fought long and hard with myself and my old boss to come back down to earth. So, my bias now lies very much towards "ship product, clean later" up to a point. I think especially in the case where you're still not profitable every key-press not directly related to making money is t echnically waste. Obviously I'm not that nazi about it when it really comes down to it, but I try to keep reminding myself of that fact so I can keep myself tethered to the ground.

I've been down this path multiple times before.

Doing the simplest possible thing which will get the job done now is a wonderful principle which protects against getting caught in a swamp of unnecessary, premature generalization. But the art of good software design is about finding ways to attain that simplicity without closing off the ability to maintain simplicity as you grow in the future. And I think models, and the OOP principles on which they're based, do a better job of that than the alternatives.

I agree with most of what @ProCynic said. Just so we're clear, how I'm summarizing his comments to myself is basically "model classes give a single place to put data manipulation & access behavior instead of smearing it across a whole whole bunch of controllers and random util methods." That makes total sense. Another common pattern is the repository/services pattern where you have methods in the repository for plain the data access as well as services dealing with model interactions (e.g. userService.authenticate(userRecord)). Both work. I've found it also works well in more cases than you'd expect to just smear logic across controllers and only abstract as you need it. With test coverage the "smear & clean later" approach isn't too bad. WIthout tests though you can get really, really boned doing that -- especially in a dynamic language & multi-developer projec t.

Anyways...there are three things I would warn about. These aren't addressed to @ProCynic, just everyone, mostly myself, whenever I get into situations like this.

1) Are you abstracting too early? Are you sure you're not abstracting too early? Have you written that same method or most of it another place already? No? Are you absolutely sure you're not abstracting too early?!

2) Be very careful when dealing with models and node and models fetching themselves from the database you don't start writing an ORM. I know this from first hand experience. Remember when I just started @ goodybag I was talking about "I wanna write an ORM for node!" Don't do that. It's harder than you think. Then you figure out it's harder than you think and then it gets even harder than that. Also, fuck ORMs...they blow and are a huge impedance mismatch between your relational data and your "model" objects. I think it's questionable if doing "update in place" 1 row per 1 model type of data modeling is even the right way to go, but that's another tangent.

3) I'd like to echo what @jrf0110 pointed out and bang that drum again. It's very important everyone knows the "proper" way to access the database (be it models, repositories, or code smear) and everyone stick to the same path. Otherwise you end up w/ 1/2 models and 1/2 raw json stuff and you're in a worse position than any 1 way.

4) "Models with behavior" breaks down a bit in a state-free request/response cycle like web development. It's cool to write user.save() over just the query, but it matters a lot less if you encapsulate behavior into models when they are born, used, and die within a matter of milliseconds. If you do go the model route I'd try to keep as much "behavior" logic out of models as possible other than validation & synthetic properties. I'm not exaggerating when I say I worked on a 4000 line active_record model because "well behavior should live in the model" was the followed mythos.

A final thought...I know it's "proper" to use models isn't a bad idea, but I still think it'll end up slowing things down a bit. At the end of the day more code & more abstraction is just that, and it takes time to write & reason about and also adds one more thing to keep in your head while you're working. It always comes down to whether that abstraction saves more time than its overhead introduces.

http://www.youtube.com/watch?v=nRSYU4YSISA

brianc commented 11 years ago

I would like to add one thing. I have seen the code @ProCynic has written so far & in my humble and questionably worthwhile opinion it looks good. It does the job, it doesn't have laser beams and crazy refactorbation marks all over it, and it nicely encapsulates database interactions and reduces repeating things. :+1:

jrf0110 commented 11 years ago

Yeah - One of the things I've been thinking about is interface ambiguity when using models. When I've got a model that has sub-models, say Product.Categories or Product.Tags, is the categories field on the model attributes an array of JSON or is it an array of models? Or when I'm creating an activity stream record:

var item = new Activity({
  userId: user.get('id')
, data: {
    product: product // <---------
  }
})

Is the product supposed to be a model instance? Or is it just the hash?

Another concern is data interoperability. If we're just writing a script that interacts with our data, do we need to pull in all of our data classes to do that? What if we don't for simplicity sake and then later we want to bring in some functionality from the API that does use data classes?

I'm in the camp of data should be a value. And the simplicity camp. But I'm willing to give instance methods a shot vs static methods. That's the real argument, right? We're wrapping our data with a class so methods can be a part of the instance. So we get:

model.save();
// Vs
db.models.save( model );
lalitkapoor commented 11 years ago

@jrf0110

  1. I believe you'd pass in the value not the instance
  2. thinking...
  3. yep. The argument, as I see it is (and this is where @ProCynic and I differ in terms of preference):
    • whether we get back model instances or values when making queries
    • whether we want instance methods and derived properties or only static methods

@ProCynic correct me if I'm wrong on the above.

lalitkapoor commented 11 years ago

@jrf0110 mis-read your first question, updated my answers above.

lalitkapoor commented 11 years ago

Don't know if this should be it's own issue, but keeping it here because it relates to a bunch of stuff discussed.

@proCynic So I was thinking about the stuff I got done yesterday -- I'd say about half of my time was spent discussing and refactoring working code to do the following:

  1. utilize models (e.g. Generating restaurant menu page)
  2. be more correct (e.g. when logging in do a post to /session instead of /auth/login).

I'm am mostly fine with this, mostly because I want a consistent and agreed upon way of doing things. For point 2, I'd say this -- if the desired output is achieve (in this case the user can login) and it's not hurting anything else, for the sake of shipping asap, lets just create tickets for these kinds of issues and we'll get back to em when we have time. Totally down to fix everything to be more correct :+1:

Ok, so regarding point 1, I do have some pressing concerns about being able to discern what's going on quickly enough, complexities, and maintainability.

Ok, so I'm going to work through an example from yesterday (sorry for all the pasting of code):

As we had discussed in #4, the following code was refactored:

module.exports.menu = function(req, res) {
  var data = {
    restaurant: null
  , categories: null
  , items: null
  };

  var tasks = {
    getRestaurant: function(callback) {
      var query = {
        type: 'select'
      , table: 'restaurants'
      , columns: ['*']
      , where: {id: parseInt(req.params.id)}
      }

      var sql = db.builder.sql(query);
      db.query(sql.query, sql.values, function(error, results) {
        if (error) return res.error(errors.internal.DB_FAILURE, error, callback);
        data.restaurant = results[0];
        return callback();
      });
    }
  , getCategories: function(callback) {
      var query = {
        type: 'select'
      , table: 'categories'
      , columns: ['*']
      , where: {restaurant_id: parseInt(req.params.id)}
      , order: {order: 'asc'}
      }
      var sql = db.builder.sql(query);
      db.query(sql.query, sql.values, function(error, results) {
        if (error) return res.error(errors.internal.DB_FAILURE, error, callback);
        data.categories = results;
        return callback();
      });
    }
  , getItems: function(callback) {
      var query = {
        type: 'select'
      , table: 'items'
      , columns: ['*']
      , where: {restaurant_id: parseInt(req.params.id)}
      , order: {order: 'asc'}
      }
      var sql = db.builder.sql(query);
      db.query(sql.query, sql.values, function(error, results) {
        if (error) return res.error(errors.internal.DB_FAILURE, error, callback);
        data.items = results;
        return callback();
      });
    }
  }

  var done = function(error, results) {
    if(error) return;
    var menu = [];
    var categoryIdMenuIndex = {};

    for(var i=0; i<data.categories.length; i++) {
      var category = data.categories[i];
      category.items = [];
      menu.push(category);
      categoryIdMenuIndex[category.id] = i;
    }

    for(var i=0; i<data.items.length; i ++) {
      var item = data.items[i];
      var category = menu[categoryIdMenuIndex[item.category_id]];
      category.items.push(item);
    }
    res.render('menu', {restaurant: data.restaurant, menu: menu}, function(error, html) {
      if (error) return res.error(errors.internal.UNKNOWN, error);
      res.render('index', {content: html}, function(error, html) {
        if (error) return res.error(errors.internal.UNKNOWN, error);
        return res.send(html);
      })
    });
  }

  utils.async.parallel(tasks, done);

into this:

module.exports.get = function(req, res) {
  models.Restaurant.findOne(parseInt(req.params.rid), function(error, restaurant) {
    restaurant.getItems(function(error, items) {
      res.render('menu', {restaurant: restaurant.toJSON()}, function(error, html) {
        res.render('index', {content: html}, function(error, html) {
          if (error) return res.error(errors.internal.UNKNOWN, error);
          res.send(html);
        });
      });
    });
  });
}

and the model code to support this is:

restaurants.js

var Model = require('./model');
var utils = require('../utils');

module.exports = Model.extend({
  getCategories: function(callback) {
    var self = this;
    callback = callback || function() {};
    require('./category').find(
      {where: {'restaurant_id': this.attributes.id},
       order: {order: 'asc'}},
      function(err, results) {
        if (err) return callback(err);
        self.categories = results;
        callback(null, results);
      });
  },

  getItems: function(callback) {
    var self = this;
    callback = callback || function() {};
    var items = function(err) {
      if (err) return callback(err);
      if (!self.categories || self.categories.length === 0)
        return callback(null, null);
      var categories = utils.map(self.categories, function(cat) { return cat.toJSON().id; });
      require('./item').find(
        {where: {'category_id': {$in: categories}},
         order: {order: 'asc'}},
        function(err, results) {
          if (err) return callback(err);
          self.items = results;

          var catIndex = utils.object(utils.map(self.categories, function(cat) {
            return cat.attributes.id;
          }), self.categories);

          utils.each(results, function(item) {
            var cat = catIndex[item.attributes.category_id];
            cat.items ? cat.items.push(item) : cat.items = [item];
          });

          callback(null, results);
        }
      );
    }

    self.categories ? items() : self.getCategories(items);
  },

  toJSON: function() {
    var obj = Model.prototype.toJSON.apply(this, arguments);
    if (this.categories) obj.categories = utils.invoke(this.categories, 'toJSON');
    return obj;
  }
}, {table: 'restaurants'});

categories.js

var Model = require('./model');
var utils = require('../utils');

module.exports = Model.extend({
  getItems: function(callback) {
    var self = this;
    callback = callback || function() {};
    require('./item').find(
      {where: {'category_id': self.attributes.id}},
      function(err, items) {
        if (err) return callback(err);
        self.items = items;
        callback(null, items);
      }
    );
  },

  toJSON: function() {
    var obj = Model.prototype.toJSON.apply(this, arguments);
    if (this.items) obj.items = utils.invoke(this.items, 'toJSON');
    return obj;
  }

}, {table: 'categories'});

and this is the view for it:

<div class="row">
  <div class="col-lg-12">
    <h1>{{restaurant.name}}</h1>
  </div>
</div>
<div class="row">
  <div class="col-lg-8">
    <h2>Menu</h2>
    {{#each restaurant.categories}}
      <h3>{{name}}</h3>
      <div class="list-group">
      {{#each items}}
        <a class="list-group-item">
          <div>
            <div><h4 class="list-group-item-heading">{{name}}</h4> <i class="list-group-item-text pull-right">feeds {{feeds_min}}-{{feeds_max}} &nbsp;   &nbsp; price {{dollars price}}</i></div>
            <div class="list-group-item-text">{{description}}</div>
          </div>
        </a>
      {{/each}}
      </div>
    {{/each}}
  </div>
  <div class="col-lg-4">
    <h2>Orders</h2>
  </div>
</div>

These are my concerns:

I am concered about these things because my top priorities include:

Maybe I'm wrong or am just being paranoid. What are your thoughts?

brianc commented 11 years ago

The fact that this is causing this much discussion is already a code smell of sorts. Not that discussion is wrong, but this much discussion means there's something fishy going on me thinks.

Every point @lalitkapoor brought up is valid & the more I thought about it last night the more my thoughts aligned with his before I read his comments. As I've said before goal number 1, 2, 3, 4, and 5 is shipping. It's hard to swallow the "mess of code" sometimes and get shit out the door but in the end it doesn't matter at all how clean your code is for your MVP. If you become profitable you can re-write the whole damn thing on a waterproof laptop while sipping champagne in a hot tub.

I think using models is going to go 1 of 2 ways.

  1. You're going to shoehorn more and more logic into the model until you have a 1/2 functional, 1/2 legacy ORM that is more technical debt than it's helpful.
  2. You're going to use them where it's convenient and write raw sql where it's not, resulting in having model instances sometimes and raw json other times. I've honestly never been on a project using an orm where they didn't drop down into raw sql at some point.

Models introduce a "where is this happening" type of environment and they're often abstract at the expense of explicitness. Explicitness, though it is often more verbose, is more valuable than clever or magical code because you may write a line of code 1 or 2 times but you and others read it 50+ times.

They also introduce rigidity in that you now have to access and conceptualize your data based on how it maps into models. Models are not a fantastic fit for a relational data store, and at the end of the day the business runs on the data and derives value from the data, not from the models.

I think if you were to abstract the original code Lalit wrote this is how I would do it:

Here's the original:

https://gist.github.com/brianc/6165368/a6beae3a3daa59a5313cc2b03c753d9ad3efdbd1

First, move queries into an object to make them reusable without making them overly abstract. This also has the advantage of giving them a name so it's clearer what their purpose is:

Second, rename the paramter req.params.id to restaurantId for more clarity.

https://gist.github.com/brianc/6165368/da2ba8b03f3b8e050da8b0293d481ca64770697c

I think at this point the code is at least as readable as before if not more so without being abstracted. You can and should move the queries object to its own file and yes the first time someone has to mess w/ that method or query they will have to go checkin the queries.js file to see what's going on but after that I would figure they'd know by the name of the function what the query is doing.

jrf0110 commented 11 years ago

Furthermore, the more you separate out common query mixins, the less noise your queries will have:

https://gist.github.com/jrf0110/5699029

That's just some neato middleware doing the same sort of thing. You can easily create a pagination mixin, standard, sorting, embed foreign table mixin, result count total mixin. And then also create more specific mixins like brian was saying, except I would leverage SQL more and get everything in a single query. Could do something like:

var $query = {
  type: 'select'
, table: 'restaurants'
};

// Make query a find one
queries.general.findOne( id, $query );

// Embed items into restaurants
queries.general.oneToMany( 'items', $query );

// Sort the embedded items
queries.general.sort( { name: 'desc' }, $query.joins.items.expression );

// Embed categories into items
queries.general.oneToMany( 'categories', $query.joins.items.expression );

// Sort embedded categories
queries.general.sort( { name: 'desc' }, $query.joins.items.expression.joins.categories.expression );

// BAM everything in one query
db.query( $query, function(error, results){ /* ... */ });
brianc commented 11 years ago

I'm definitely fond of separating query helpers to build the base query stuff for you so it isn't all a big json object. Super badass and one of the major superpowers of MoSQL.

I'm still wary of the middleware approach though...seems like spreading your logic out over completely different functions. Though I can see some places where it would be useful.

jrf0110 commented 11 years ago

Well, the hope with the middleware would be that 90% of your query is cookie-cutter and your routes are just a configuration of various cookie-cutter parts. The other 10% could be achieved with custom functions acting as pseudo route handlers but really are just a layer manipulating the query object. Then, everything gets passed along to the generic handler.

jrf0110 commented 10 years ago

Well, this isn't happening.