feathersjs-ecosystem / feathers-hooks-common

Useful hooks for use with FeathersJS services.
https://hooks-common.feathersjs.com
MIT License
193 stars 89 forks source link

Should discard and populate first make a copy of the result objects? #150

Closed eddyystop closed 7 years ago

eddyystop commented 7 years ago

This is relavent when Sequelize and Mongoose return ORMs rather than POJS objects. @daffl and @marshall think this is a good idea.

The discard and remove hooks work the same way. We should probably change both. We would have to choose a deep clone utility. They are slow but would only have to process a limited number of records.

I have concerns about making deep copies of every recursive record in populate. I think this would degrade performance. We were concerned enough about populate's performance that we included perf measurement features in the hook.

Perhaps we should consider running .toJson or .toObject if its found on a result record. This way the performance penalty is paid only by Sequelize use without raw:true (now that Mongoose defaults to lean:true).

However I had a longish discussion with ryan.wheale. He has come around to the following point of view:

"Well, I am on board with the idea that the ORM "goodness" is probably only used a fraction of the time. Most services will just load data, do some simple data mods, and then send it out.

In the case that users need the ORM stuff, they will need to instantiate the data. This could ship as a pre-built hook for convenience and would be easy to document.

The documentation could clearly state that instantiating ORM objects will likely break feathers hooks so that nothing comes by surprise (as it currently does)"

This would mean that hook.result must contain POJO and if the dev wanted the ORM, they would run a hook and the ORM would be instantiated on, say, hook.params.instantiated.

So ... decisions. I'm attracted to ryan.wheale's idea.

marshallswain commented 7 years ago

I'm on board with everything you just said. I didn't think about performance penalties of deep cloning everything.

eddyystop commented 7 years ago

I would like ryan.wheale to contribute his thoughts.

DesignByOnyx commented 7 years ago

In order to play well with others, feathers-sequelize should always run raw queries. This is the first part of the discussion that needs a concrete decision as this will likely break existing apps (including the one I've been working on for the past year).

From there, I think that the ORM adapters can ship with a simple hydrate or instantiate hook that users can use to get ORM instances if they so choose (because we're nice like that ;). After having thought about this for some time I think it also makes sense to have the inverse serialize hook as tell to turn the data back into an POJO for further use. This would allow people to compose hook pipelines in an easy-to-conceptualize way:

[
  someCommonHook(),
  instantiatORMObject(),
  processORMObject(),
  serializeORMObject(),
  someOtherCommonHook()
]
eddyystop commented 7 years ago

@DesignByOnyx, for clarity, would both the POJO and the ORM be at hook.result? You previously were playing around with the idea of hook.data always being a POJO, and any instantiated ORM would always be hook.params.instantiated.

DesignByOnyx commented 7 years ago

I think it makes the most sense to instantiate on hook.result - the biggest reason is that data gets out of sync really quickly and you start to experience the whole "source of truth" issue. If the ORM objects are mutated, then someone (or something) has to be responsible for updating hook.result before the response goes out. This seems like unnecessary confusion and I think it's easier to document and debug the instantiate -> process -> serialize flow, always using hook.result to hold the data.

Another compelling reason is that by using hook.result - users don't have to re-serialize the data. The ORM objects automatically get serialized when the response is sent.

sasdf commented 7 years ago

For In-memory database (feathers-memory), applying discard on after hook will mutate the object in database.
Testing code:

'use strict'
/* Versions:
 *   "feathers": "^2.1.1",
 *   "feathers-hooks": "^1.8.1",
 *   "feathers-hooks-common": "^3.0.0",
 *   "feathers-memory": "^1.1.0",
 */

const feathers = require('feathers')
const app = feathers()
const hooks = require('feathers-hooks')
const { discard } = require('feathers-hooks-common')
const db = require('feathers-memory')

app.configure(hooks())
app.use('/test', db())

app.service('test').hooks({
  after: {
    create: [
      discard('bad')
    ]
  }
})

app.service('test').create({
  good: 1,
  bad: 1
})

app.service('test').get(0).then(
  console.log.bind(console.log)
  // Output: { good: 1, id: 0 }
)
eddyystop commented 7 years ago

Place this hook before the discard:

hook => {
  hook.result = JSON.parse(JSON.stringify(hook.result));
  return hook;
}

I suspect feathers-memory is returning the actual object in the database in hook.result. Mutating it, mutates the database.

daffl commented 7 years ago

Yes, feathers-memory does not make a copy of the result. I think those hooks should make shallow copies of the data (and deep copies of objects that are accessed via the dot . operator).

marshallswain commented 7 years ago

You don't think shallow copies would lead to confusion? To avoid confusion, seems like maybe feathers-memory should return a deep clone inside the service. Isn't it the only adapter that's inconsistent?

sasdf commented 7 years ago

Returning deep clone make sense, but maybe it will losing capability to store any js internal Object(something circular or connected with callback, etc)? Maybe left an option in constructor?

eddyystop commented 7 years ago

This problem is not limited to hooks.

Besides this present issue, @daffl helped me realize that if you modify data in service.on('created', data => ...) that modification will mutate the data in the service.create(...).then(data => ...). (I had to clone deeply as I absolutely had to use a mutated version of that event data .)

I suggest it needs to be handled elsewhere to solve it throughout Feathers.

I also suggest a deep copy would have to be made, else the problem will just be buried deeper in sub-objects.

eddyystop commented 7 years ago

I've thought some more about this.

There is no documentation about which Feathers param objects may not be safely mutated. So people will run into unexpected issues which may become time consuming.

Even if the common hooks make (deep) copies where needed, developers will still be exposed to the problem as they write their own hooks.

I think Feathers core should at least freeze any non-mutateable objects, preferably deeply. That way problems should pop up quicker and the reason would be fairly obvious.

This seems a fair balance between speed and safety.

eikaramba commented 7 years ago

Just updated to latest common-hooks and experienced this problem with sequelize. IMHO why not handling the different cases of ORM inside the hook? e.g. for sequelize checking if items.dataValues is there or setDataValues is a function. Sure, this means supporting every single ORM that feathers currently support, but that's how it is. Just providing shallow copies of the objects would break a lot of stuff and although i'm already having a unsequelize() hook myself for such use cases, i'm not sure if this is a good idea to have by default.

BTW: here's my own discard copy that handles sequelize objects as well. If anyone has the same problem currently

const { getItems } = require('feathers-hooks-common');
....
discard: function (...fieldNames) {
    return function (hook) {
      let items = getItems(hook);

      (Array.isArray(items) ? items : [items]).forEach(item => {
        fieldNames.forEach(fieldName => {
          let value = typeof item.setDataValue === "function"? item.dataValues[fieldName] :item[fieldName];
          if (value !== undefined) {
              if(typeof item.setDataValue === "function"){
                delete item.dataValues[fieldName];
              }else{
                delete item[fieldName];
              }
          }
        });
      });

      return hook;
    }
  },
eddyystop commented 7 years ago

The new populate and serialize hooks always required POJO. This is not new.

No common hook (except the legacy populate) handles any ORM object, so 'fixing' the problem in populate isn't fixing it. feathers-mongoose v4.1 already defaults to a plain JS object. We have a plan for Sequelize which DesignByOnyx describes above.

Since Feathers only handles POJO its your responsibility to provide it such, with { query: { raw: true } } in the base method call, and select: { raw: true } in the includes.

eikaramba commented 7 years ago

i started from the cli skeleton and as a new user i would assume everything works as described(the current code the cli is generating uses discard and does not to prevent this issue from happening). so then the docs need to be adapted. can this not be handled by the database adapter, if that is really required in the future?

marshallswain commented 7 years ago

Yeah. We can probably add a hook to the generator when you select Sequelize. That would be a good thing to learn right up front.

eddyystop commented 7 years ago

@marshallswain The solution we've settled on is https://github.com/feathersjs/feathers-hooks-common/issues/150#issuecomment-294531988. We will eventually have to remove any normalization hooks we add to the generator.

eddyystop commented 7 years ago

Now that https://github.com/feathersjs/feathers-hooks-common/issues/150#issuecomment-294531988 has been implemented I think we can close this issue.