refactor!: perf, restructuring, rm _getOrFind, rm params.$returning, use instance methods

fratzinger commented 8 months ago

Included in this PR:

BREAKING:

refactor!: renamed service.applyScope() to service.ModelWithScope()
refactor!: removed service._getOrFind() method. Use service._get() or service._find() directly.
refactor!: removed params.$returning = false. Use params.sequelize.returning = false instead (for bulk operations).
refactor!: single patch and remove functions now use single instance.update instead of static Model.update. So the single sequelize hooks get called.

Other changes

perf: paramsToAdapter: if id is not null, add id to the query intelligently.
- if query[this.id] === id we can skip adding the id again
- if !(this.id in query) we can just set query[this.id] = id safely
- else proceed as before. query[this.id] is guaranteed to be something. check for $and and add to $and
perf: patch and remove: Don't use [this.id]: { [this.Op.in]: ids } if ids.length === 1. Use [this.id]: ids[0] instead
perf: patch and remove: If the first _find call does not return any items (ids.length === 0), return instantly
refactor: Object.assign to object spread
refactor: deprecate service.Op. Use direct import from sequelize instead: import { Op } from 'sequelize';

Issues:

closes #384
closes #382

This is a coproduction of @DaddyWarbucks & @fratzinger. Thanks @DaddyWarbucks. This was fun!

DaddyWarbucks commented 8 months ago

Thanks for this! I will have a look at this today or tomorrow. I have been thinking about taking a look at some other perf things too. I re-wrote most of feathers-mongo recently: https://github.com/feathersjs/feathers/pull/3366 And I was thinking about taking a look here to see if there were any potential perf gains, especially around counting. I have some new features in Straightline that require lots of counting.

DaddyWarbucks commented 8 months ago

@fratzinger I made a few other tweaks to this branch that should give us some more performace gains.

Better usage of the sequelize attributes property (aka sql SELECT). This eliminates the need for the adapter commons select() function in most case.
Faster count. When using $limit: 0, the adapter now uses count instead of findAndCountAll.
Faster get. The paramsToAdapter now returns a limit: 1 when an id is present.

I also refactored the patch and remove methods. They had grown confusing with all the config merging. So they both now use some simple logic branches instead of trying to handle _getOrFind().

Let me know what you think.

fratzinger commented 8 months ago

@DaddyWarbucks Perfect! Looks great. Glad to have all the select() removed. Thanks for the quick response and the further improvements!

I added a commit with a few changes (sorted by relevance):

params.$returning for id !== null: I don't like this! As valid as it is for id === null, it's implemented poorly for id !== null. There's no feathers standard to express this. Returning [] for a single element does not make any sense. The ReturnType is wrong - nearly every of my afterHooks would throw errors. Nor does returning undefined make any sense, since it is not a standard of @feathersjs/adapter-commons. This problem existed before this PR though. I don't use params.$returning at all, so I'm maybe the wrong person to argue with but I would like to remove this for id !== null. The CI is expected to fail because there are tests that ensure this behavior. Please let me know what you think about that and how we should proceed. I'm happy to make any changes based on your opionion (even if it is restoring the currently expected behavior 😊 )
_getOrFind is not used in the code base anymore, we can think about removing it sometimes. It's not private though, so it's maybe used in some codebases. What do you think about marking it as deprecated with jsdoc and remove it in a future major release?
patch with id === null: Why do you add $limit: ids.length for result = ...? Is this necessary? I think it can be removed.
Disclaimer: This is not a problem of this PR and maybe should be handled in a separate PR but it can be considered as a bug: getOrder in paramsToAdapter for id !== null: In the FindOptions for id === null there is order. Why is not present in FindOptions for id !== null? I think we can easily add it.
Thought I had but dropped: Maybe we should replace Model.update and Model.destroy with instance.update and instance.destroy for id !== null. I think for _patch that's an easy change. But for _remove it's not so easy because we need to handle paranoid deletedAt. Also this can be considered a breaking change because the called sequelize hooks would be different. I'm unsure about the performance gains anyways.

fratzinger commented 8 months ago

I reviewed the open issues. As of now this PR will close #384, #382. I added them in the initial message so they get closed automatically when merged.

DaddyWarbucks commented 8 months ago

I have a couple little more things I wan to try tonight. This is exciting! I use this for my largest project and its exciting to know we are about to get some perf boosts.

DaddyWarbucks commented 8 months ago

To Answer some of your questions

1 - Agreed on all accounts. It is weird and could/should be marked as deprecated and removed later. The user can handle it in their own hooks if they don't want to return it to the client.

2 - Also agreed. With my refactor of patch/remove, I also noticed it was now unused but did not remove it for the aforementioned reasons.

3 - The limit: 0 could technically be removed for both get and patch. I did a little research last night and using limit on a primary key does not gain or lose any performance (SQL knows how to handle it), but for non primary/unique/indexed keys it can add some performance. It is understood that whatever the user provides as the service id should be primary/unique/indexed...but maybe it's not? Setting it in paramsToAdapter handles the get, although it is not needed 99% of the time I still like that it intently states the purpose that an id has one result. For patch...same story with get, it may provide some benefits for non primary/unique/indexed keys. But your app has other problems if thats the case, like get will break, etc. It can definitely be removed. The limit in paramsToAdapter could be removed too, but I don't dislike it in that case.

4 - Totally. That is something I was going to add tonight too. I did similar in the latest PR for the mongo adapter. And there are a few other things I wanto to try that could really speed up single updates.

5 - I am not sure what to do there.

fratzinger commented 8 months ago

This is fun! I highly enjoy working on this. To your points again:

$returning: false for id !== null: I don't know how to deprecate that. Use console.warn when it's used? I would like to remove it now and let this PR lead to a major release.
✅ Added in my latest commit
Sorry, I think we talked past each other here. I don't think you got my point and I don't understand your point either. But maybe it's just on my side and you totally understood what I meant but I'm too dumb to understand your point. Nonetheless, I marked a comment above ⬆️ (scroll up). That the line I was referring to. That's the whole point of no 3.
✅ Added in my latest commit
I think we can leave this for future us.

Happy to see the results of your experiments. Resolving no 1 is still on my list. I'll add that after your changes so we don't have any conflicts.

I'll also want to update all dependencies and migrate from mocha to vitest. If you have any thoughts on that, please let me know. Otherwise I'll do that change after this PR.

DaddyWarbucks commented 8 months ago

I am working on a number of things now! I will let ya know when Ia am done.

DaddyWarbucks commented 8 months ago

I pushed quite a bit of unfinished code last night just to get it off my computer. This is more finalized now. I will write out a longer comment addresses your notes and some other things I changed.

DaddyWarbucks commented 8 months ago

I will try to summarize the changes made and the reasoning behind each.

Sequelize

returning - @fratzinger has pointed out some things I missed about returning. I have misused this in some cases. Its main purpose is for Model.update where it indicates that the method should return results when the underlying DB supports it rather than use the RETURNING statement. This would offer a large perf gain on multi patches because we wouldn't need to lookup the results after the update. Perhaps we should move back to params.$returning. I don't love the $ in the param name.
You will notice that for single patch and remove we now use instance methods instead of the bulk method. This is to more consistently align with sequlize's hooks with that is actually happening. A single patch should call the model's beforeUpdate afterUpdate instead of beforeBulkUpdate and afterBulkUpdate.
I also eliminated the ignoreSetters param. This was only used on create, when it should have been on update and patch too. I attempted to further use raw to address this, but kinda stepped on my own feet. More on that at the bottom.

FIND

This was a rather simple update. If $limit === 0, then just use the Model.count method. We should see a marginal perf benefit here. When using Model.findAndCountAll in this case, sequelize would still call findAll({ limit:0 }) under the hood (and doesn't noop). I am unclear if it actually sends a SELECT * LIMIT = 0 or not. But now we just do the count.

GET

Really no change here. The paramsToAdapter does now add limit: 1. With some quick research I found that this will make little to no difference for primary/unique/indexed keys. But I still like that it states some intent.

CREATE

Not many changes here either. We are more intelligently using the select function. And the code has moved around a bit, but it is otherwise unchanged.

UPDATE

Some good perf gains here.

Do a Model.count instead of _get to validate the query.
Because we did the count, we have to build our own instance. No worries.
If sequelize.include === false (which I think is most use cases), then the instance does not have its associations and we can select/return it without an additional lookup. Else, do the _get to get results with includes.

PATCH

Single patches use the instance.update method to better use sequelize hooks. Furthermore we can skip the additional lookup if sequelize.include == false (which I think is most use cases) then the instance does not have its associations and we can select/return it without an additional lookup. Else, do the _get to get results with includes.

So this method is where I wanted to really use returning. It makes lots of sense here, but as @fratzinger suggested it may have different results on some of the other methods 😦 . For this method returning === true could save us the an expensive additional lookup if the method actually returns its results.

REMOVE

Single removes use the instance.destroy method to better use sequelize hooks.
There were a couple of other little things around $select here too but I don't remember the exact change.

So I think there are some definite performance and consistent benefits here. But we likely need to rework some things around raw and returning. I also think this warrants a major update. We can eliminate some unused methods, unused params, etc. I know I goofed on some of the returning stuff and it needs some work. Let me know what you think @fratzinger.

fratzinger commented 8 months ago

I was going back and forth on the sequelize.returning vs. params.$returning/params.returning thing and ended up on saying, that it should be sequelize.returning. While not perfectly happy with it, it is more consistent than having literally two options for the same thing.

I removed the sequelize.returning option for single calls as discussed before. I changed the tests accordingly.

I hate the magic that comes with instance.update. In 'single patch' I added a new test to see if we need to add $select: undefined to the first _get(id, { raw: false } call. Turns out the single patch is not working at all! I debugged the new test in sequelize source code (https://github.com/sequelize/sequelize/blob/main/packages/core/src/model.js#L4304). options.fields is not ['name'] but an empty array... It does not detect the correct change. We could manually set the option fields: Object.keys(values) but it feels like a weird hack that should be avoided in the first place.

@DaddyWarbucks wdyt? Please review this test https://github.com/feathersjs-ecosystem/feathers-sequelize/blob/perf/id-and-in/test/index.test.ts#L394 . Even if you remove the params containing $select the result is not updated. I would like to go back to Model.update instead of instance.update.

I can imagine there are more quirks in the new changes. We probably should add a few more tests or at least release this PR as a pre-release.

DaddyWarbucks commented 8 months ago

@fratzinger Can you elaborate on what magic comes with instance.update that wasn't there before (or that we don't want now)?

DaddyWarbucks commented 8 months ago

I made one final change to the single patch, using instance.set and instance.save instead of instance.update seems to work. Phew...that was annoying and weird. The whole "changed" thing is obviously inconsistent by what we just saw. I do like this approach because it saves us from having to re-lookup the result...but I am not married to it. I think we got TONS of other benefits on the changes we made.

DaddyWarbucks commented 8 months ago

I haven't forgotten about this @fratzinger. I am going to run this in Straightline next week and see how it goes. It uses the full gamut of features and should be a good test rig. I will also be able to get some quantifiable performance metrics out of it as well. The app uses feathers-profiler and actually has a dashboard that shows a bunch of metrics from the profiler. So I can switch between versions and see what kind of differences we are getting.

fratzinger commented 7 months ago

Nice! This looks good! Thanks for the insight.

I thought about releasing it as a pre-major, just to be safe without any trouble. I'll test it on our app then as well.

Do you want to release it or shall I do?

DaddyWarbucks commented 7 months ago

I have not yet updated the instance.set to not loop over keys. And I was just going to install it locally with npm link.

DaddyWarbucks commented 7 months ago

I removed the applyScope method that had been marked as deprecated because we are making this a major release.

I made a few updates to paramsToAdapter such that it does not use offset or order and sets limit: 1 when an ID is present. Otherwise the user could have done something like service.get(1, { query: { $skip: 1 } }) which probably shouldn't be allowed. And the DB should optimize to limit 1 result when using proper unique/indexed keys, but I still like the query and intent being more clear. I wrote a test for this too.

Local testing results

The change in the _find method to break up the Model.count and Model.findAndCountAll works well. I was previously unsure if Sequelize would actually issue the DB query if $limit: 0. But it does. So we are no longer wasting a DB query when using $limit: 0. Nice!

More testing to come later tonight or tomorrow!

DaddyWarbucks commented 7 months ago

I think this is ready to go. I installed it locally and stepped through a bunch of stuff and everything seems to be working as expected. Checking out the sequelize logs I could determine that counts, additional lookups, etc were doing what the changes intended.

I also updated the docs and added a few more tests.

@fratzinger Are you aware that when using instance.update sequelize may not actually update the record if it does not detect a change? That was a surprise to me.

For example

const result = await app.service('items').get(1);
const updated = await app.service('items').update(1, result);
const updated = await app.service('items').patch(1, { name: result.name });

Sequelize detects that noting changed via its set (or update?) method and therefore does not issue the UPDATE query. I am not sure if this was happening with the current master branch because it used some different methods. But in the current branch if you do an update or patch with the same data as the current...no UPDATE query is issued.

I noticed this while testing. I ran 100 updates and 100 patches where I was just mutating same values. Updates took substantially longer. For example

const result = await app.service('items').get(1);

for (let i; i < 100; i++) {
  await app.service('items').update(result.id, result)
}

for (let i; i < 100; i++) {
  await app.service('items').patch(result.id, result)
}

I didn't understand why updates where taking sooo much longer. It's because of how previous (to these latest commits) we would do a Model.build for update. So the instance had no way to determine what had changed and was therefore issuing the UPDATE query. But the patch method, where we would get the current instance and then instance.set(values) was able to noop because nothing changed. I then updated the update method to be more similar to patch where we get the current instance (instead of doing a Model.count and Model.build) and achieved the same results.

I was able to confirm this by then running the following queries and noticed that UPDATE was issued for each. Furthermore, the UPDATE query only updated the name property.

const result = await app.service('items').get(1);

for (let i; i < 100; i++) {
  await app.service('items').update(result.id, {
     ...result,
     name: `${result.name}-${i}`
  })
}

for (let i; i < 100; i++) {
  await app.service('items').patch(result.id,  {
     ...result,
     name: `${result.name}-${i}`
  })
}

This seems weird and dangerous to me...but its how sequelize is meant to be used I guess. It's very fast/powerful with how it either noops or only updates the actual changed properties, buts its magic for sure. What do you think?

DaddyWarbucks commented 7 months ago

@fratzinger Pinging you again to get your thoughts.

fratzinger commented 6 months ago

Released as v8.0.0-pre.0 🎉

feathersjs-ecosystem / feathers-sequelize