anywhichway / nano-memoize

Faster than fast, smaller than micro ... a nano speed and size (780 Brotili bytes) memoize for single and multiple argument functions.
MIT License
212 stars 12 forks source link

that setInterval #4

Closed titoBouzout closed 5 years ago

titoBouzout commented 5 years ago

Hello, thanks!,

Im very curious whats going on with the setInterval here https://github.com/anywhichway/nano-memoize/blob/master/src/nano-memoize.js#L60

Is called non-stop, really?

titoBouzout commented 5 years ago

The more functions you memo the more is it called......

anywhichway commented 5 years ago

Hmmm ... an uncaught design flaw I think. Has is actually cause you an issue with real code?

On Thu, Feb 14, 2019, 14:36 Tito notifications@github.com wrote:

Hello, thanks!,

Im very curious whats going on with the setInterval here https://github.com/anywhichway/nano-memoize/blob/master/src/nano-memoize.js#L60

Is called non-stop, really?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/anywhichway/nano-memoize/issues/4, or mute the thread https://github.com/notifications/unsubscribe-auth/AEuqSyew669YIXIWpoJ4d-l5MgyswPIBks5vNbqxgaJpZM4a8S6H .

titoBouzout commented 5 years ago

Im just trying to be constructive ok,

Is causing issues with my brain, and that's were the real code gets executed.

why this needs to be called every 0.001 seconds? like in 1000 times per second, seriously?

do we really need a setInterval(whatever, 1) ? can this be set in a less intrusive way?

From a technical point of view I can imagine:

Can we also set a "global cleanup" check? because as you may know, if we set 10 MEMO function, then we gonna do 10k per second! !!!!!!!

Im kinda sort of mad because I believe ( I dont even remember) came here because of a Medium article "ohhhh we the best at memoize look our benchmarks" and you benchmarking the most dumb things and not even paying attention at how the code runs.......... so yeah..... wth is this ?

anywhichway commented 5 years ago

Good points. Will adjust the code to compute when the next expiration needs to be. Probably done in the next 48 hours.

On Thu, Feb 14, 2019, 21:02 Tito notifications@github.com wrote:

Im just trying to be constructive ok,

Is causing issues with my brain, and that's were the real code gets executed.

why this needs to be called every 0.001 seconds? like in 1000 times per second, seriously?

do we really need a setInterval(whatever, 1) ? can this be set in a less intrusive way?

From a technical point of view I can imagine:

  • battery ?
  • browsers limiting setInterval because we do dumb stuff like this ? xD
  • I can't really grasp the idea of the 1sec/1000 times , Im sorry

Can we also set a "global cleanup" check? because as you may know, if we set 10 MEMO function, then we gonna do 10k per second! !!!!!!!

Im kinda sort of mad because I believe ( I dont even remember) came here because of a Medium article "ohhhh we the best at memoize look our benchmarks" and you benchmarking the most dumb things and not even paying attention the how the code runs.......... so yeah..... wth is this ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/anywhichway/nano-memoize/issues/4#issuecomment-463877908, or mute the thread https://github.com/notifications/unsubscribe-auth/AEuqS0-7ECi7EgLLfskh4Z1jKlPpS9NYks5vNhUygaJpZM4a8S6H .

titoBouzout commented 5 years ago

Thank you, I would like to apologize for the way I wrote, Im sorry.

anywhichway commented 5 years ago

No worries. Note, I only wrote this code because someone else claimed they had written the fastest possible memoize and it was obviously inefficient, so it is ironic you found this flaw in my gnarly code (because I was also trying to make it tiny). The benchmarks are based on what others were using to make their claims of speed.

I retrospect simply setting a regular timeout would have worked. Granted, this will result in two function calls for every memorized function. But, if it is critical the memos expire in a timely manner there is not much choice. Expiring the memos during the regular call will slow down the regular call for large memo sets. Alternatively a timeout to be set when the memo is created if it is supposed to expire. What are your thoughts?

Get Outlook for Androidhttps://aka.ms/ghei36


From: Tito notifications@github.com Sent: Friday, February 15, 2019 12:42:27 PM To: anywhichway/nano-memoize Cc: Simon Y. Blackwell; Comment Subject: Re: [anywhichway/nano-memoize] that setInterval (#4)

Thank you, I would like to apologize for the way I wrote, Im sorry.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/anywhichway/nano-memoize/issues/4#issuecomment-464136720, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AEuqS7EXEnV6tVrA0BLo-r4UazT10G23ks5vNvGDgaJpZM4a8S6H.

titoBouzout commented 5 years ago

I can think of :

  1. just setting a setTimeout(cleanup, maxAge) when the memoize function is setup and has a maxAge option set.

I understand this has the downside that some objects will be very short lived, as a value set 100ms ago will be deleted if a timeout to cleanup runs, even if the timeout is set as high as 10 seconds, ex: if the value is saved at the 9.900sec mark it will be just deleted once we reach 10sec

To fix this we can save Date.now() also with the value. Then the cleanup function compares current Date.now() with the Date.now() saved within the value.

I believe the total point of this library is to reuse values and avoid computing expensive stuff. The point of maxAge I believe is so we don't leak or hold data forever, I mean is not time critical.

So I think is worthy to use the Date.now() approach because objects will live longer and avoid recomputing stuff. Even if it makes it slightly slower when maxAge used it will still be faster because an expensive function will be reused.

  1. I was also thinking of a global interval, but this approach has many downsides. Stores for memoized functions should be global which will require special cleanup to not leak functions that are no longer used, Cleanup function will have to iterate over all the values of all the memoized functions set. Nevermind this.

Thanks!

anywhichway commented 5 years ago

@titoBouzout Fixed. Thanks for the assistance!

titoBouzout commented 5 years ago

ok lol. Just for the record of anyone reading, this is not fixed, he just moved the setInterval to the bottom and set the time of the interval in the variable expireInterval which still values 1. So nothing changed besides a bunch of variables renames. https://github.com/anywhichway/nano-memoize/commit/a5d42ae39ee38edb2111484f49efe8665883dc20#diff-168726dbe96b3ce427e7fedce31bb0bcR44

anywhichway commented 5 years ago

Tito, the setInterval now only gets set on the redefined function not every time it gets called. Unless I am reading my own code incorrectly ... which is entirely possible.

I am being pretty responsive here. Is there a reason you are being so negative?

Regardless, I will take another look. It is kind of hard to write unit tests for this part of the code. I will put in some trace statements.

I will reopen the issue when I get to a computer.

On Sat, Feb 16, 2019, 06:54 Tito notifications@github.com wrote:

ok lol. Just for the record of anyone reading, this is not fixed, he just moved the setInterval to the bottom and set the time of the interval in the variable expireInterval which still values 1. So nothing changed besides a bunch of variables renames. a5d42ae

diff-168726dbe96b3ce427e7fedce31bb0bcR44

https://github.com/anywhichway/nano-memoize/commit/a5d42ae39ee38edb2111484f49efe8665883dc20#diff-168726dbe96b3ce427e7fedce31bb0bcR44

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/anywhichway/nano-memoize/issues/4#issuecomment-464340595, or mute the thread https://github.com/notifications/unsubscribe-auth/AEuqS9DHyvvrVnjnwJojgcsRxDNu2lrZks5vN_FngaJpZM4a8S6H .

anywhichway commented 5 years ago

@titoBouzout I have verified the fix is correct. If you wish to verify yourself, insert a trace statement below line 94 of src/nano-memoize.js and re-run the test suite. You will see the trace statement 4 times, but the functions are called far more frequently.

if(expireInterval) { console.log("Creating interval for " + fn); f.interval = setInterval(() => { // process key chngs out of cycle for speed for(const p in c) { if(maxAge) { tmout(c[p]); } delete c[p]; } },expireInterval); }

titoBouzout commented 5 years ago

If Im reading correctly, the interval was just moved but basically runs in the same place. I think you just skipping the setInterval for when the function does not have an expiration value, which is an optimization yes. But still the setInterval with 1 millisecond is there. If all my memoized functions have a maxAge value, then the situation didn't change.

Ill try to be more clear, you should never ever do setInterval(fn, 1). Find for other work around.

Lets do the math,

for(var a =0;a<100;a++){
for(var b =0;b<100;b++){
nanomemoize(function(){}, {maxAge:50000})
}
}

That situation is entirely possible if you have a lot of complicated tasks and you trying to speed up the complete thing, you do this keeping in mind that nanomemoize should just return the cached function or compute the new value. But nanomemoize does more than that..... it sets a lot of setIntervals that consume a massive amount of CPU for nothing productive.

So if you have 100x100=10,000 setIntervals of 1 millisecond, translating this to a second you have 10,000x1,000 = 10,000,000 functions calls per second, and the setTimeout has a loop with at least 2 calls more and sometimes 3, so this is 30millon. This leaves little margin for doing anything else and totally miss the point of whats trying to achieve.

You can end on a situation like this if your code base is big and have lots of loops. Or lets say you have 100 functions to optimize and then you have a daily cron job, so in 100 days you are in this situation Im describing. Specially because the setInterval is not cleared.

I have a suggestion, run my code in the browser and look how the CPU tops at 100%(is doing nothing, just running intervals). Then run the same code again but with 10x10 and see how the CPU does not move from around 30%(again doing nothing, just running intervals).

10x10=100 setIntervals of 1 millisecond, translating this to a second you have 100x1,000 = 100.000 functions calls per second or 300.000 in the worst case.

Im not trying to discourage you, you also been very tolerant, but this point of the interval is off and for some reason you not seeing it. I should have more patient, no excuses, Im trying to help.

I described a solution that will fix your problem and in the morning wrote a proof of concept. Basically:

var memo = (function() {
    function serialize(o) {
        return JSON.stringify(o, _serialize)
    }

    function _serialize(k, v) {
        if (typeof v == 'function') {
            return 'Function ' + v.name
        } else {
            return v
        }
    }

    return function(fn, expires) {
        const f = function(cache, serialize, expires, setInterval, fn, ...args) {
            const k =
                args.length == 1 &&
                (!args[0] ||
                    typeof args[0] === 'string' ||
                    typeof args[0] === 'number' ||
                    typeof args[0] === 'boolean')
                    ? args[0]
                    : serialize(args)

            if (cache[k]) {
                cache[k].n = Date.now()
            } else {
                cache[k] = { v: fn(...args), n: Date.now() }
            }
            expires && !this.timeout && setInterval()
            return cache[k].v
        }
        f.cache = {}

        f.expires = expires
        if (expires) {
            f.setInterval = function() {
                this.timeout = setInterval(
                    function(expires) {
                        const n = Date.now()
                        var all_expired = true
                        for (var k in this) {
                            if (n - this[k].n > expires) {
                                delete this[k]
                            } else {
                                all_expired = false
                            }
                        }
                        if (all_expired) {
                            f.clearInterval()
                        }
                    }.bind(this.cache, this.expires),
                    this.expires
                )
            }.bind(f)

            f.clearInterval = function() {
                clearInterval(this.timeout)
                this.timeout = false
            }
        }

        return f.bind(f, f.cache, serialize, expires, f.setInterval, fn)
    }
})()
anywhichway commented 5 years ago

@titoBouzout I am going to re-open this issue until we agree it is resolved.

I had never considered the case of memoizing hundreds of functions. Indeed using an interval with more than a few functions would kill the CPU.

Your most recent explanation of a solution is quite clear and after I read your first few lines about the many function memoization your approach had already popped into my head by the time I read it.

This will not be hard to do. Give me a couple of hours.

anywhichway commented 5 years ago

@titoBouzout I think the latest build v1.0.2 addresses all your concerns. And, it turns out setInterval is not even needed once one focuses on the calling of the memoized function not the creation of the memoized function. It can all be done with setTimeout. Also, the code is now 25% smaller and will use far less RAM for caching at runtime.

Please close this issue if you think it is resolved.

Thank you for bringing this important flaw in the code to my attention.

titoBouzout commented 5 years ago

OK thank you, sorry for not presenting the issue in the best way, have a nice day!

popbee commented 5 years ago

Hey! First thanks for doing all this work, appreciated!
Now I think I see an issue with the latest maxAge/setTimeout implementation. 🤔

If I read the code correctly setTimeout() is being called on ALL cache hits and cache misses.

The use case of using a quick memoizer is to call an expensive function many times (1000+). That means even if my cache contains just 5 elements, I will have 1000+ timeouts racing concurrently to delete those 5 keys. If I continue accessing the cache, it will have the net effect of recreating the entries all the time anyways since the delete() will get called as often as I accessed it (albeit delayed of maxAge ms). This is not very good IMHO.

So to fix this without changing too much stuff, what about setting up the setTimeout on cache misses only? I know, it won't refresh the TTL on cache hits, but still improves overall and keeps the "happy path" fast(er).

Here I dropped the chng param and created a "d" (delete) function that gets called on all cache misses right before calling the expensive function (using a comma operator). Even if the timeout is 1ms and the function takes 10 seconds to run, the timeout callback will not fire before exiting the JS stack. Now setTimeout appears only once in the code so potentially even saving a few bytes(?).

      d = (s, k) => setTimeout(() => delete s[k], maxAge);
    // for single argument functions, just use a JS object key look-up
    function sngl(f, s, serializer, arg) {
      // strings must be stringified because cache[1] should not equal or overwrite cache["1"] for value = 1 and value = "1"
      const key = !arg || typeof arg === 'number' || typeof arg === 'boolean' ? arg : serializer(arg);
      return s[key] || (d(s, key), s[key] = f.call(this, arg));
    }

I didn't test this, so to be taken with a grain of salt.

anywhichway commented 5 years ago

@popbee You are onto something here. Thanks for the feedback. I like your approach! Will try it out tonight.

anywhichway commented 5 years ago

@popbee I implemented your approach. Performance tests show the same speed, but they don't really evaluate the situation you describe. Your analysis of the code made perfect sense. I needed two different cache expiration functions because there are two types of possible cache. Also, it is arguable that the cache hits should not extend the life of a cache value because it could be the cache is actually tied to an underlying sensor that needs to be re-sampled.

popbee commented 5 years ago

Great stuff! Surprising that's the same speed. I would have expected faster on cache hits.

Anyhow, I am happy that you improved your library.

Note: I had only a single setTimeout function because it was parametrized to delete from the appropriate cache. For the "multiple" case, it was called with v instead of s. But again I didn't test this, so I could be plain wrong.

euroclydon37 commented 5 years ago

Unfortunately, I'm going to have to take a different approach because all timers traverse the js bridge in React Native. This isn't usable in its current form.

euroclydon37 commented 5 years ago

Turns out nearly all memoization libraries use timers in some way. We're not using these libraries for caching responses or anything. We literally just want the last result until args change.

anywhichway commented 5 years ago

Hmmm ... should not be too hard to implement with a config argument. Will take a look over the next couple of days.

Get Outlook for Androidhttps://aka.ms/ghei36


From: Jeremy Dunn notifications@github.com Sent: Monday, March 11, 2019 2:12:37 PM To: anywhichway/nano-memoize Cc: Simon Y. Blackwell; State change Subject: Re: [anywhichway/nano-memoize] that setInterval (#4)

Turns out nearly all memoization libraries use timers in some way. We're not using these libraries for caching responses or anything. We literally just want the last result until args change.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHubhttps://github.com/anywhichway/nano-memoize/issues/4#issuecomment-471680901, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AEuqS0P4PiUDy2AHLjpoMSwuMpbqOSCMks5vVqqlgaJpZM4a8S6H.

anywhichway commented 5 years ago

@euroclydon37 Avoiding timers was trivial. A new version, 1.0.5, has been pushed. Just set the start-up option for maxAge to Infinity.

popbee commented 5 years ago

@euroclydon37 For a single entry memoize for react purposes, memoize-one is the way to go.

popbee commented 5 years ago

@anywhichway I do not understand why you added the Infinity option as you already had the feature with undefined and/or 0. You just made your library larger(?)

anywhichway commented 5 years ago

@popbee ... undefined and/or 0 would have worked, but if was also possible to set a timeout of Infinity which would have produced a whole bunch of non firing timeouts ... talk about a memory leak!

popbee commented 5 years ago

I suspect Infinity does not work with setTimeout(). We should try it. To me this project's goal was extreme speed and size but I am not sure this is still true.(?)

anywhichway commented 5 years ago

@popbee You can successfully use setTimeout with Infinity in V8 and it returns a timeout id. What it will do is questionable. We would have to ask the V8 team. When the raw file is compressed using the best compression tools, a combination of jscompress and txtwizard, without putting webpack or browserify in the middle the size is less than 600 bytes and is still more than 300 bytes less than other libraries that even come close is size, speed and functionality. A trivial memoizer that does not perform can be probably be written in less 250 bytes. Also, I found a few other characters to remove, so adding Infinity did not increase the size.

The goal is still extreme speed and extremely small.

If you can find other opportunities to reduce the size or increase the speed I would like to know what they are so I can incorporate them.

popbee commented 5 years ago

I just ran a few tests with setTimeout in my Google Chrome version 72.0.3626.121 and only undefined has an unknown behavior. We could try in other browsers.

anywhichway commented 5 years ago

@popbee Fascinating ... so what do you think the legit value for maxAge should be? I am a little reluctant to change it since is would technically require deprecation activity. Note, I just tested setTimeout(() => console.log("fired"),undefined) and it fires immediately. maxAge = 0 would seem to imply it can never be cached whereas Infinity would seem to imply cache forever. In retrospect, I think I should have chose that to start with.

popbee commented 5 years ago

On terms of size savings, I have lots of ideas (one you disregarded in this very thread with calling settimeout once).

Gzip / Brotli size is important but parsing time is a great deal too (which depends on uncompressed size). Especially true with react native.

But the trends these days is to build libraries that are tree shaking compatible so you do not pay for what you do not use. In your case, it would be quite a profound refactor for the small savings.

anywhichway commented 5 years ago

@popbee I made the change to call setTimeout once on arg changes only:

// set chng timeout only when new value computed, hits will not push out the tte, but it is arguable they should not
    return s[key] || (chng(key),s[key] = f.call(this, arg));

and also line 46

True on the parsing side, but not much to parse just as like there would not be be much of a tree to shake. I could imagine a design that would shake out to single or multiple args, but it seems there would be barely a program it would apply to. And I suppose if a program never called clear, it could be dispensed with. However, at this level of micro optimization, I would be more inclined just to fork the code and take out the parts I did not need.

I have not gone back and looked at the code in the context of the parameterizing comment you made though 23 days ago. I will try and get to that this weekend. It might get 8 to 10 bytes.

At this point the exercise for me has become more one of being able to continually revisit something and see the possibilities for improvement. I doubt anyone would see much of a production impact.

I appreciate the ongoing engagement.

anywhichway commented 5 years ago

@popbee, titoBouzout You really should get some credit for the contribution you have made to improving this library. Even a minor change to the README via a pull request would push you onto the contributors list (assuming you want to be there). Otherwise, with your permission, I will just manually reference you in the README.

popbee commented 5 years ago

Yes, I know you did that change. It's the other parts I was refering to. (I see the word setTimeout twice in the source). But it's OKay.


You can mention me as you wish. No need to be on the contrib list, but thanks for asking. :)

I was wondering if you update your size/speed tables when you make changes? Unless this is automated, I would imagine being a bit tedious to do so...

True that on a big project, your library is really a tiny drop. BUT on a tiny project (something that holds into a small Githubissues.

  • Githubissues is a development platform for aggregating issues.