MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0
3 stars 4 forks source link

Cache pathway-gene-interactors for 3x speedup! #108

Closed linas closed 4 years ago

linas commented 4 years ago

Cache previous results, so that they are not recomputed again, if the results are already known. Note that this functin accounts for about 60% of the total execution time of gene-pathway-annotation so any caching at all is a win. In a test of 681 genes, this offers a 3x speedup in run time. (On my machine, execution time went from 61801 seconds to 19180 seconds)

Of course, on any one gene, or small handful of genes, there will be zero speedup; there has to be a cache-hit to see any benefit, i.e. there have to be multiple genes that share teh same pathways.

tanksha commented 4 years ago

@linas tested the above PR, I got an error Unbound variable: do-pathway-gene-interactors and the fix that I did is that the cache function has to be defined after the function to be cached.

(define-public do-pathway-gene-interactors 
  (lambda (pw)
   ....
))
(define-public pathway-gene-interactors
    (make-afunc-cache do-pathway-gene-interactors))

Can you do revert the functions so that I can merge?

linas commented 4 years ago

OK, Done.

Notice: you don't necessarily have to ask me to do this; you can also do this with a bit of git-branch, git-pull magic. Also, this fix is so tiny, you can also just cut-n-paste the code as needed.

Part the second: I found this one after measuring performance as described in #98 -- I don't remember off the top of my head, but I think that there are other functions that will benefit from caching as well. So this is more of an example. Maybe the most important one, but there are probably others. I guess I can help you identify the other cases, but I didn't want to flood your email inbox with a bazillion pull reqs.... let me know ....

tanksha commented 4 years ago

Notice: you don't necessarily have to ask me to do this; you can also do this with a bit of git-branch, git-pull magic. Also, this fix is so tiny, you can also just cut-n-paste the code as needed.

I was not actually sure about the fix, as I have never faced such error due to the functions definition order.

Yes, there are many functions which we can catch but I have read from the doc here that a a FUNC to be catched has to accept a single atom as an input, Is that true for all cases?

linas commented 4 years ago

such error due to the functions definition order

Me neither. I thought that was odd, but am not going to spend time on that.

there are many functions which we can cache

Maybe probably, but there are also cases where it won't make a difference (except to make the code more confusing and harder to understand) and there are other (rare) cases where it could make things slower... Measuring is important.

the doc here that a a FUNC to accept a single atom as an input,

Yes. One could also create more complicated variations, but those seem unlikely to be useful (because caching is not always a good idea).