MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0

PATCH: Cache pubmed-id lookup. #110

Closed linas closed 4 years ago

linas commented 4 years ago

The simple caching below gives a 1.5x speedup for biogrid annotation. I'm just pasting the patch here; adapt it to your own tastes.

--- a/annotation/functions.scm
+++ b/annotation/functions.scm
@@ -1149,7 +1149,19 @@ rv)
        rv))

 (define-public (find-pubmed-id gene-a gene-b)
- (let ([pub (cog-outgoing-set (cog-execute!
+       (cache-find-pubmed-id (Set gene-a gene-b)))
+       
+(define cache-find-pubmed-id
+       (make-afunc-cache do-find-pubmed-id))
+
+(define (do-find-pubmed-id gene-set)
+"
+       This is expecting a (SetLink (Gene \"a\") (Gene \"b\"))
+"
+ (let* (
+       [gene-a (cog-outgoing-atom gene-set 0)]
+       [gene-b (cog-outgoing-atom gene-set 1)]
+       [pub (run-query
      (GetLink
        (VariableNode "$pub")
        (EvaluationLink
@@ -1165,9 +1177,9 @@ rv)
            )
          )

-   )))])
+   ))])
    (if (null? pub)
-     (set! pub (cog-outgoing-set (cog-execute!
+     (set! pub (run-query
      (GetLink
        (VariableNode "$pub")
        (EvaluationLink
@@ -1182,10 +1194,11 @@ rv)
              (VariableNode "$pub")
            )
          )
-   )))
+   ))
    ))
    pub
 ))
+
 (define-public (find-crna gene protein)
   (cog-execute! (BindLink
   (VariableList
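
For anyone unfamiliar with the caching idea, here is a minimal sketch in plain Guile, with no AtomSpace required. It is not the implementation of `make-afunc-cache`; `make-simple-cache`, `slow-lookup` and `cached-lookup` are hypothetical names made up for illustration, and the key here is an ordinary pair of strings rather than a SetLink. It only shows the general shape: look the key up in a hash table, and call the expensive function only on a miss.

```scheme
;; Hypothetical stand-in for a caching wrapper; NOT the real make-afunc-cache.
;; Keys are compared with equal?, so any ordinary Scheme value can be a key.
(define (make-simple-cache func)
  (let ((cache (make-hash-table)))
    (lambda (key)
      ;; hash-get-handle returns a (key . value) pair, or #f on a miss,
      ;; so a cached value of #f or '() is still recognised as a hit.
      (let ((handle (hash-get-handle cache key)))
        (if handle
            (cdr handle)
            (let ((val (func key)))
              (hash-set! cache key val)
              val))))))

;; Count how often the underlying (pretend-expensive) function really runs.
(define call-count 0)
(define (slow-lookup pair)
  (set! call-count (+ call-count 1))
  (string-append (car pair) "-" (cdr pair)))

(define cached-lookup (make-simple-cache slow-lookup))

(cached-lookup (cons "geneA" "geneB"))   ; miss: runs slow-lookup
(cached-lookup (cons "geneA" "geneB"))   ; hit: call-count stays at 1
```

One difference worth noting: the pair key in this sketch is ordered, so `("geneA" . "geneB")` and `("geneB" . "geneA")` are cached separately. The patch wraps the two genes in a `Set`, which is an unordered link, so both orderings should land on the same cache entry.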

linas commented 4 years ago

With the above patch, I am getting 802 seconds execution time:

Time: 139.56440 secs. calls: 9441 avg:  14782.8 usec/call for find-go-term
Time: 215.86620 secs. calls: 681 avg: 316984.1 usec/call for match-gene-interactors
Time: 561.00654 secs. calls: 681 avg: 823798.2 usec/call for find-output-interactors
Time: 418.98079 secs. calls: 582626 avg:    719.1 usec/call for generate-result
Time: 31.595877 secs. calls: 582626 avg:     54.2 usec/call for build-interaction
Zero calls to find-protein-form
Time: 10.863595 secs. calls: 47096 avg:    230.7 usec/call for find-name
Time: 182.52820 secs. calls: 582626 avg:    313.3 usec/call for find-pubmed-id
Time: 132.88098 secs. calls: 9441 avg:  14074.9 usec/call for find-memberln
Time: 93.780047 secs. calls: 165392 avg:    567.0 usec/call for add-go-info
Zero calls to find-parent
Time: 28.674607 secs. calls: 9441 avg:   3037.2 usec/call for locate-node

Prior to this patch, it was 1205 seconds:

(biogrid-interaction-annotation gene-list  "my-biogrid-results"
      #:namespace "biological_process molecular_function cellular_component"
      #:interaction "Genes"
      #:parents 0)
Time: 140.83417 secs. calls: 9441 avg:  14917.3 usec/call for find-go-term
Time: 235.42098 secs. calls: 681 avg: 345698.9 usec/call for match-gene-interactors
Time: 943.35577 secs. calls: 681 avg: 1385250.8 usec/call for find-output-interactors
Time: 834.82299 secs. calls: 506787 avg:   1647.3 usec/call for generate-result
Time: 27.197969 secs. calls: 506787 avg:     53.7 usec/call for build-interaction
Zero calls to find-protein-form
Time: 11.202908 secs. calls: 47096 avg:    237.9 usec/call for find-name
Time: 600.92915 secs. calls: 506787 avg:   1185.8 usec/call for find-pubmed-id
Time: 133.93277 secs. calls: 9441 avg:  14186.3 usec/call for find-memberln
Time: 94.814262 secs. calls: 165392 avg:    573.3 usec/call for add-go-info
Zero calls to find-parent
Time: 27.621622 secs. calls: 9441 avg:   2925.7 usec/call for locate-node

(The above is after #2471 which gave a 10x speedup)
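
As a sanity check on the 1.5x figure, the ratios can be recomputed from the numbers in the two reports. This is only a sketch: the variable names are made up, and the values are copied by hand from the output above. Because the number of `find-pubmed-id` calls differs between the two runs (506787 versus 582626), the per-call average is probably the fairer comparison for that function.

```scheme
;; Figures copied from the two timing reports above.
(define total-before 1205.0)      ; whole run, seconds, before the patch
(define total-after   802.0)      ; whole run, seconds, with the patch
(define pubmed-usec-before 1185.8) ; find-pubmed-id, avg usec/call, before
(define pubmed-usec-after   313.3) ; find-pubmed-id, avg usec/call, after

(define (speedup before after)
  (/ before after))

;; Overall speedup: roughly 1.5
(display (speedup total-before total-after))
(newline)

;; Per-call speedup of find-pubmed-id alone: between 3x and 4x
(display (speedup pubmed-usec-before pubmed-usec-after))
(newline)
```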

linas commented 4 years ago

An explicit pull request for this patch is in #136.

linas commented 4 years ago

Closing; #136 was merged.