MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0

PATCH: Fixes for pathway-hierarchy/check-pathway #105

Closed linas closed 4 years ago

linas commented 4 years ago

Based on the results described in this comment:

https://github.com/MOZI-AI/annotation-scheme/issues/98#issuecomment-571444301

it appears that the two functions pathway-hierarchy and check-pathway are responsible for the explosion in the guile heap usage, as well as being giant CPU-time losers. To trace this further and fix it, I am using the following alternate implementation:

(define-public (run-query QUERY)
"
  Call (cog-execute! QUERY), return results, delete the SetLink.
  This avoids a memory leak of SetLinks
"
   ; Run the query
   (define set-link (cog-execute! QUERY))
   ; Get the query results
   (define results (cog-outgoing-set set-link))
   ; Delete the SetLink
   (cog-delete set-link)
   ; Return the results.
   results
)

(define-public (pathway-hierarchy pw lst)
   ; Debug print: size of the list that pathways are checked against.
   (format #t "Enter pathway-hierarchy sizeof lst: ~A\n" (length lst))
   (let* (
         ; All pathways that pw inherits from.
         [parents (run-query
            (Get (Variable "$parentpw")
               (Inheritance pw (Variable "$parentpw"))))]
         ; junk and jank exist only to run the debug prints inside the let*.
         [junk (format #t "pathway-hierarchy found parents=~A\n" (length parents))]
         [res-parent (map
               (lambda (parent-pw) (check-pathway pw parent-pw lst))
               parents)]
         ; All pathways that inherit from pw.
         [childs (run-query
            (Get (Variable "$childpw")
               (Inheritance (Variable "$childpw") pw)))]
         [jank (format #t "pathway-hierarchy found childs=~A\n" (length childs))]
         [res-child (map
               (lambda (child-pw) (check-pathway child-pw pw lst))
               childs)]
      )
      (append res-parent res-child)
))

(define-public (check-pathway pw parent-pw lst)
   (if (and (member parent-pw lst) (member pw lst))
      (Inheritance pw parent-pw)
   ))
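
A side note on the return value of check-pathway: the `if` has no else-branch, so when either pathway is missing from lst, guile returns its unspecified value, and the lists built by `map` in pathway-hierarchy keep one entry per parent or child, unspecified ones included. The sketch below reproduces this in plain guile, with symbols standing in for atoms and a plain list standing in for the Inheritance link. The names check-pair and check-pair-or-false are hypothetical, made up for this illustration; whether any caller depends on the current behaviour is something I have not checked.

```scheme
(use-modules (srfi srfi-1))   ; for filter-map

; Stand-in for check-pathway: symbols instead of atoms, a plain list
; instead of an InheritanceLink. Same one-armed `if` as above.
(define (check-pair pw parent-pw lst)
   (if (and (member parent-pw lst) (member pw lst))
      (list 'Inheritance pw parent-pw)))

; Hypothetical variant that returns #f explicitly when the test fails.
(define (check-pair-or-false pw parent-pw lst)
   (and (member parent-pw lst) (member pw lst)
      (list 'Inheritance pw parent-pw)))

(define lst '(a b))
(define parents '(b c))   ; c is not in lst

; Plain map keeps one entry per parent, including the unspecified one.
(define with-holes
   (map (lambda (p) (check-pair 'a p lst)) parents))

; filter-map drops the #f entries, keeping only the links.
(define links-only
   (filter-map (lambda (p) (check-pair-or-false 'a p lst)) parents))
```

Here with-holes has two entries, the second of which satisfies `unspecified?`, while links-only contains just the one link. If the unspecified entries turn out to be unwanted downstream, something along the lines of the second variant might be one way to drop them.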

Commentary:

There are two possible performance enhancements to the above code:

As before, I don't think I'll do a pull req for this, because (a) I'm still experimenting, (b) I have no way to test, and (c) it might be better for you to work out the details as needed.

linas commented 4 years ago

Update: this fixes the guile-stack explosion. Processing has moved forward, with 67 of 681 completed in about an hour and a half. RAM usage is holding steady at 3.8GB RSS and 5.3GB virt, with a guile heap of 29MB and 25 seconds spent in guile GC.

Atomspace contents look reasonable:

pathway> (cog-report-counts)
((ConceptNode . 454843) (NumberNode . 2) (PredicateNode . 12) (SetLink . 59667) (ListLink . 2045216) (MemberLink . 1857676) (AndLink . 117109) (VariableNode . 14) (VariableList . 5) (GetLink . 41910) (BindLink . 23828) (EvaluationLink . 1960713) (TypeNode . 3) (TypedVariableLink . 10) (ExecutionOutputLink . 6299) (GroundedSchemaNode . 6) (GroundedPredicateNode . 1) (InheritanceLink . 122540) (GeneNode . 49050) (MoleculeNode . 368909))
pathway> (count-all)
7107906

so an additional 300K atoms have been created and added to the atomspace.

linas commented 4 years ago

BTW, I hope that it's obvious, from the report above, what the correct fix is. If not, let me know. It's a relatively small pull req.

linas commented 4 years ago

Closing; pull reqs #126 and #127 have been merged.