Error while trying to run annotation

Habush commented 4 years ago

I am trying to annotate BRCA1 genes and when I run the annotate-genes function, it fails with the following error:

In procedure cog-outgoing-set: Wrong type argument in position 1 (expecting opencog atom): #<Invalid handle>

The error happens in the run-query function. It looks like the issue is due the pubmed id caching and may it is referring to a delete SetLink (I could be wrong though)

To reproduce the issue, load the datasets and run annotate-genes function with the following arguments

(annotate-genes (list "TSPAN6") "agingSymbols" "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"True\"}, {\"filter\": \"include_sm\", \"value\": \"False\"},{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"0\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"True\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"True\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"1\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Proteins\"}]}]")

@linas can you check this please?

linas commented 4 years ago

When I try to run the above, I immediately get the error:

ice-9/boot-9.scm:1655:16: In procedure raise-exception:
In procedure variable-ref: Wrong type argument in position 1 (expecting box): #f

Reading the code, in main,.scm line 95, I see

(func (variable-ref (module-variable (current-module) func-name)))

suggesting that module-variable returned #f presumably because func-name is the empty string. .. but its not. Printing it out, I get gene-pathway-annotation so I manually check:

(module-variable (current-module) 'gene-pathway-annotation) 
$4 = #f

so then I manually say (use-modules (annotation gene-pathway)) and then it pukes on gene-go-annotation ... and down the line.

It appears that the code is failing to include the modules that it needs to resolve all of these symbols.

My manual work-around is to manually load all 10 annotation modules.

This highlights what appears to be an unrelated bug: there's no need to have 10 different modules -- you could just have one, called annotation -- this would make it easier for users to use, and for developers to debug.

linas commented 4 years ago

Re: SetLink -- are you doing anything to delete SetLinks (besides deleting the SetLinks in run-query ? Are you using child atomspaces? Yes, the caching will fail if the cache entry is deleted for some reason...

linas commented 4 years ago

OK, after the above work-arounds (i.e. of manually loading all the needed annotation modules), it completes without error for me (in less than a second).

Please note that I am using the december 2019 version of the datasets; perhaps this bug depends on the datasets? Do I need to get newer datasets? (from where?)

Habush commented 4 years ago

The issue was with the version of datasets I was using. Now it is working.

Btw,

OK, after the above work-arounds (i.e. of manually loading all the needed annotation modules),

I think this should be improved and I will work on a PR. do you have suggestions for better ways of loading specific modules instead of the current-module? may be @rekado can answer this too

Habush commented 4 years ago

Uh, wait..I am trying to annotate the 800+ genes and after few minutes it crashed with the same error. Here is the annotation function I'm calling: annotation.txt

EDIT: 681 genes not 800+

linas commented 4 years ago

modules

I'm just suggesting that, instead of saying (define-module ...) in each distinct file, that instead, just create one file: annotation.scm and then just say

(define-module (annotation))
(include-file "util.scm")
(include-file "function.scm")
(include-file "biogrid.scm")
(include-file "main.scm")
... etc...

linas commented 4 years ago

OK, yes it crashed after 45 minutes. Stack trace:

ice-9/boot-9.scm:1655:16: In procedure raise-exception:
In procedure cog-outgoing-set: Wrong type argument in position 1 (expecting opencog atom): #<Invalid handle>

scheme@(guile-user) [1]> ,bt
In annotation/main.scm:
   119:19  2 (annotate-genes _ _ _)
In ice-9/threads.scm:
   289:22  1 (loop _)
In ice-9/boot-9.scm:
  1655:16  0 (raise-exception _ #:continuable? _)

line 119 of main.scm is [result (par-map (lambda (x) (x)) fns)]

My experience with guile parallel routines is that they are buggy. I don't know why. In this partocular case, I was seeing guile run at 120% CPU, so it got almost no speedup at all (which is very typical). It is often the case that guile parallel routines run slower than single-threaded. There's some lock contention somewhere.

linas commented 4 years ago

Here's the full stack trace from opencog.log

[2020-03-03 22:45:58:337] [ERROR] Backtrace:
In ice-9/threads.scm:
   288:21 19 (loop _)
In annotation/biogrid.scm:
    46:10 18 (biogrid-interaction-annotation _ "agingSymbols" #:interaction _ # _ …)
In srfi/srfi-1.scm:
   673:15 17 (append-map _ _ . _)
   586:29 16 (map1 _)
   586:29 15 (map1 _)
   586:17 14 (map1 ("RBM5" "SLC7A2" "NDUFAB1" "DVL2" "SKAP2" "DHX33" "MSL3" "…" …))
In annotation/biogrid.scm:
    50:35 13 (_ _)
In annotation/util.scm:
    188:8 12 (run-query _)
In unknown file:
          11 (opencog-extension cog-execute! (#))
In ice-9/boot-9.scm:
  1722:10 10 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
           9 (apply-smob/0 #<thunk 7fcd3da69e40>)
In ice-9/boot-9.scm:
  1722:10  8 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
           7 (apply-smob/0 #<thunk 7fcd3da69de0>)
In annotation/functions.scm:
    804:8  6 (generate-result (GeneNode "XRCC3")
 (GeneNode "SNAI1")
 _ # #)
In opencog/base/atom-cache.scm:
    54:34  5 (_ (SetLink
   (GeneNode "SNAI1")
   (GeneNode "XRCC3")
)
)
In annotation/functions.scm:
   772:11  4 (do-find-pubmed-id _)
In annotation/util.scm:
    190:8  3 (run-query _)
In unknown file:
           2 (cog-outgoing-set #<Invalid handle>)
In ice-9/boot-9.scm:
  1655:16  1 (raise-exception _ #:continuable? _)
In unknown file:
           0 (apply-smob/1 #<exception-handler 7fcd3da69dc0> #<&compound-excepti…>)

ERROR: In procedure apply-smob/1:
In procedure cog-outgoing-set: Wrong type argument in position 1 (expecting opencog 
atom): #<Invalid handle>
ABORT: wrong-type-arg
 (/home/ubuntu/src/atomspace/opencog/guile/SchemeEval.cc:1067)
        Stack Trace:
        2: basic_string.h:211   ??()
        3: Logger.cc:589          opencog::Logger::Error::operator()(char const*, ...)
        4: exceptions.cc:55       opencog::StandardException::parse_error_message(char const*, __va_list_tag*, bool)
        5: exceptions.cc:82       opencog::StandardException::parse_error_message(char const*, char const*, __va_list_tag*, bool)
        6: exceptions.cc:142      opencog::RuntimeException::RuntimeException(char const*, char const*, ...)
        7: SchemeEval.cc:1067     opencog::SchemeEval::apply_v(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, opencog::Handle)
        8: shared_ptr_base.h:712        ??()
        9: shared_ptr_base.h:683        ??()
        10: shared_ptr_base.h:975       ??()
        11: Instantiator.cc:622   opencog::Instantiator::instantiate(opencog::Handle
 const&, std::map<opencog::Handle, opencog::Handle, std::less<opencog::Handle>, std:
:allocator<std::pair<opencog::Handle const, opencog::Handle> > > const&, bool)
        12: Implicator.cc:59      opencog::Implicator::grounding(std::map<opencog::Handle, opencog::Handle, std::less<opencog::Handle>, std::allocator<std::pair<opencog::Handle const, opencog::Handle> > > const&, std::map<opencog::Handle, opencog::Handle, std::less<opencog::Handle>, std::allocator<std::pair<opencog::Handle const, opencog::Handle> > > const&)
        13: PatternMatchEngine.cc:2403    opencog::PatternMatchEngine::report_grounding(std::map<opencog::Handle, opencog::Handle, std::less<opencog::Handle>, std::allocator<std::pair<opencog::Handle const, opencog::Handle> > > const&, std::map<opencog::Handle, opencog::Handle, std::less<opencog::Handle>, std::allocator<std::pair<opencog::Handle const, opencog::Handle> > > const&)
        14: PatternMatchEngine.cc:1904    opencog::PatternMatchEngine::do_next_clause()
        15: PatternMatchEngine.cc:1888    opencog::PatternMatchEngine::clause_accept(opencog::Handle const&, opencog::Handle const&)
        16: PatternMatchEngine.cc:1833    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        17: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_singl
e_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, openc
og::Handle const&)
        18: PatternMatchEngine.cc:1330    opencog::PatternMatchEngine::explore_upvar_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        19: PatternMatchEngine.cc:1260    opencog::PatternMatchEngine::explore_up_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        20: PatternMatchEngine.cc:1805    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        21: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_single_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        22: PatternMatchEngine.cc:1330    opencog::PatternMatchEngine::explore_upvar_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        23: PatternMatchEngine.cc:1260    opencog::PatternMatchEngine::explore_up_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        24: PatternMatchEngine.cc:1805    opencog::PatternMatchEngine::do_term_up(st
d::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle 
const&)
        25: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_single_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        26: PatternMatchEngine.cc:1223    opencog::PatternMatchEngine::explore_term_branches(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        27: PatternMatchEngine.cc:2509    opencog::PatternMatchEngine::explore_clause_direct(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        28: PatternMatchEngine.cc:2622    opencog::PatternMatchEngine::explore_clause(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        29: PatternMatchEngine.cc:1946    opencog::PatternMatchEngine::do_next_clause()
        30: PatternMatchEngine.cc:1888    opencog::PatternMatchEngine::clause_accept(opencog::Handle const&, opencog::Handle const&)
        31: PatternMatchEngine.cc:1833    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        32: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_single_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        33: PatternMatchEngine.cc:1330    opencog::PatternMatchEngine::explore_upvar
_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        34: PatternMatchEngine.cc:1260    opencog::PatternMatchEngine::explore_up_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        35: PatternMatchEngine.cc:1805    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        36: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_single_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        37: PatternMatchEngine.cc:1357    opencog::PatternMatchEngine::explore_upvar_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        38: PatternMatchEngine.cc:1260    opencog::PatternMatchEngine::explore_up_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        39: PatternMatchEngine.cc:1805    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        40: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_singl
e_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, openc
og::Handle const&)
        41: PatternMatchEngine.cc:1223    opencog::PatternMatchEngine::explore_term_branches(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        42: PatternMatchEngine.cc:2509    opencog::PatternMatchEngine::explore_clause_direct(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        43: PatternMatchEngine.cc:2622    opencog::PatternMatchEngine::explore_clause(opencog::Handle const&, opencog::Handle const&, opencog::Handle const&)
        44: PatternMatchEngine.cc:1946    opencog::PatternMatchEngine::do_next_clause()
        45: PatternMatchEngine.cc:1888    opencog::PatternMatchEngine::clause_accept(opencog::Handle const&, opencog::Handle const&)
        46: PatternMatchEngine.cc:1833    opencog::PatternMatchEngine::do_term_up(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        47: PatternMatchEngine.cc:1668    opencog::PatternMatchEngine::explore_single_branch(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        48: PatternMatchEngine.cc:1330    opencog::PatternMatchEngine::explore_upvar_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)
        49: PatternMatchEngine.cc:1260    opencog::PatternMatchEngine::explore_up_branches(std::shared_ptr<opencog::PatternTerm> const&, opencog::Handle const&, opencog::Handle const&)

linas commented 4 years ago

Without the par-map, it ran for a while, then crashed with

GC Warning: Repeated allocation of very large block (appr. size 4227862528):
    May lead to memory leak and poor performance
Aborted

That's a 4-gigabyte block .... I'm assuming that this is json-related, because, without json, total RAM usage never got above 5GB, and guile usage never got above 200MBytes .. but I dunno, will try again, shortly.

Habush commented 4 years ago

I removed the par-map and the json code and it crashed with the following error after 37 minutes:

ERROR: In procedure concatenate:
In procedure append: Wrong type argument in position 1 (expecting empty list)

I think this is b/c I replaced par-map with append-map. But it looks like it has finished the annotation as I see the scheme files are written and from their size (e.g the pathway output file is 1.8GB) I guess it did the whole annotation

rekado commented 4 years ago

Replace it with map instead. append-map expects the procedure to return a list for each element, so that all the result lists can be appended in the end.

linas commented 4 years ago

I conclude that the caching/memoization functions are not thread-safe; I will look at them and try to fix shortly.

Re: map and file-writing: I would strongly recommend that this be done in some incremental fashion. i.e. rather than calling append-map in 101 different places in the code, and then passing around large return values, instead, create a function called report-result and call it whenever a result is obtained. From here, you could then either write to a file, or run it through json, or run it into some other database or API, or whatever. .. so that report-result is a callback that is called whenever some annotation is completed.

FWIW, last time I looked, about 20% of total CPU time was spent writing the file.

linas commented 4 years ago

OK, if I stub out the json-printing at the end, everything completes. The file generation looks like so (gc-stats)

At end of search: (heap-size . 7639584768) (heap-free-size . 7435337728) diff=200MBytes
Then (format #f "~a" results) takes 270 seconds, blows up RAM: (heap-size . 28142227456) (heap-free-size . 19253534720) so guile is now using 10GB more, and I got error: GC Warning: Repeated allocation of very large block (appr. size 1056968704): also RSS blow up to 31.1GBytes from 9.4GBytes.
Next, running atomese-parser on the above string results in the abort. I forgot to measure the length of the string ...

linas commented 4 years ago

OK, after making numerous changes to the code, trying to improve performance (I've got it running maybe 4x faster? now) I did hit this, rather unexpectedly:

 # …))
   586:29  8 (map1 ((ListLink
   (GeneNode "CDR2L")
   (GeneNode "ELAVL1")
)
 …))
   586:17  7 (map1 ((ListLink
   (GeneNode "JUP")
   (GeneNode "DBN1")
)
 # # …))
In annotation/functions.scm:
    632:2  6 (_ _)
    893:8  5 (xgenerate-result (GeneNode "JUP")
 (GeneNode "DBN1")
 #t ("") 0 #f …)
In opencog/base/atom-cache.scm:
    55:32  4 (_ (SetLink
   (GeneNode "JUP")
   (GeneNode "DBN1")
)
)
In unknown file:
           3 (hashx-set! #<procedure atom-hash (ATOM SZ)> #<procedure atom-ass…> …)
In opencog/base/atom-cache.scm:
    47:44  2 (atom-hash #<Invalid handle> 224717)
In unknown file:
           1 (cog-handle #<Invalid handle>)
In ice-9/boot-9.scm:
  1655:16  0 (raise-exception _ #:continuable? _)
scheme@(guile-user) [1]>

what's weird about the above is that more than 1200 of these map1 things were printed, scrolling off the top of the screen. Typing in ,up indicates that each one is a stack frame. I'm flabbergasted that the guile stack is over 1200 frames deep at this point. I'm quite confused about this... this implies out-of-control recursion Where's that recursion? Why isn't it tail-recursive?

Maybe this is how srfi-1 map actually works? There were 2861 frames, all but the last few were map1 and looked exactly alike (except for different gene names in them)...

... Above crash happens in exactly the same place each time, exactly the same stack trace. Five out of five tries.

Adding a mutex does not fix it.

linas commented 4 years ago

@rekado so -- any clue about guile's implementation of srfi-1 map? The above stack trace makes it look like its not tail recursive. Maybe I'm misdreading, or maybe it doesn't matter, or maybe there's actually just some bizarre stale bug in guile's version of srfi-1 ???

rekado commented 4 years ago

* Then `(format #f "~a" results)` takes 270 seconds, blows up RAM: `(heap-size . 28142227456) (heap-free-size . 19253534720)` so guile is now using 10GB more, and I got error: `GC Warning: Repeated allocation of very large block (appr. size 1056968704):` also RSS blow up to 31.1GBytes from 9.4GBytes.

Yes, the format call is not great. The new atomese-parser would work fine on a port, but the code converts things to strings and then reads from the string again, which is unnecessary. The code should use the port interface directly.

rekado commented 4 years ago

Guile's version of srfi-1 is a bit older, but even the latest version of the reference implementation of srfi-1 piles up conses because the second argument to the cons is the recursive call. So the latest reference implementation is not tail recursive. Neither is the implementation in Guile (no matter if that's the one from boot-9 or srfi-1).

I think that's because map is actually map-in-order. I don't see how one could write a tail-recursive map that processes its arguments from left to right. I can only think of one that reverses the result list.

rekado commented 4 years ago

I'm not sure of what the goal is here because I haven't been following the discussion closely. Is a map really needed here? Or can we fold instead?

rekado commented 4 years ago

FWIW, the implementation of map in (rnrs base (6)) does appear to be tail recursive (the definition is a bit hard to read). It uses reverse at the end to restore order. So maybe that's another thing to try -- if processing order doesn't matter.

linas commented 4 years ago

The root cause of the assert is this: the cache contains the atom (Set (GeneNode "HEXIM1") (GeneNode "HNRNPL")) but this atom just happens to be the same as the return of an earlier query, and has been deleted. Thus, the hash table is hanging on to a bad reference, and throws the exception when accessed. I am pondering a good fix right now.

linas commented 4 years ago

@rekado for this code, the order of the map doesn't matter. But also (for this code) the amount of CPU-time in map is miniscule, so chainging/fixing this is not important. I was just ... surprised ... given how SICP and other texts explain tail recursion in chapter-one, that something as basic as srfi-1 map would not be tail-recursive. Interesting; thanks for the notes.

linas commented 4 years ago

@Habush this is fixed in opencog/atomspace#2525

Habush commented 4 years ago

@linas does this also fix the the thread-safety issue of the caches?

linas commented 4 years ago

does this also fix the the thread-safety issue of the caches?

No ... but I have no clear evidence that the caches are not thread-safe. I assumed that was the problem, at first, but don't really know if that is a problem. The error message #<invalid-handle> is completely explained by it being a deleted Atom, whereas, if the guile hash functions weren't thread-safe, you'd see corruption or crashes of some kind. So, for now, I don't think thread-safety is an issue...

linas commented 4 years ago

p.s. regarding threading: it provides very little speedup; the huge impact on perf is from caching assorted search results, viz. the other open pull reqs.

linas commented 4 years ago

oh. Hmm. Just ran it with par-map, and hit #<Invalid handle> w/

In annotation/main.scm:
    72:42  4 (_ "NME1-NME2")
In opencog/base/atom-cache.scm:
    63:34  3 (_ (GeneNode "NME1-NME2")
)
In annotation/util.scm:
    190:8  2 (do-locate-node _)
In unknown file:
           1 (cog-outgoing-set #<Invalid handle>)
In ice-9/boot-9.scm:
  1655:16  0 (raise-exception _ #:continuable? _)

and again, in a different place:

In annotation/main.scm:
    71:32  7 (_ "RABL5")
In annotation/util.scm:
    82:14  6 (node-info _)
In opencog/base/atom-cache.scm:
    63:34  5 (_ (GeneNode "RABL5")
)
In annotation/util.scm:
    63:27  4 (do-get-node-info (GeneNode "RABL5")
)
In opencog/base/atom-cache.scm:
    63:34  3 (_ (GeneNode "RABL5")
)
In annotation/util.scm:
    175:8  2 (do-find-pathway-name _)
In unknown file:
           1 (cog-outgoing-set #<Invalid handle>)
In ice-9/boot-9.scm:
  1655:16  0 (raise-exception _ #:continuable? _)

so maybe there is a threading race, after all. Will examine tomorrow.

linas commented 4 years ago

Using par-map in main.scm, I saw at least four exceptions of the form above, each for different GeneNodes, thus suggesting a race condition. I do not understand why the catch handler did not catch the exceptions. I added a mutex to afunc for the form below:

   (define mtx (make-mutex))
   (lambda (ITEM)
      (lock-mutex mtx)
      (let* ((val (hashx-ref atom-hash atom-assoc cache ITEM))
            (rv (if val val
                  (let ((fv (AFUNC ITEM)))
                     (hashx-set! atom-hash atom-assoc cache ITEM fv)
                     fv))))
         (unlock-mutex mtx)
         rv))

but still got an exception. However, this time the exception was not in afunc but was instead this:


ice-9/boot-9.scm:1655:16: In procedure raise-exception:
In procedure cog-outgoing-set: Wrong type argument in position 1 (expecting opencog atom): #<Invalid handle>

In ice-9/boot-9.scm:
   2792:4  5 (save-module-excursion _)
  4336:12  4 (_)
In annotate-all.scm:
    695:8  3 (run-all)
In annotation/main.scm:
   119:19  2 (annotate-genes _ _ _)
In ice-9/threads.scm:
   289:22  1 (loop _)
In ice-9/boot-9.scm:
  1655:16  0 (raise-exception _ #:continuable? _)

line 118 of main.scm is

           [result (par-map (lambda (x) (x)) fns)] )

I don't understand where the cog-outgoing-set is ... it must be in one of the fns ...

Ohh.. I see .. its in run-query -- one thread is deleting the SetLink that another thread is looking at. So run-query itself is not thread-safe.

linas commented 4 years ago

OK, I got par-map to work with the following two patches. It does reduce the total elapsed time to obtain an answer, but it splurges on CPU time to do it -- it provides a 1.5x speedup in exchange for 2.6x more CPU time. Detailed performance measurements in https://github.com/MOZI-AI/annotation-scheme/issues/141#issuecomment-596038145

The patches: one for the atomspace:

--- a/opencog/scm/opencog/base/atom-cache.scm
+++ b/opencog/scm/opencog/base/atom-cache.scm
@@ -56,13 +56,17 @@
 "
        ; Define the local hash table we will use.
        (define cache (make-hash-table))
+       (define mtx (make-mutex))

        (lambda (ITEM)
-               (define val (hashx-ref atom-hash atom-assoc cache ITEM))
-               (if val val
-                       (let ((fv (AFUNC ITEM)))
-                               (hashx-set! atom-hash atom-assoc cache ITEM fv)
-                               fv)))
+               (define (do-memoize)
+                       (define val (hashx-ref atom-hash atom-assoc cache ITEM))
+                       (if val val
+                               (let ((fv (AFUNC ITEM)))
+                                       (hashx-set! atom-hash atom-assoc cache ITEM fv)
+                                       fv)))
+
+               (with-mutex mtx (do-memoize)))
 )

 ; ---------------------------------------------------------------------

and one locally:

--- a/annotation/util.scm
+++ b/annotation/util.scm
@@ -32,6 +32,7 @@
        #:use-module (ice-9 regex)
        #:use-module (srfi srfi-1)
        #:use-module (ice-9 match)
+       #:use-module (ice-9 threads)
        #:export (create-node
                  create-edge)
 )
@@ -164,6 +165,7 @@
       ((single) single)
       ((first second . rest) second))))

+(define run-query-mtx (make-mutex))
 (define-public (run-query QUERY)
 "
   Call (cog-execute! QUERY), return results, delete the SetLink.
@@ -171,12 +173,21 @@
 "
        ; Run the query
        (define set-link (cog-execute! QUERY))
-       ; Get the query results
-       (define results (cog-outgoing-set set-link))
-       ; Delete the SetLink
-       (cog-delete set-link)
-       ; Return the results.
-       results
+
+       (lock-mutex run-query-mtx)
+       (if (cog-atom? set-link)
+               ; Get the query results
+               (let ((results (cog-outgoing-set set-link)))
+                       ; Delete the SetLink
+                       (cog-delete set-link)
+                       (unlock-mutex run-query-mtx)
+                       ; Return the results.
+                       results)
+               ; Try again
+               (begin
+                       (unlock-mutex run-query-mtx)
+                       (run-query QUERY))
+       )
 )

 (define (do-find-name GO-ATOM)

BTW, there is a bug w/ the makefile: annotation makefile doesn't realize that the atomspace scheme files have changed, and so doesn't recompile, and ends up using the old version of the atomspace. You have to explicitly rm -r build and also rm -r ~/.cache/guile to get the new code to get used.

Habush commented 4 years ago

@linas After merging PR #156, the annotation fails with the following error:

In procedure append: Wrong type argument in position 2 (expecting empty list): #<unspecified>.

The stacktrace points to this line in the code as source of error.

I am running the following annotation function:

(annotate-genes (list "TSPAN6" "NDUFAF7" "RBM5" "SLC7A2" "NDUFAB1" "DVL2" "SKAP2" "DHX33" "MSL3" "BZRAP1" "GTF2IRD1" "IL32" "RPS20" "SCMH1" "CLCN6" "RNF14" "ATP2C1" "IGF1" "GLRX2" "FAS" "ATP6V0A1" "FBXO42" "JADE2" "PREX2" "NOP16" "LMO3" "R3HDM1" "ERCC8" "HOMER3" "USE1" "OPN3" "SZRD1" "ATG5" "CAMK2B" "MPC1" "MRPS24" "ZNF275" "TAF2" "TAF11" "IPO5" "NDUFB4" "DIP2B" "MPPED2" "IARS2" "ERLEC1" "UFD1L" "PDCD2" "ACADVL" "ENO1" "FRYL" "SEC31B" "KIFAP3" "NT5C2" "GPC4" "ITGA8" "PPP2R5C" "RBFOX1" "ITM2A" "NRD1" "VDAC3" "CBFA2T2" "FKBP7" "SAR1A" "DUSP13" "PGR" "EPB41L3" "OXCT1" "SLC27A5" "WBP11" "NCOA1" "MAPRE3" "MGST2" "DIMT1" "RBM22" "TMED2" "HUWE1" "NLK" "UIMC1" "GNAS" "COQ9" "NSFL1C" "CFAP61" "TASP1" "MRPS33" "NDUFB2" "TXNL1" "MYL6" "HDAC6" "DHPS" "CREM" "PSMD8" "CIRBP" "HNRNPM" "SF3A1" "POLR2F" "HMGXB4" "CHKB" "ZMAT5" "RBM23" "VTI1B" "TIMM9" "GSTZ1" "RPS6KA5" "PSMB5" "RAB5IF" "PFDN4" "PSMA7" "NDUFAF5" "ATP1B4" "ALG13" "SUV39H1" "SCML2" "PGK1" "KLF5" "TSC22D1" "MGRN1" "SLC7A6" "CMC2" "CPPED1" "MYEF2" "CPQ" "LEPROTL1" "PPP2CB" "KLHDC4" "POP4" "AKT2" "RABAC1" "CARD8" "PON2" "SSBP1" "BUD31" "MEST" "CHCHD3" "COA1" "BLVRA" "PLGRKT" "BAG1" "EXOSC3" "RASSF4" "KAZALD1" "PITRM1" "EBF3" "LGI1" "MTMR4" "CDK5RAP3" "ENO3" "ICAM2" "EZH1" "MRPL27" "HDAC5" "DUSP3" "DCUN1D4" "NDUFC1" "ZBTB16" "COMMD9" "ATP5B" "ELK3" "ALDH2" "STX2" "GPR133" "MRPL51" "GAPDH" "TPI1" "TMEM14C" "GCNT2" "NCOA7" "FANCE" "E2F3" "ACOT13" "COX7A2" "ENPP5" "PCDHB2" "TMCO6" "TTC1" "POLR3G" "BNIP1" "TIMMDC1" "BCL6" "FAM162A" "PRKAR2A" "TUSC2" "SSR3" "MOB1A" "NCL" "MPV17" "HSPE1" "ZNF142" "ID2" "PNO1" "TMEM59" "LAMTOR2" "LEPR" "CTH" "TMEM9" "MRPS15" "SDHB" "FAAH" "DPH5" "C1orf54" "ANKRD13C" "VAMP8" "NDUFB3" "GDA" "MAPKAP1" "YPEL5" "C10orf76" "RCL1" "GRIA2" "PCMT1" "KIAA1217" "PAIP2" "ARL1" "SOCS2" "ECHDC2" "A1BG" "ZNF211" "GTDC1" "CCDC18" "HNRNPA2B1" "FAM126A" "CLTA" "CISD1" "CDKN2C" "RASSF8" "ATP7B" "ITIH5" "SMUG1" "NDUFAF4" "PLA2G12A" "PFKFB2" "ATP5E" "STAMBP" "SNRNP27" "NQO2" "EMC3" "TMTC4" "SNRPD2" "ID1" "KDM5C" "ST3GAL3" "EMC1" "UQCR11" "RNF6" "SHFM1" "STEAP4" "RBM48" "TUBGCP6" "EMC4" "RABL5" "MTX2" "TXNDC17" "DAD1" "ECSIT" "CDC16" "RPL36" "MRPL34" "LSM4" "CYP2E1" "ZNF337" "PRRC2B" "COX4I1" "NFATC1" "PDLIM4" "PSME3" "NDUFA2" "RAF1" "ENOSF1" "MRPL35" "SERPINF1" "GRSF1" "WBP2" "PRMT7" "POMP" "MYH8" "BEX2" "ACTR3B" "ARL8B" "EMC7" "LAMTOR5" "KCTD1" "DDB2" "PUM1" "NREP" "CTSL" "FAM189A2" "MDFIC" "CAPRIN1" "CD63" "TSPAN31" "MRPL44" "NDUFB5" "TANK" "VPS45" "PSMB7" "UBAP2" "GRHPR" "BPHL" "UQCC2" "NUMA1" "DCUN1D5" "UNC13C" "TUBGCP4" "SLC28A2" "CYP1B1" "COX17" "PARP16" "HERC5" "PPA2" "LGR5" "NEDD1" "SLAIN1" "GRTP1" "TMX1" "SERF2" "SRP14" "WDR61" "TPM1" "SEC11A" "PMM2" "MYLK3" "CLTC" "GAREM" "GREB1L" "RNF165" "TXNL4A" "NFIC" "PSMB6" "PSMA5" "CELSR2" "TIPRL" "UFC1" "SETDB1" "ADAMTSL4" "JTB" "HAX1" "ACP1" "NVL" "DEGS1" "PSEN2" "PDIA6" "SFXN5" "ZNF385B" "UBR3" "GULP1" "ATG3" "SRPRB" "TMEM108" "SLIT2" "DDIT4L" "PAM" "TSLP" "ATG12" "BOD1" "MYLK4" "DYNLT1" "PSPH" "ATXN7L1" "ZMYM3" "RADX" "ARHGAP36" "HMBOX1" "PXDNL" "GOLGA7" "POLR2K" "EIF3H" "NDUFB9" "TATDN1" "C9orf3" "FAM171A1" "FAM188A" "GSTO1" "EIF3M" "ARFGAP2" "DAK" "CPSF7" "CABLES2" "SAP18" "HNMT" "TIMM8B" "PTS" "NDUFC2" "BICD1" "ZNF385D" "GEMIN6" "ATP5A1" "CCDC50" "UTRN" "ZNF117" "RASSF3" "HNRNPU" "TRIP12" "LGI4" "MSI2" "UCHL1" "ATP5G3" "TCEB1" "PSMA8" "DPH3" "ACSS1" "TMEM55A" "GOLGA7B" "CNOT8" "NUP205" "C9orf85" "CARNMT1" "KCNMA1" "SCAF4" "MALSU1" "PDE6D" "MUM1L1" "AFAP1L1" "WDR19" "MRPL17" "CNNM4" "ZFAND2B" "AGPAT6" "SNF8" "CBR1" "TPPP3" "CCDC28B" "SSU72" "PCNT" "PCYT1A" "COX7A1" "BCL6B" "POLR3K" "NXF1" "OLFML2B" "MRPL55" "RFTN2" "VSNL1" "RPRD2" "BOLA3" "MRPS18C" "ARPC2" "SUCLG1" "PPM1L" "ADAMTS9" "ELP6" "SMIM12" "SFMBT1" "RAD54L2" "HPGD" "ARFIP1" "UQCRQ" "HCN1" "IQUB" "SUN1" "FAM86FP" "DCAF13" "C9orf89" "NDUFB6" "CFL2" "KLHDC2" "TRUB1" "ZFYVE1" "PSMC3" "DPCD" "ALKBH3" "CCT2" "ZNF202" "USP54" "C11orf74" "NDST2" "SEC11C" "CENPV" "HSP90B1" "AP1G1" "PPIB" "FAM96A" "SMAD3" "PDIA3" "FBXO22" "ATP5L" "GPX4" "ATP5H" "POLR2G" "COPS6" "RAB4A" "DNAJC7" "COA6" "MMADHC" "MPLKIP" "HEXIM2" "COL3A1" "CNNM3" "USP39" "RNF181" "LRRC28" "PPIC" "STXBP6" "MFF" "ATP5I" "PARM1" "NSMCE1" "EFNA1" "CRADD" "SIN3A" "CNBP" "ZNF32" "TMEM266" "TOR1AIP2" "BRD3" "SF3B5" "STX8" "UBB" "DNAJC18" "NUDCD2" "GPR27" "KBTBD2" "TRIAP1" "MTSS1" "ZNF160" "FEZ2" "FAM86JP" "RBKS" "EXOSC1" "PDE7B" "MRPL36" "BPTF" "ZNF540" "EXOSC10" "C20orf196" "FRMD3" "TP53RK" "SYNPO2" "MYEOV2" "MRPL52" "TMEM134" "BNC2" "KDM2A" "RHOD" "COMMD1" "GLRX" "DMRT2" "FAM86B3P" "MINOS1" "UQCRH" "PHC3" "USMG5" "HOXB2" "NRROS" "MSRB3" "MIR1-1HG-AS1" "SNAPC5" "C12orf76" "MRPL11" "IL20RB" "AKIRIN1" "TMEM167A" "NDUFA11" "FAM211A-AS1" "CADM2" "LSM1" "TMEM9B" "SCUBE2" "UCP3" "PAAF1" "MRPL48" "LINC00116" "RTTN" "KCMF1" "RMDN1" "IRX5" "C8orf4" "MAMSTR" "POLE" "FAM210A" "ATOX1" "KIAA0195" "MLF1" "GRAMD1C" "BOLA1" "TMEM11" "RIIAD1" "SEPW1" "MAGED1" "FCER1A" "PSMG4" "SSR4" "TRAPPC5" "PLAG1" "LSM10" "COA4" "NEUROG1" "MRPS11" "MRPS16" "FAM104B" "SATB1" "NDN" "RP11-181G12.2" "CNOT10" "ZNF662" "LSMD1" "CEP57L1" "SFXN4" "FAM162B" "DENND5A" "UBE2F" "PCDH9" "MROH7" "NDUFA12" "APOO" "PTRHD1" "RPS27L" "ADSSL1" "UBALD2" "ROR1" "NR2F2" "PSMD13" "ANKFY1" "ZNF529" "POLR1D" "TOR3A" "PPP1CC" "RGS9BP" "KRT10" "GPATCH8" "RPS19BP1" "CMC1" "MAGI2" "EXD3" "MORN2" "COL4A5" "COMMD6" "S100A16" "LINC00319" "RNFT1" "HN1" "C15orf61" "BLOC1S2" "LCOR" "XPNPEP3" "AC005154.6" "ACN9" "TECPR2" "TOMM7" "ADA" "ZNF252P" "NOP9" "CASP4" "METTL9" "ZNF398" "ZNF682" "MRPL21" "HTT" "COL4A6" "PHF2" "FAM118B" "FAM49A" "MRPL42" "UBL5" "CCDC69" "NCOA6" "MT-ND5" "ZNF521" "MT-ND3" "SELT" "OSTC" "TSEN15" "MT-ND4" "DCLRE1A" "CCDC167" "DMD" "SNORA63" "SNORD51" "TATDN3" "FAM229B" "INPP5B" "COL15A1" "ZBTB48" "ZNF783" "C12orf73" "FLJ00104" "SARNP" "DNAJC19" "HCG27" "SNORA12" "SNORD70" "U3" "SNORD91B" "U3" "SUPT4H1" "ANKRD39" "ATP6V0CP3" "NDUFS3" "ZNF337-AS1" "SLC23A3" "ARL16" "ZSWIM7" "COL28A1" "SLC35E2" "TENM3" "FAM19A5" "RPA3-AS1" "SNORD99" "PSMD4P1" "MIR503HG" "PRKAR2A-AS1" "CEP57L1P1" "LINC01167" "CETN4P" "LINC00378" "RAB6C-AS1" "TPI1P1" "LINC00354" "PCGEM1" "OST4" "MSX2P1" "EEF1DP3" "HSBP1" "TRAF3IP2-AS1" "MCTS1" "SEC1P" "DICER1-AS1" "CDKN2AIPNL" "KCND3-AS1" "FAM200B" "GOLGA2P5" "RPL13P5" "ADAMTS9-AS1" "LINC00882" "MRPL33" "STMP1" "NME1-NME2" "UBE2V1" "RASSF8-AS1" "SOCS2-AS1" "OIP5-AS1" "TMEM161B-AS1" "UBA6-AS1" "SMIM20" "ROPN1L-AS1" "SNORA70" "GS1-251I9.4" "IFNG-AS1" "AGAP2-AS1" "APOPT1" "CUX1" "LINC00592" "UBE2F-SCLY" "LINC00662" "KCNJ2-AS1" "GOLGA7B" "NCBP2-AS2" ) "agingSwitchSymbols" "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"False\"}, {\"filter\": \"include_sm\", \"value\": \"True\"},{\"filter\": \"coding\", \"value\": \"False\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"1\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"False\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"False\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"0\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Genes\"}]}]")

linas commented 4 years ago

Is this new failure with or without the par-map? If with, can you try without? Or rather, par-map should just be removed ... it costs more than it delivers ...

Is this the same list of genes as before, or different? Same annotation function as before, or different?

Habush commented 4 years ago

Actually the issue is with the condition at https://github.com/MOZI-AI/annotation-scheme/blob/cc9f0f952af3ee7fe07beb07c370346f62837fa8/annotation/gene-pathway.scm#L112

It should return an empty list when prot? is false.

Habush commented 4 years ago

the biogrid-interaction-annotation function is also not working. You can test it even with a single gene. I set the interaction argument to Genes

linas commented 4 years ago

Actually the issue is with the condition at

Ah, OK .. well ... umm... patch it then, don't wait for me to do it...

Habush commented 4 years ago

I’m just letting you know :). I will send a PR with the fixes

Habush commented 4 years ago

As an update, I was able to annotate 745 genes using a custom atomspace. And it took around 4 minutes.

However, now I am facing an issue trying to annotate gene IGF1 using the sample dataset (tests/sample_dataset.scm).

After loading the sample dataset, I run the annotate-genes function as follows:

(annotate-genes (list "IGF1") "igf1" "[{\"function_name\": \"gene-pathway-annotation\", \"filters\": [{\"filter\": \"pathway\", \"value\": \"smpdb reactome\"},{\"filter\": \"include_prot\", \"value\": \"False\"}, {\"filter\": \"include_sm\", \"value\": \"True\"},{\"filter\": \"coding\", \"value\": \"False\"},{\"filter\": \"noncoding\", \"value\": \"True\"}, {\"filter\": \"biogrid\", \"value\": \"1\"},{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}]}, {\"function_name\": \"gene-go-annotation\", \"filters\": [{\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}, {\"filter\": \"parents\", \"value\": \"0\"}, {\"filter\": \"protein\", \"value\": \"False\"}]}, {\"function_name\": \"include-rna\", \"filters\": [{\"filter\": \"coding\", \"value\": \"False\"},{\"filter\": \"noncoding\", \"value\": \"True\"},{\"filter\": \"protein\", \"value\": \"0\"}]},{\"function_name\": \"biogrid-interaction-annotation\", \"filters\": [{\"filter\": \"interaction\", \"value\": \"Genes\"}, {\"filter\": \"namespace\", \"value\": \"biological_process cellular_component molecular_function\"}]}]")

This fails with the following error:

In procedure cog-outgoing-set: Wrong type argument in position 1 (expecting opencog atom): #<Invalid handle>

I have attached the full stack trace. stacktrace.txt

P.S I have rebuilt the docker image from scratch so that the latest atomspace version is used for annotation

linas commented 4 years ago

Is this with par-map in main.scm, or with regular map?

Habush commented 4 years ago

@linas it is with par-map

linas commented 4 years ago

Please read everything I wrote above about par-map. I wrote a lot about it. To be clear: I strongly advise against using par-map in main.scm.

Habush commented 4 years ago

OK, I got par-map to work with the following two patches. It does reduce the total elapsed time to obtain an answer, but it splurges on CPU time to do it -- it provides a 1.5x speedup in exchange for 2.6x more CPU time.

I thought with changes you made to the atomspace, the issue with par-map was fixed, no?

linas commented 4 years ago

I made no changes at all to the atomspace; I strongly recommend against using par-map. It is expensive: it gives you a 1.5x elapsed-time speed-up for 2.6x more CPU usage -- that's rather inefficient. If you really want to do this anyway, to reduce elapsed time, then I'm mostly thinking that the locks belong in the annotation code...

linas commented 4 years ago

Independent of this particular issue, there are two generic problems I'm seeing with guile. Maybe @rekado can impart words of wisdom?

par-map and par-for-each when used with the atomspace are always very inefficient, or worse. In the learn project, I was seeing 1.5x speedup for 2 cpus, 1.2x speedup for 3 cpus, a slight slowdown maybe 0,.9x for 4, and a massive slowdown/deadlock for anything more. I don't understand why, because simple guile multi-thread experiments don't have this problem, and multi-threaded atomspace doesn't have this problem. I'm guessing it has to do with the very high-frequency cross-overs guile->c++ and back. Some kind of lock contention, there.
In this code base -- annotation-scheme -- I'm seeing 50% of CPU time in GC, when running single-threaded. This seems like way too much. It might be due to all of the lists and append-maps and etc. (In the learn project, I was seeing 5% or 10% I think, not sure, and less than that in the benchmarks, but was not really payig attention there.) I don't know why the GC is so high, and so don't know how to reduce it.

I only see the high GC use when annotating the 681 genes -- for the single IGF1 gene, I see this:

Elapsed: 75.208 secs. Rate: 55.85 gc/min %cpu-GC: 6.232%  %cpu-use: 101.3%

I'm guessing that, for the 681 genes, the code creates extremely long lists, and using append-map causes those lists to be thrashed. I'm thinking that map and append-map need to be replaced by fold... if so, this suggests another 2x speedup might be possible!

linas commented 4 years ago

OK, so @Habush pull req #161 adds all the locks that you need. No changes to the atomspace needed.

linas commented 4 years ago

Closing because I think #161 fixes this. Open new report if there are remaining issues.

MOZI-AI / annotation-scheme

Error while trying to run annotation #144