Closed linas closed 4 years ago
Similar remark for find-name
So I count eight of these so far:
find-pathway-name
find-name
find-entrez
find-GO-ns
find-go-name
find-go-def
find-coding-gene
find-ncrna
OK, so I created an implementation for find-name
in pull req #97 that does the lookup directly. With some minor modifications, it could be used to handle all three cases.
I notice that all three of these are doing very basic EvaluationLink searches, of the form
(Evaluation
(Predicate "foo")
(List
(Concept "bar")
(Concept "baz")))
and that, given foo
and bar
they want to find baz
(or maybe given baz
, they want bar
). Given the austere simplicity of this search, and the fact that it seems to be done a lot(?) it would be best to write some C++ code that performed this lookup directly. I do not yet have any sense of where the bottlenecks are, so I cannot tell if this improves the performance situation. But, from casual perusal of the code, it does seem like these functions might get called a lot.
Anyway, I don't really know what the performance impact of this may or may not be. It is probably worthwhile to do this both ways, and measure. Ideally, I'm thinking that a benchmark for this would be a good idea. See the https://github.com/opencog/benchmark repo. I'll see what I can do.
I'm now trying to understand BindLink/GetLink performance. Measuring the C++ code, while runninggene-go-annotation
I see this:
Things are very different for gene-pathway-annotation
For biogrid-interaction-annotation
I get:
Using the find-name implementation as in #97, I'm measuring (using the previous scheme-based instrumentation code)
Time: 0.8138226 secs. calls: 9801 avg: 83.0 usec/call for find-name
Using the original find-name (based on GetLink) I get:
Time: 3.2294861 secs. calls: 16459 avg: 196.2 usec/call for find-name
so that's kind-of disappointing: the GetLink
version is 2.5x slower than the hand-written version. I was expecting parity, because I was expecting the GetLink ctor to run at 40 usec, and the GetLink exec to run at 60 usec. Why its more than double is unclear.
.. Bizarre .. I measured the time to run the GetLink directly, and it is 70 usec (end-to-end, from where cog-execute!
enters into c++, from guile, until it leaves c++ and returns to guile). The GetLink ctor takes about 30 usec to run ... so where did the other 196-70-30=96 usec go? Right now, I am guessing that the 96 usecs is lost in guile+atomspace, preparing the half-dozen arguments needed to create the GetLink. This seems .. a bit high. And certainly undesirable. Yuck.
Based on further exploration of find-name
, it appears that it is being called over and over again with the same arguments. Never mind. This statement is incorrect, and was due to bad instrumentation on my part.
I'm going to close this as "bad idea"; I think I know where the slowdowns came from, and how to avoid them with syntax-case
.
At any rate, twiddling this does not affect overall performance, this is not where the bottlenecks are.
This is a performance optimization:
find-pathway-name
performs a very simple search,using aUpdate: this previous statement might be completely false. I have not measured. The proposal below and in #97 might be a bad idea. I will try to measure, and get a better feel for what is going on. In the mean time, I'm leaving this issue open.GetLink
. it would almost surely be much faster if it was written out, long-hand.Why might a hand-written version be faster? Because of three things:
Flipside: doing it "by hand", in scheme, requires multiple entries in and out of the guile interpreter, each of which has significant overhead, that might outweigh the losses mentioned above.
To add injury to insult: the SetLink is never removed, so it now clutters up the atomspace forever, slowing down future searches. Also, the GetLink is never removed, which also clutters up the atomspace. Both of these can be avoided by using a temp atomspace (I can provide code for that, it's in the learning repo, but maybe should be moved to the main repo). But, for such simple searches, its best to just avoid using GetLink, and reserve that for complicated searches.