MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0
3 stars 4 forks source link

Performance diary, continued. #141

Open linas opened 4 years ago

linas commented 4 years ago

This is not an issue report; rather, its a performance diary, tracking performance from where #98 left off.

Instrumented code to be measured as follows:

git checkout master
git pull
git checkout -b perf
git pull https://github.com/MOZI-AI/annotation-scheme.git measure
git commit
autoreconf -vif
mkdir build; cd build
../configure
make
sudo make install

Then, start guile, ideally w/ hugetlb enabled:

$ HUGETLB_MORECORE=yes LD_PRELOAD=/usr/lib/libhugetlbfs.so.0 guile

Then start cogserver:

guile> (use-modules (opencog cogserver))
guile> (start-cogserver)

Then load your dataset, and run whatever job you wish to profile. Meanwhile, in another terminal:

$ rlwrap telnet localhost 17001
guile> (report)

This will print out the current profile of the currently-running code.

Alternately, if you don't have the cogserver, then just wait for your job to finish, and then run (report) at the guile REPL prompt.

(Note to self: for me, do this: $cd work/src; (load "annotate-all.scm"))

linas commented 4 years ago

Running the unit test from #144 I get this:

biogrid-interaction-annotation - 5294 secs -- find-output-interactors - 4899 secs -- pattern -- generate-result - 4855 secs -- find-go-term - 67 secs -- find-protein-form - 4515 secs

linas commented 4 years ago

Unit test from #144, after assorted fixes: 20.4 gb virt, 4.4 gb RSS, 43 minutes elapsed (2620 seconds) which includes time to load dataset.... which was 598 seconds. gc-stats is: (heap-size . 7658577920) (heap-free-size . 7396917248) so 250MBytes of RAM used by scheme. The report:

Pathway functions
Time: 521.29256 secs. calls: 1 avg: 521292555.8 usec/call for gene-path-anno
Time: 115.86134 secs. calls: 1 avg: 115861342.7 usec/call for gene-path-write
Time: 172.32179 secs. calls: 681 avg: 253042.3 usec/call for smpdb
Time: 342.50595 secs. calls: 681 avg: 502945.6 usec/call for reactome
Time: 345.16271 secs. calls: 3769 avg:  91579.4 usec/call for find-pathway-genes
Time: 181.36376 secs. calls: 183670 avg:    987.4 usec/call for add-pathway-genes
Time: 164.07672 secs. calls: 3769 avg:  43533.2 usec/call for find-mol
Time: 4.8064842 secs. calls: 1362 avg:   3529.0 usec/call for find-pathway-member
Time: 0.6353414 secs. calls: 2197 avg:    289.2 usec/call for pathway-hierarchy
Time: 5.6786770 secs. calls: 1362 avg:   4169.4 usec/call for find-protein
Time: 54.434262 secs. calls: 153412 avg:    354.8 usec/call for add-loc
Zero calls to pathway-gene-interactors
Zero calls to generate-interactors

Common biogrid/pathway functions
Time: 2.0903343 secs. calls: 1549 avg:   1349.5 usec/call for find-name
Time: 23.860827 secs. calls: 18187 avg:   1312.0 usec/call for locate-node
Zero calls to find-parent

Biogrid functions
Time: 544.60710 secs. calls: 1 avg: 544607103.2 usec/call for biogrid-anno
Time: 16.548887 secs. calls: 1 avg: 16548887.1 usec/call for biogrid-write
Time: 55.154352 secs. calls: 681 avg:  80990.2 usec/call for match-gene-interactors
Time: 488.67459 secs. calls: 681 avg: 717583.8 usec/call for find-output-interactors
Time: 282.23120 secs. calls: 540000 avg:    522.7 usec/call for generate-result
Time: 52.450733 secs. calls: 1080000 avg:     48.6 usec/call for build-interaction
Time: 171.74925 secs. calls: 145877 avg:   1177.4 usec/call for find-pubmed-id
Time: 6.2035060 secs. calls: 8853 avg:    700.7 usec/call for find-rna

Common biogrid/GO functions
Time: 105.45336 secs. calls: 10899 avg:   9675.5 usec/call for find-memberln
Time: 9.5412558 secs. calls: 9464 avg:   1008.2 usec/call for find-protein-form

GO functions
Time: 105.78463 secs. calls: 1 avg: 105784630.0 usec/call for gene-go-anno
Time: 0.6229931 secs. calls: 1 avg: 622993.1 usec/call for gene-go-write
Time: 65.820128 secs. calls: 10218 avg:   6441.6 usec/call for find-go-term
Time: 22.861835 secs. calls: 11047 avg:   2069.5 usec/call for add-go-info

The speedup in biogrid-interaction-annotation and find-protein-form is due to #147.

linas commented 4 years ago

Some results on find-mol, using the same test cse as above.

The original find-mol:

Time: 167.04053 secs. calls: 3769 avg:  44319.6 usec/call for find-mol
Time: 164.07672 secs. calls: 3769 avg:  43533.2 usec/call for find-mol
Time: 185.57233 secs. calls: 3769 avg:  49236.5 usec/call for find-mol

refactor find-mol to use srfi-1 map instead of GroundedSchema

Time: 111.73313 secs. calls: 3769 avg:  29645.3 usec/call for find-mol

Refactor above to use srfi-1 filter-map instead of GroundedPredicate

Time: 109.04895 secs. calls: 3769 avg:  28933.1 usec/call for find-mol
Time: 115.21986 secs. calls: 3769 avg:  30570.4 usec/call for find-mol

Refactor above to cache the search results:

Time: 97.165338 secs. calls: 3769 avg:  25780.1 usec/call for find-mol
Time: 101.62701 secs. calls: 3769 avg:  26963.9 usec/call for find-mol
Time: 100.25473 secs. calls: 3769 avg:  26599.8 usec/call for find-mol

The above code is in pull req #153

Calls to the pattern search:

Time: 1.6998823 secs. calls: 1992 avg:    853.4 usec/call for get-mol
Time: 1.4956957 secs. calls: 1992 avg:    750.9 usec/call for get-mol

Wow. That means that almost all of the time is spent in add-mol-info -- so that is the place to explore...

linas commented 4 years ago

Results for find-coding-gene. First the original form:

Time: 42.131111 secs. calls: 177359 avg:    237.5 usec/call for find-coding-gene
Time: 41.810285 secs. calls: 177359 avg:    235.7 usec/call for find-coding-gene

then the cached version:

Time: 3.9859701 secs. calls: 177359 avg:     22.5 usec/call for find-coding-gene
Time: 1.8742897 secs. calls: 8517 avg:    220.1 usec/call for do-find-coding-gene

The cache-miss ratio is 8517/177359 = 4.8% so that's a big deal. The above code is in pull req #154

linas commented 4 years ago

Here's the find-rna cache hit ratio. It also suggests that filterbytype is worth tuning:

Time: 176.14719 secs. calls: 184351 avg:    955.5 usec/call for find-rna
Time: 7.1127286 secs. calls: 8853 avg:    803.4 usec/call for do-get-rna

So the miss ratio is 8853 / 184351 = 4.8%

linas commented 4 years ago

Here's the results for go-info caching. First, the original code, find-go-term

Time: 79.225793 secs. calls: 10218 avg:   7753.6 usec/call for find-go-term
Time: 77.238001 secs. calls: 10218 avg:   7559.0 usec/call for find-go-term

and an important piece of find-go-term:

Time: 29.799717 secs. calls: 11047 avg:   2697.5 usec/call for add-go-info
Time: 28.259893 secs. calls: 11047 avg:   2558.2 usec/call for add-go-info

After caching:

Time: 50.957783 secs. calls: 10218 avg:   4987.1 usec/call for find-go-term
Time: 2.4260579 secs. calls: 11047 avg:    219.6 usec/call for add-go-info
Time: 1.9518182 secs. calls: 11047 avg:    176.7 usec/call for go-info
Time: 1.8019651 secs. calls: 3449 avg:    522.5 usec/call for do-go-info

So grand-total find-go-term is 78/51 = 1.5x faster because add-go-info is 29/2.5 = 12x faster.

The cache-miss ratio for go-info is 3449/11047 = 31%

linas commented 4 years ago

Today's results, all patches applied, measured on an idle system. Total timke: 824 seconds

Pathway functions
Time: 272.05854 secs. calls: 1 avg: 272058536.0 usec/call for gene-path-anno
Time: 95.236816 secs. calls: 1 avg: 95236816.3 usec/call for gene-path-write
Time: 93.509980 secs. calls: 681 avg: 137312.7 usec/call for smpdb
Time: 173.98547 secs. calls: 681 avg: 255485.3 usec/call for reactome
Time: 229.05144 secs. calls: 3769 avg:  60772.5 usec/call for find-pathway-genes
Time: 136.42400 secs. calls: 183670 avg:    742.8 usec/call for add-pathway-genes
Time: 35.072011 secs. calls: 3769 avg:   9305.4 usec/call for find-mol
Time: 1.0683796 secs. calls: 1992 avg:    536.3 usec/call for get-mol
Time: 2.4463706 secs. calls: 1362 avg:   1796.2 usec/call for find-pathway-member
Time: 0.4926354 secs. calls: 2197 avg:    224.2 usec/call for pathway-hierarchy
Time: 4.1501826 secs. calls: 1362 avg:   3047.1 usec/call for find-protein
Time: 24.665102 secs. calls: 153412 avg:    160.8 usec/call for add-loc
Zero calls to pathway-gene-interactors
Zero calls to generate-interactors

Common biogrid/pathway functions
Time: 1.6548900 secs. calls: 1549 avg:   1068.4 usec/call for find-name
Time: 15.904738 secs. calls: 18187 avg:    874.5 usec/call for locate-node
Time: 113.06343 secs. calls: 184351 avg:    613.3 usec/call for find-rna
Time: 4.1273338 secs. calls: 8853 avg:    466.2 usec/call for do-get-rna
Time: 2.2090188 secs. calls: 177359 avg:     12.5 usec/call for find-coding-gene
Time: 1.0117541 secs. calls: 8517 avg:    118.8 usec/call for do-find-coding-gene
Time: 43.892598 secs. calls: 3234083 avg:     13.6 usec/call for node-info
Time: 23.923776 secs. calls: 140556 avg:    170.2 usec/call for do-get-node-info
Zero calls to find-parent

Biogrid functions
Time: 386.88683 secs. calls: 1 avg: 386886834.6 usec/call for biogrid-anno
Time: 12.858690 secs. calls: 1 avg: 12858690.3 usec/call for biogrid-write
Time: 39.008608 secs. calls: 681 avg:  57281.4 usec/call for match-gene-interactors
Time: 347.28825 secs. calls: 681 avg: 509968.1 usec/call for find-output-interactors
Time: 207.85364 secs. calls: 540000 avg:    384.9 usec/call for generate-result
Time: 42.764158 secs. calls: 1080000 avg:     39.6 usec/call for build-interaction
Time: 122.86251 secs. calls: 540000 avg:    227.5 usec/call for find-pubmed-id
Time: 112.85280 secs. calls: 145877 avg:    773.6 usec/call for do-find-pubmed-id

Common biogrid/GO functions
Time: 52.962694 secs. calls: 10899 avg:   4859.4 usec/call for find-memberln
Time: 6.8227289 secs. calls: 9464 avg:    720.9 usec/call for find-protein-form

GO functions
Time: 52.689303 secs. calls: 1 avg: 52689303.4 usec/call for gene-go-anno
Time: 0.5131226 secs. calls: 1 avg: 513122.6 usec/call for gene-go-write
Time: 29.331335 secs. calls: 10218 avg:   2870.6 usec/call for find-go-term
Time: 1.3569475 secs. calls: 11047 avg:    122.8 usec/call for add-go-info
Time: 1.1187662 secs. calls: 11047 avg:    101.3 usec/call for go-info
Time: 1.0191072 secs. calls: 3449 avg:    295.5 usec/call for do-go-info

Code examination suggests that find-pathway-genes and add-pathway-genes are juicy targets.

linas commented 4 years ago

Total elapsed time, at various points in master.

Legened:

private branch w/instrumentation and w/all pending pulls #156 #157 total secs: 824, 826 (idle, huge)

private branch w/all pending pulls #156 #157 total secs: 799, 781 (idle, huge) and single-threaded - in more detail:

Elapsed: 839.95 secs. Rate: 18.36 gc/min %cpu-GC: 49.94%  %cpu-use: 137.1%
Elapsed: 803.77 secs. Rate: 20.68 gc/min %cpu-GC: 53.45%  %cpu-use: 139.4%
Elapsed: 799.76 secs. Rate: 20.71 gc/min %cpu-GC: 52.68%  %cpu-use: 139.3%
Elapsed: 813.68 secs. Rate: 20.35 gc/min %cpu-GC: 52.35%  %cpu-use: 139.1%

50% of time in GC? Wow!

total secs: 883 (idle, small) Notice his is 10% slower than huge. total secs: 820, 811, 833, 842, 816, 810, 831 (idle, huge, with hash mutex) maybe 1% slower?

idle, huge, with mutex and par-map in main.scm (see #144 and especially https://github.com/MOZI-AI/annotation-scheme/issues/144#issuecomment-596162688 )

Elapsed: 530.59 secs. Rate: 27.82 gc/min %cpu-GC: 106.4%  %cpu-use: 264.0%
Elapsed: 539.27 secs. Rate: 28.48 gc/min %cpu-GC: 113.5%  %cpu-use: 269.6%
Elapsed: 530.66 secs. Rate: 26.46 gc/min %cpu-GC: 101.6%  %cpu-use: 260.1%
Elapsed: 534.02 secs. Rate: 29.44 gc/min %cpu-GC: 114.6%  %cpu-use: 269.2%

Note that performance is (800 / 535)=1.5x faster, but uses 2x more CPU, of which more than 100% is in GC !! Wow!

Single-threaded, busy system, hugepages:

Elapsed: 1385.2 secs. Rate: 12.56 gc/min %cpu-GC: 13.86%  %cpu-use: 108.1%
Elapsed: 1484.0 secs. Rate: 12.25 gc/min %cpu-GC: 17.48%  %cpu-use: 107.2%
Elapsed: 1461.2 secs. Rate: 11.87 gc/min %cpu-GC: 20.61%  %cpu-use: 109.9%

Single-threaded, busy-system, smallpages:

Elapsed: 1486.8 secs. Rate: 11.42 gc/min %cpu-GC: 16.69%  %cpu-use: 106.8%
Elapsed: 1471.9 secs. Rate: 12.27 gc/min %cpu-GC: 16.48%  %cpu-use: 107.0%
Elapsed: 1475.5 secs. Rate: 12.28 gc/min %cpu-GC: 18.04%  %cpu-use: 108.7%

so on a busy system pagesize doesn't matter: TLB contention wipes it all out, large or small. I do not understand why GC is faster/suppressed on busy systems!? compare: 1484.0 secs 107.2% 12.25% = 195 secs on loaded system, vs 803.77 secs 139.4% 53.45% = 599 seconds on idle system. WTF!?

Mar 7 2020 12bb9cebe556ae558513c54902535346c3b2e826 (find-coding-gene)

154 total secs: 916 (idle, small)

and 789, 819 secs (idle, huge)

Mar 7 2020 d6d02a8708bd132547be5c2ced8706a449e2bd01 (cache-go-info)

155

Mar 7 2020 d2a7520e3bb9eeae9bacfd07a3c809c811540199 (cache-mol)

153

Mar 7 2020 1009efe65314c9d0986cae1144763893db03c96b (degenerate)

149

Mar 6 2020 9ca662eaa831cbbe5db035ccba8548acc1e5bec4 (cache-find-rna)

151 total secs: 1718 (busy, small)

Mar 6 2020 680293ff90078a82331fa4b427c502076725deed (cache-protein-form)

147 total secs: 2255 (busy, small)

Mar 5 2020 fe74c17af7c1eb922011be12ed84b214373bb774 (defilter)

146 total: 4466 (idle, small)

Mar 5 2020 3f2fd54ecc25e7881c61cea26c50ab12a4f13246 (get-not-bind)

145

Mar 2 2020 518e0fb586006cc3456059a8cb5647e2d0377e3d (unlist-node)

137

29 Feb 2020 0f02e658bc80416bd713e2f1cd0c4d213ac05d66 (pubmed-cache)

136

28 Feb 2020 50da5d9d2dae90e40a6040ffa0f6610340047648 (interactors)

132 total: 4502 (idle, small)

28 Feb 2020 39f4edd0e55ec5dec5eec24b700721f7610a83de (generate)

133

25 Feb 2020 438f6218be2383b5e61e626dd5db145a2da0c87e (memoize-node-info)

129

25 Feb 2020 f5db6fecd8bf79d6f7280be0a16488012847f436 (check-path2)

127 killed after 9 hours spinning in guile GC.

25 Feb 2020 8f6eff0508cdf1ba15d309b18c69e8415c32d7dc (memoize-find-name)

130

25 Feb 2020 453f48c3e01505420a050e4738419a9d2900da28 (memoize-locate-node)

131

20 Feb 2020 f2127ae53fc680f3ae9054f85ecd662030a4b249 (mutation-be-gone)

122 Bombs with RAM overflow

linas commented 4 years ago

Replacing append by append!, map bu map! and append-map by append-map! in this branch: https://github.com/linas/annotation-scheme/tree/bang made no difference whatsoever om performance or GC overhead:

Elapsed: 798.81 secs. Rate: 20.73 gc/min %cpu-GC: 52.70%  %cpu-use: 139.0%

but perhaps using fold would provide the desired improvement?

linas commented 4 years ago

With today's master, and also #175 #174 and #161:

Elapsed: 795.90 secs. Rate: 20.43 gc/min %cpu-GC: 54.72%  %cpu-use: 142.0%