Open linas opened 4 years ago
Running the unit test from #144 I get this:
gc-stats
((gc-time-taken . 648711314091)
(heap-size . 7639584768)
(heap-free-size . 7435337728)
(heap-total-allocated . 82374530432)
(heap-allocated-since-gc . 4273168)
(protected-objects . 12) (gc-times . 8119))
(report)
Time: 5293.8106 secs. calls: 1 avg: 5293810625.4 usec/call for biogrid-anno
Time: 18.310015 secs. calls: 1 avg: 18310014.9 usec/call for biogrid-write
Time: 111.67324 secs. calls: 1 avg: 111673244.6 usec/call for gene-go-anno
Time: 0.8286856 secs. calls: 1 avg: 828685.6 usec/call for gene-go-write
Time: 1369.7452 secs. calls: 1 avg: 1369745214.7 usec/call for gene-path-anno
Time: 134.71432 secs. calls: 1 avg: 134714321.3 usec/call for gene-path-write
Time: 293.07109 secs. calls: 681 avg: 430354.0 usec/call for smpdb
Time: 1069.4209 secs. calls: 681 avg: 1570368.5 usec/call for reactome
Time: 1190.0549 secs. calls: 3769 avg: 315748.2 usec/call for find-pathway-genes
Time: 1016.5998 secs. calls: 183670 avg: 5534.9 usec/call for add-pathway-genes
Time: 4.5974837 secs. calls: 1362 avg: 3375.5 usec/call for find-pathway-member
Zero calls to pathway-gene-interactors
Zero calls to generate-interactors
Time: 0.6488260 secs. calls: 2197 avg: 295.3 usec/call for pathway-hierarchy
Time: 5.9196219 secs. calls: 1362 avg: 4346.3 usec/call for find-protein
Time: 167.04053 secs. calls: 3769 avg: 44319.6 usec/call for find-mol
Time: 67.134105 secs. calls: 777 avg: 86401.7 usec/call for find-go-term
Time: 393.59597 secs. calls: 681 avg: 577967.6 usec/call for match-gene-interactors
Time: 4899.1494 secs. calls: 681 avg: 7194052.0 usec/call for find-output-interactors
Time: 4855.8332 secs. calls: 577768 avg: 8404.5 usec/call for generate-result
Time: 73.681270 secs. calls: 1155536 avg: 63.8 usec/call for build-interaction
Time: 4515.5293 secs. calls: 1165658 avg: 3873.8 usec/call for find-protein-form
Time: 2.5982513 secs. calls: 1549 avg: 1677.4 usec/call for find-name
Time: 192.79943 secs. calls: 145876 avg: 1321.7 usec/call for find-pubmed-id
Time: 109.20307 secs. calls: 1458 avg: 74899.2 usec/call for find-memberln
Time: 27.601605 secs. calls: 11047 avg: 2498.6 usec/call for add-go-info
Zero calls to find-parent
Time: 26.258079 secs. calls: 18190 avg: 1443.5 usec/call for locate-node
Time: 59.117296 secs. calls: 153412 avg: 385.3 usec/call for add-loc
Manually breaking down the call-graph:
gene-pathway-annotation - 1370 secs
-- rectome - 1070 secs
-- find-pathway-member 4 secs
-- find-mol - 167 secs
-- find-pathway-genes - 1190 secs
-- pattern
-- add-pathway-genes - 1016 secs
biogrid-interaction-annotation - 5294 secs -- find-output-interactors - 4899 secs -- pattern -- generate-result - 4855 secs -- find-go-term - 67 secs -- find-protein-form - 4515 secs
Unit test from #144, after assorted fixes: 20.4 gb virt, 4.4 gb RSS, 43 minutes elapsed (2620 seconds) which includes time to load dataset.... which was 598 seconds. gc-stats is: (heap-size . 7658577920) (heap-free-size . 7396917248)
so 250MBytes of RAM used by scheme. The report:
Pathway functions
Time: 521.29256 secs. calls: 1 avg: 521292555.8 usec/call for gene-path-anno
Time: 115.86134 secs. calls: 1 avg: 115861342.7 usec/call for gene-path-write
Time: 172.32179 secs. calls: 681 avg: 253042.3 usec/call for smpdb
Time: 342.50595 secs. calls: 681 avg: 502945.6 usec/call for reactome
Time: 345.16271 secs. calls: 3769 avg: 91579.4 usec/call for find-pathway-genes
Time: 181.36376 secs. calls: 183670 avg: 987.4 usec/call for add-pathway-genes
Time: 164.07672 secs. calls: 3769 avg: 43533.2 usec/call for find-mol
Time: 4.8064842 secs. calls: 1362 avg: 3529.0 usec/call for find-pathway-member
Time: 0.6353414 secs. calls: 2197 avg: 289.2 usec/call for pathway-hierarchy
Time: 5.6786770 secs. calls: 1362 avg: 4169.4 usec/call for find-protein
Time: 54.434262 secs. calls: 153412 avg: 354.8 usec/call for add-loc
Zero calls to pathway-gene-interactors
Zero calls to generate-interactors
Common biogrid/pathway functions
Time: 2.0903343 secs. calls: 1549 avg: 1349.5 usec/call for find-name
Time: 23.860827 secs. calls: 18187 avg: 1312.0 usec/call for locate-node
Zero calls to find-parent
Biogrid functions
Time: 544.60710 secs. calls: 1 avg: 544607103.2 usec/call for biogrid-anno
Time: 16.548887 secs. calls: 1 avg: 16548887.1 usec/call for biogrid-write
Time: 55.154352 secs. calls: 681 avg: 80990.2 usec/call for match-gene-interactors
Time: 488.67459 secs. calls: 681 avg: 717583.8 usec/call for find-output-interactors
Time: 282.23120 secs. calls: 540000 avg: 522.7 usec/call for generate-result
Time: 52.450733 secs. calls: 1080000 avg: 48.6 usec/call for build-interaction
Time: 171.74925 secs. calls: 145877 avg: 1177.4 usec/call for find-pubmed-id
Time: 6.2035060 secs. calls: 8853 avg: 700.7 usec/call for find-rna
Common biogrid/GO functions
Time: 105.45336 secs. calls: 10899 avg: 9675.5 usec/call for find-memberln
Time: 9.5412558 secs. calls: 9464 avg: 1008.2 usec/call for find-protein-form
GO functions
Time: 105.78463 secs. calls: 1 avg: 105784630.0 usec/call for gene-go-anno
Time: 0.6229931 secs. calls: 1 avg: 622993.1 usec/call for gene-go-write
Time: 65.820128 secs. calls: 10218 avg: 6441.6 usec/call for find-go-term
Time: 22.861835 secs. calls: 11047 avg: 2069.5 usec/call for add-go-info
The speedup in biogrid-interaction-annotation
and find-protein-form
is due to #147.
Some results on find-mol
, using the same test cse as above.
The original find-mol:
Time: 167.04053 secs. calls: 3769 avg: 44319.6 usec/call for find-mol
Time: 164.07672 secs. calls: 3769 avg: 43533.2 usec/call for find-mol
Time: 185.57233 secs. calls: 3769 avg: 49236.5 usec/call for find-mol
refactor find-mol to use srfi-1 map instead of GroundedSchema
Time: 111.73313 secs. calls: 3769 avg: 29645.3 usec/call for find-mol
Refactor above to use srfi-1 filter-map
instead of GroundedPredicate
Time: 109.04895 secs. calls: 3769 avg: 28933.1 usec/call for find-mol
Time: 115.21986 secs. calls: 3769 avg: 30570.4 usec/call for find-mol
Refactor above to cache the search results:
Time: 97.165338 secs. calls: 3769 avg: 25780.1 usec/call for find-mol
Time: 101.62701 secs. calls: 3769 avg: 26963.9 usec/call for find-mol
Time: 100.25473 secs. calls: 3769 avg: 26599.8 usec/call for find-mol
The above code is in pull req #153
Calls to the pattern search:
Time: 1.6998823 secs. calls: 1992 avg: 853.4 usec/call for get-mol
Time: 1.4956957 secs. calls: 1992 avg: 750.9 usec/call for get-mol
Wow. That means that almost all of the time is spent in add-mol-info
-- so that is the place to explore...
Results for find-coding-gene
. First the original form:
Time: 42.131111 secs. calls: 177359 avg: 237.5 usec/call for find-coding-gene
Time: 41.810285 secs. calls: 177359 avg: 235.7 usec/call for find-coding-gene
then the cached version:
Time: 3.9859701 secs. calls: 177359 avg: 22.5 usec/call for find-coding-gene
Time: 1.8742897 secs. calls: 8517 avg: 220.1 usec/call for do-find-coding-gene
The cache-miss ratio is 8517/177359 = 4.8% so that's a big deal. The above code is in pull req #154
Here's the find-rna
cache hit ratio. It also suggests that filterbytype
is worth tuning:
Time: 176.14719 secs. calls: 184351 avg: 955.5 usec/call for find-rna
Time: 7.1127286 secs. calls: 8853 avg: 803.4 usec/call for do-get-rna
So the miss ratio is 8853 / 184351 = 4.8%
Here's the results for go-info
caching. First, the original code, find-go-term
Time: 79.225793 secs. calls: 10218 avg: 7753.6 usec/call for find-go-term
Time: 77.238001 secs. calls: 10218 avg: 7559.0 usec/call for find-go-term
and an important piece of find-go-term
:
Time: 29.799717 secs. calls: 11047 avg: 2697.5 usec/call for add-go-info
Time: 28.259893 secs. calls: 11047 avg: 2558.2 usec/call for add-go-info
After caching:
Time: 50.957783 secs. calls: 10218 avg: 4987.1 usec/call for find-go-term
Time: 2.4260579 secs. calls: 11047 avg: 219.6 usec/call for add-go-info
Time: 1.9518182 secs. calls: 11047 avg: 176.7 usec/call for go-info
Time: 1.8019651 secs. calls: 3449 avg: 522.5 usec/call for do-go-info
So grand-total find-go-term
is 78/51 = 1.5x faster because add-go-info
is 29/2.5 = 12x faster.
The cache-miss ratio for go-info
is 3449/11047 = 31%
Today's results, all patches applied, measured on an idle system. Total timke: 824 seconds
Pathway functions
Time: 272.05854 secs. calls: 1 avg: 272058536.0 usec/call for gene-path-anno
Time: 95.236816 secs. calls: 1 avg: 95236816.3 usec/call for gene-path-write
Time: 93.509980 secs. calls: 681 avg: 137312.7 usec/call for smpdb
Time: 173.98547 secs. calls: 681 avg: 255485.3 usec/call for reactome
Time: 229.05144 secs. calls: 3769 avg: 60772.5 usec/call for find-pathway-genes
Time: 136.42400 secs. calls: 183670 avg: 742.8 usec/call for add-pathway-genes
Time: 35.072011 secs. calls: 3769 avg: 9305.4 usec/call for find-mol
Time: 1.0683796 secs. calls: 1992 avg: 536.3 usec/call for get-mol
Time: 2.4463706 secs. calls: 1362 avg: 1796.2 usec/call for find-pathway-member
Time: 0.4926354 secs. calls: 2197 avg: 224.2 usec/call for pathway-hierarchy
Time: 4.1501826 secs. calls: 1362 avg: 3047.1 usec/call for find-protein
Time: 24.665102 secs. calls: 153412 avg: 160.8 usec/call for add-loc
Zero calls to pathway-gene-interactors
Zero calls to generate-interactors
Common biogrid/pathway functions
Time: 1.6548900 secs. calls: 1549 avg: 1068.4 usec/call for find-name
Time: 15.904738 secs. calls: 18187 avg: 874.5 usec/call for locate-node
Time: 113.06343 secs. calls: 184351 avg: 613.3 usec/call for find-rna
Time: 4.1273338 secs. calls: 8853 avg: 466.2 usec/call for do-get-rna
Time: 2.2090188 secs. calls: 177359 avg: 12.5 usec/call for find-coding-gene
Time: 1.0117541 secs. calls: 8517 avg: 118.8 usec/call for do-find-coding-gene
Time: 43.892598 secs. calls: 3234083 avg: 13.6 usec/call for node-info
Time: 23.923776 secs. calls: 140556 avg: 170.2 usec/call for do-get-node-info
Zero calls to find-parent
Biogrid functions
Time: 386.88683 secs. calls: 1 avg: 386886834.6 usec/call for biogrid-anno
Time: 12.858690 secs. calls: 1 avg: 12858690.3 usec/call for biogrid-write
Time: 39.008608 secs. calls: 681 avg: 57281.4 usec/call for match-gene-interactors
Time: 347.28825 secs. calls: 681 avg: 509968.1 usec/call for find-output-interactors
Time: 207.85364 secs. calls: 540000 avg: 384.9 usec/call for generate-result
Time: 42.764158 secs. calls: 1080000 avg: 39.6 usec/call for build-interaction
Time: 122.86251 secs. calls: 540000 avg: 227.5 usec/call for find-pubmed-id
Time: 112.85280 secs. calls: 145877 avg: 773.6 usec/call for do-find-pubmed-id
Common biogrid/GO functions
Time: 52.962694 secs. calls: 10899 avg: 4859.4 usec/call for find-memberln
Time: 6.8227289 secs. calls: 9464 avg: 720.9 usec/call for find-protein-form
GO functions
Time: 52.689303 secs. calls: 1 avg: 52689303.4 usec/call for gene-go-anno
Time: 0.5131226 secs. calls: 1 avg: 513122.6 usec/call for gene-go-write
Time: 29.331335 secs. calls: 10218 avg: 2870.6 usec/call for find-go-term
Time: 1.3569475 secs. calls: 11047 avg: 122.8 usec/call for add-go-info
Time: 1.1187662 secs. calls: 11047 avg: 101.3 usec/call for go-info
Time: 1.0191072 secs. calls: 3449 avg: 295.5 usec/call for do-go-info
Code examination suggests that find-pathway-genes
and add-pathway-genes
are juicy targets.
Total elapsed time, at various points in master.
Legened:
private branch w/instrumentation and w/all pending pulls #156 #157 total secs: 824, 826 (idle, huge)
private branch w/all pending pulls #156 #157 total secs: 799, 781 (idle, huge) and single-threaded - in more detail:
Elapsed: 839.95 secs. Rate: 18.36 gc/min %cpu-GC: 49.94% %cpu-use: 137.1%
Elapsed: 803.77 secs. Rate: 20.68 gc/min %cpu-GC: 53.45% %cpu-use: 139.4%
Elapsed: 799.76 secs. Rate: 20.71 gc/min %cpu-GC: 52.68% %cpu-use: 139.3%
Elapsed: 813.68 secs. Rate: 20.35 gc/min %cpu-GC: 52.35% %cpu-use: 139.1%
50% of time in GC? Wow!
total secs: 883 (idle, small) Notice his is 10% slower than huge. total secs: 820, 811, 833, 842, 816, 810, 831 (idle, huge, with hash mutex) maybe 1% slower?
idle, huge, with mutex and par-map
in main.scm
(see #144 and especially https://github.com/MOZI-AI/annotation-scheme/issues/144#issuecomment-596162688 )
Elapsed: 530.59 secs. Rate: 27.82 gc/min %cpu-GC: 106.4% %cpu-use: 264.0%
Elapsed: 539.27 secs. Rate: 28.48 gc/min %cpu-GC: 113.5% %cpu-use: 269.6%
Elapsed: 530.66 secs. Rate: 26.46 gc/min %cpu-GC: 101.6% %cpu-use: 260.1%
Elapsed: 534.02 secs. Rate: 29.44 gc/min %cpu-GC: 114.6% %cpu-use: 269.2%
Note that performance is (800 / 535)=1.5x faster, but uses 2x more CPU, of which more than 100% is in GC !! Wow!
Single-threaded, busy system, hugepages:
Elapsed: 1385.2 secs. Rate: 12.56 gc/min %cpu-GC: 13.86% %cpu-use: 108.1%
Elapsed: 1484.0 secs. Rate: 12.25 gc/min %cpu-GC: 17.48% %cpu-use: 107.2%
Elapsed: 1461.2 secs. Rate: 11.87 gc/min %cpu-GC: 20.61% %cpu-use: 109.9%
Single-threaded, busy-system, smallpages:
Elapsed: 1486.8 secs. Rate: 11.42 gc/min %cpu-GC: 16.69% %cpu-use: 106.8%
Elapsed: 1471.9 secs. Rate: 12.27 gc/min %cpu-GC: 16.48% %cpu-use: 107.0%
Elapsed: 1475.5 secs. Rate: 12.28 gc/min %cpu-GC: 18.04% %cpu-use: 108.7%
so on a busy system pagesize doesn't matter: TLB contention wipes it all out, large or small. I do not understand why GC is faster/suppressed on busy systems!? compare: 1484.0 secs 107.2% 12.25% = 195 secs on loaded system, vs 803.77 secs 139.4% 53.45% = 599 seconds on idle system. WTF!?
Mar 7 2020 12bb9cebe556ae558513c54902535346c3b2e826 (find-coding-gene)
and 789, 819 secs (idle, huge)
Mar 7 2020 d6d02a8708bd132547be5c2ced8706a449e2bd01 (cache-go-info)
Mar 7 2020 d2a7520e3bb9eeae9bacfd07a3c809c811540199 (cache-mol)
Mar 7 2020 1009efe65314c9d0986cae1144763893db03c96b (degenerate)
Mar 6 2020 9ca662eaa831cbbe5db035ccba8548acc1e5bec4 (cache-find-rna)
Mar 6 2020 680293ff90078a82331fa4b427c502076725deed (cache-protein-form)
Mar 5 2020 fe74c17af7c1eb922011be12ed84b214373bb774 (defilter)
Mar 5 2020 3f2fd54ecc25e7881c61cea26c50ab12a4f13246 (get-not-bind)
Mar 2 2020 518e0fb586006cc3456059a8cb5647e2d0377e3d (unlist-node)
29 Feb 2020 0f02e658bc80416bd713e2f1cd0c4d213ac05d66 (pubmed-cache)
28 Feb 2020 50da5d9d2dae90e40a6040ffa0f6610340047648 (interactors)
28 Feb 2020 39f4edd0e55ec5dec5eec24b700721f7610a83de (generate)
25 Feb 2020 438f6218be2383b5e61e626dd5db145a2da0c87e (memoize-node-info)
25 Feb 2020 f5db6fecd8bf79d6f7280be0a16488012847f436 (check-path2)
25 Feb 2020 8f6eff0508cdf1ba15d309b18c69e8415c32d7dc (memoize-find-name)
25 Feb 2020 453f48c3e01505420a050e4738419a9d2900da28 (memoize-locate-node)
20 Feb 2020 f2127ae53fc680f3ae9054f85ecd662030a4b249 (mutation-be-gone)
Replacing append
by append!
, map
bu map!
and append-map
by append-map!
in this branch: https://github.com/linas/annotation-scheme/tree/bang made no difference whatsoever om performance or GC overhead:
Elapsed: 798.81 secs. Rate: 20.73 gc/min %cpu-GC: 52.70% %cpu-use: 139.0%
but perhaps using fold
would provide the desired improvement?
With today's master, and also #175 #174 and #161:
Elapsed: 795.90 secs. Rate: 20.43 gc/min %cpu-GC: 54.72% %cpu-use: 142.0%
This is not an issue report; rather, its a performance diary, tracking performance from where #98 left off.
Instrumented code to be measured as follows:
Then, start guile, ideally w/ hugetlb enabled:
Then start cogserver:
Then load your dataset, and run whatever job you wish to profile. Meanwhile, in another terminal:
This will print out the current profile of the currently-running code.
Alternately, if you don't have the cogserver, then just wait for your job to finish, and then run
(report)
at the guile REPL prompt.(Note to self: for me, do this:
$cd work/src; (load "annotate-all.scm")
)