Significant performance problem when generating HTN/TPN with parallel statements

dcerys commented 7 years ago

Generating a HTN and TPN with a lengthy parallel statement takes a very long time. To make matters worse, the execution time seems to be proportional to the length of parallel statement. In the following test case, you can comment out some of the calls to (something-simple), and find that the execution time of the HTN/TPN generation is:

6 calls to (something-simple): 20 seconds
7 calls to (something-simple): 39 seconds
8 calls to (something-simple): 86 seconds
9 calls to (something-simple): 195 seconds

Note that changing the parallel to a sequence changes the generation time to be roughly independent of the number of statements.

Note that using perform-15-minutes-of-activities or perform-activities as the root task results in a non-slow HTN/TPN generation, so the problem seems to be the combination of a parallel called by one (or more) sequence's.

;;;
;;; To generate the HTN and TPN from this:
;;;     pamela -v -i test3.pamela -o test3 -f edn -t "(test.main)" htn
;;;

(defpclass test []
  :methods [(defpmethod main []
              (sequence
               (perform-activities)
               (delay :bounds [10 10])
               (perform-activities)))

            (defpmethod perform-activities []
              (sequence
               (perform-15-minutes-of-activities)
               (perform-15-minutes-of-activities)
               (perform-15-minutes-of-activities)
               (perform-15-minutes-of-activities)))

            (defpmethod perform-15-minutes-of-activities []
              (parallel
               (something-simple)
               (something-simple)
               (something-simple)
               (something-simple)
               (something-simple)
               (something-simple) ;; ~20 sec
               (something-simple) ;; ~39 sec
               (something-simple) ;; ~86 sec
               (something-simple) ;; ~195 sec
               ))

            (defpmethod something-simple [])

            ])

dcerys commented 7 years ago

If debug logging is enabled, I see a significant number of "CHECK" statements, which is generated by tpn/remove-superfluous.

tmarble commented 7 years ago

Work-in-progress update https://github.com/tmarble/pamela/commit/5bdd2b398d35798de4bade27c41b01e7348b48d4

Can confirm that there is no need to comment out the dbg-println statements: the macro indeed does not eval the form when the selected debug threshold is not set.''

As simple test can be run with

(reset-gensym-generator)
(time (pamela "-v" "-v" "-l" "trace" "-i" "test/pamela/issue-115.pamela" "-o" "issue-115" "-f" "edn" "-t" "(test.main)" "htn"))
...
"Elapsed time: 6152.743897 msecs"

By adding criterium to the :test scope it's possible to get much more statistically valid information:

pamela.cli> (require 'criterium.core)
pamela.cli> pamela.utils/*dbg-println-level*
1
pamela.cli> (reset-gensym-generator)
0
pamela.cli> (criterium.core/with-progress-reporting (criterium.core/bench (pamela "-v" "-v" "-l" "trace" "-i" "test/pamela/issue-115.pamela" "-o" "issue-115" "-f" "edn" "-t" "(test.main)" "htn") :samples 5))
...
Evaluation count : 5 in 5 samples of 1 calls.
             Execution time mean : 1.012452 sec
    Execution time std-deviation : 10.573710 ms
   Execution time lower quantile : 999.635369 ms ( 2.5%)
   Execution time upper quantile : 1.023445 sec (97.5%)
                   Overhead used : 1.982120 ns

Using 3 parallel statements debug level 4 and below runs for about 1 second. Level 5 runs for 3 seconds. Level 6 runs for about 22 seconds. Thus using the default debug level 3 should not incur excess overhead.

tmarble commented 7 years ago

After removing the exponential complexity the performance results are evident:

trace 6, parallel 2
"Elapsed time: 3326.549989 msecs"
"Elapsed time: 3743.666514 msecs"
"Elapsed time: 3289.047951 msecs"

trace 6, parallel 3
"Elapsed time: 3315.804464 msecs"
"Elapsed time: 3864.005608 msecs"
"Elapsed time: 3393.669642 msecs"

trace 6, parallel 4
"Elapsed time: 3684.990509 msecs"
"Elapsed time: 4090.807653 msecs"
"Elapsed time: 3546.242385 msecs"

trace 6, parallel 5
"Elapsed time: 3924.724666 msecs"
"Elapsed time: 3925.048613 msecs"
"Elapsed time: 3949.488314 msecs"

trace 6, parallel 6
"Elapsed time: 4023.541322 msecs"
"Elapsed time: 4127.849754 msecs"
"Elapsed time: 4133.518941 msecs"

trace 6, parallel 7
"Elapsed time: 4191.771061 msecs"
"Elapsed time: 4162.683773 msecs"
"Elapsed time: 4399.626638 msecs"

trace 6, parallel 8
"Elapsed time: 4035.176058 msecs"
"Elapsed time: 4557.13223 msecs"
"Elapsed time: 4159.997504 msecs"

trace 6, parallel 9
"Elapsed time: 4159.997504 msecs"
"Elapsed time: 5000.702443 msecs"
"Elapsed time: 4273.945647 msecs"

dollabs / pamela

Significant performance problem when generating HTN/TPN with parallel statements #115