
Slowdown between ERG 1214 and 2018 #33

andrewbriand commented 2 years ago

I'm currently investigating the slowdown I discussed with @danflick. My work so far has focused on the following sentence:

On one side of this power struggle stand the forces in ascendency on Wall Street -- the New Guard -- consisting of high-tech computer wizards at the major brokerage firms, their pension fund clients with immense pools of money, and the traders at the fast-growing Chicago futures exchanges.

With 16GB of RAM, the 1214 version of the grammar produces 5646 readings for this sentence in 66s, a time per reading of about 12ms. The 2018 version, however, produces only 26 readings in the same amount of time, or about 2.5s per reading.

Increasing the RAM limit for the 2018 version produces the same number of readings, but takes longer, since unpacking runs until the RAM limit is reached: with 32GB this yields 26 readings in 155s, and with 50GB, 26 readings in 248s.
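
For reference, the per-reading figures here are just wall-clock time divided by the number of readings reported; a quick check of the numbers above (a throwaway sketch, with the figures copied from these runs):

```python
# Per-reading parse times for the runs described above.
runs = {
    "1214, 16GB": (5646, 66.0),   # (readings, seconds)
    "2018, 16GB": (26, 66.0),
    "2018, 32GB": (26, 155.0),
    "2018, 50GB": (26, 248.0),
}

for label, (readings, seconds) in runs.items():
    print(f"{label}: {seconds / readings:.3f}s per reading")
# 1214, 16GB: 0.012s per reading
# 2018, 16GB: 2.538s per reading
# 2018, 32GB: 5.962s per reading
# 2018, 50GB: 9.538s per reading
```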

A further observation is that, according to ace's debug output, the 2018 version generates more hypotheses when more RAM is available (40,371,819 at 16GB vs. 85,684,421 at 32GB, per the logs below).

2018 with 16GB RAM:

NOTE: loading frozen grammar ERG (2018)
NOTE: 10439 types, 40320 lexemes, 362 rules, 67 orules, 108 instances, 49510 strings, 233 features
permanent RAM: 3k

NOTE: hit RAM limit while unpacking
NOTE: 26 readings, added 117069 / 104559 edges to chart (21974 fully instantiated, 2262 actives used, 30807 passives used)      RAM: 16384002k
NOTE: parsed 1 / 1 sentences, avg 16019459k, time 66.89725s
40371819 total hypotheses generated
763 total nodes reconstructed
NOTE: glb hash: 0 direct hits, 0 collisions, 10661 misses
NOTE: 2502149 subsumption tests; qc filters 90.0% leaving 249043, of which ss passes 39.4% = 98059 ; 2.5% = 6243 generalizable
NOTE: unify filters: 21480481 total, 11778277 rf (54.8%), 488531 qc (2.3% / 4.1%), 310401 success (1.4% / 63.5%), 0 bad orth (0.0% / 0.0%)
NOTE: 575073 / 238636 (241.0%) passive edges were connected to roots

2018 with 32GB RAM:

NOTE: loading frozen grammar ERG (2018)
NOTE: 10439 types, 40320 lexemes, 362 rules, 67 orules, 108 instances, 49510 strings, 233 features
permanent RAM: 3k

NOTE: hit RAM limit while unpacking
NOTE: 26 readings, added 117069 / 104559 edges to chart (21974 fully instantiated, 2262 actives used, 30807 passives used)      RAM: 32767999k
NOTE: parsed 1 / 1 sentences, avg 31836161k, time 129.16637s
85684421 total hypotheses generated
763 total nodes reconstructed
NOTE: glb hash: 0 direct hits, 0 collisions, 10664 misses
NOTE: 2502149 subsumption tests; qc filters 90.0% leaving 249043, of which ss passes 39.4% = 98059 ; 2.5% = 6243 generalizable
NOTE: unify filters: 21480481 total, 11778277 rf (54.8%), 488531 qc (2.3% / 4.1%), 310401 success (1.4% / 63.5%), 0 bad orth (0.0% / 0.0%)
NOTE: 575073 / 238636 (241.0%) passive edges were connected to roots
TIMERS (260 calls = ~ 35.6µs overhead):

Note that the number of edges created is the same in both runs, so I believe the problem lies in unpacking.
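
To make comparisons like this easier across runs, the relevant counters can be pulled out of ace's stderr mechanically. A small sketch (the regexes are my own and assume the NOTE lines are formatted exactly as quoted above; the file names in the usage comment are hypothetical):

```python
import re

# Extract the headline counters from an ace run's debug output so two
# runs can be diffed side by side.
PATTERNS = {
    "readings":   r"NOTE: (\d+) readings",
    "edges":      r"added (\d+) / \d+ edges",
    "hypotheses": r"(\d+) total hypotheses generated",
    "time_s":     r"time ([\d.]+)s",
}

def summarize(log_text):
    return {name: (m.group(1) if (m := re.search(pat, log_text)) else None)
            for name, pat in PATTERNS.items()}

# Usage (hypothetical file names):
#   a = summarize(open("erg2018_16gb.log").read())
#   b = summarize(open("erg2018_32gb.log").read())
#   for key in a:
#       print(key, a[key], "->", b[key])
```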

When unpacking, if I print the hypotheses being popped off the agenda and the edges being decomposed, I find that the same number of edges are decomposed under both RAM limits, but that more than 6x as many hypotheses are popped from the agenda with the 32GB limit. In both cases, the last hypothesis popped involves the rule hdn-np_app-pr_c.
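
For anyone not steeped in the unpacking code, here is a schematic of the kind of agenda-driven loop involved. This is only an illustrative sketch over my own simplified data types (Edge, Hypothesis, instantiate, and the budget parameter are all assumptions), not ACE's implementation. The point it tries to make concrete is that each packed edge is decomposed at most once, so that count is bounded by the chart, while the number of hypotheses popped depends on how long the loop runs before the resource limit stops it:

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Edge:
    label: str
    # Alternative daughter combinations for this packed edge, each
    # with a score; leaves simply have no alternatives.
    alternatives: List[Tuple[float, List["Edge"]]] = field(default_factory=list)

@dataclass(order=True)
class Hypothesis:
    neg_score: float                             # heapq is a min-heap
    edge: Edge = field(compare=False)
    daughters: List[Edge] = field(compare=False)

def instantiate(hyp: Hypothesis) -> Optional[str]:
    # Stand-in for feature-structure unification, which in the real
    # parser can fail; here it trivially succeeds.
    return hyp.edge.label

def unpack(root: Edge, n_best: int, hypothesis_budget: int):
    agenda: List[Hypothesis] = []
    decomposed = set()
    readings, popped = [], 0

    def decompose(edge: Edge) -> None:
        # Each edge is decomposed at most once, so this count is
        # bounded by the size of the packed chart.
        if id(edge) in decomposed:
            return
        decomposed.add(id(edge))
        for score, daughters in edge.alternatives:
            heapq.heappush(agenda, Hypothesis(-score, edge, daughters))

    decompose(root)
    # Pop best-first until enough readings are found, the agenda is
    # empty, or the budget (standing in for the RAM limit) is spent.
    while agenda and len(readings) < n_best and popped < hypothesis_budget:
        hyp = heapq.heappop(agenda)
        popped += 1
        result = instantiate(hyp)
        if result is not None and hyp.edge is root:
            readings.append(result)
        for daughter in hyp.daughters:
            decompose(daughter)
    return readings, popped, len(decomposed)
```

In the real parser the instantiation step is a unification that can fail, which would be consistent with tens of millions of hypotheses being popped for only 26 readings.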

Right now I'm exploring simpler sentences with this apposition rule to see if I can determine what might be happening. Any ideas/comments would be greatly appreciated. Thanks!