Closed foxik closed 5 years ago
this turned out to be a more convoluted space than we had first anticipated! in general, MCES tends to be at least an order of magnitude slower than SMATCH (or, more precisely, its random-restart hill-climbing search: RRHC), at least when using the rather generous default limit of half a million search steps. MCES by itself should be deterministic, yes, but it does not always find an exact solution within a given (reasonable) limit.
in general, there is substantial variation in how different parameterizations of the scorers effect results, both in terms of scores and running time.
we recently activated MCES initialization with RRHC-based node correspondences, to make sure that MRP scores will always be at least the same as or higher than SMATCH. this adds potential for non-determinism, which we hope to counter-act with the generous limit of 50 for the RRHC part. this makes MCES substantially slower for two reasons: (a) the added time for the RRHC search and (b) reduced compatiblity between the initial correspondence and the scheduling heuristics in the MCES search.
we already feel a tad bad about tweaking the MRP metric this close to the evaluation period, hence will now be increasingly cautious about making further revisions (until after the official scores have been computed, that is :-). our reasoning in recent decision making has been that reliability (accuracy, determinism, and such) is more important than efficiency. in system development, i can of course see that waiting many minutes for an updated score can be inconvenient! the framework-specific metrics all tend to run a lot faster than the MRP scorer, and (as you are starting to explore) one can of course reduce the --limit
for the MCES search and disable the RRHC-based initialization, to reduce running time. with the possible exception of the UCCA graphs (where we are still gathering experimental data ourselves), i currently have no reason to believe that such optimizations much a substantial difference.
I definitely understand the concern of changing the scorer. Just few remarks (without any call to action :-)
--limit
of MCES search does not influence time added for RRHC, so you cannot change it easily on a command-line (or when using mtool API); but of course we can tweak that directly in the code ourselves;mces
score will take ~3.5 hours to compute on development eds
data, which is beyond "inconvenient". But as you say, we can use a surrogate metric.thanks again, milan! i am closing this issue now, as we have consolidated default values for the two metrics now and provided more control from the command line. while some of the code optimizations merged into the ‘dev’ branch earlier today can result in speed-ups of up to a factor of two, at this point i suspect our best short-term shot at dramatically increasing experimental turn-around will be through #16 (trivial parallelization of the scorer). i hope to still look into that before next week!
Consider the 5th sentence from eds sample:
Running in current head (07b10499f28a at the time of writing) the following command
takes 9 seconds, while turning off SMATCH as hill-climbing initialization by adding
if False
on line https://github.com/cfmrp/mtool/blob/07b10499f28ae9d67da6a811019bfcc5cd3ba978/score/mces.py#L326 allows the above command to finish in ~0.8s.Running
finishes in ~1s.
The problem is that
smatch
call at https://github.com/cfmrp/mtool/blob/07b10499f28ae9d67da6a811019bfcc5cd3ba978/score/mces.py#L328 takes a long time -- compared tosmatch
metric, an explicit limit of50
is provided, whilesmatch
uses a default limit of0
, which is5
, see https://github.com/cfmrp/mtool/blob/07b10499f28ae9d67da6a811019bfcc5cd3ba978/score/smatch.py#L79Currently running
mces
evaluation oneds
sample (89 sentences) takes ~5 minutes, while not using SMATCH initialization takes ~20s; using SMATCH initialization with limit 5 takes ~45s; using SMATCH initialization with limit 2 takes ~28s. So without any understanding of the internals of the metrics, the SMATCH initialization does not help in running faster (but it may of course help obtaining better results; but I thinkmces
should give deterministic results).