Investigate architectural refactoring of domain area: analysis

rummelsworth commented 4 years ago

[ ] Design & implement new, POCOs-only (no database) simulation module draft using current simulation algorithm.
[ ] Extract integration tests for this new module from current real input/output data used/produced by the current simulation module.
- Try to scale these input/output sets down to various small-to-medium sizes. There should be some small, "sanity-check" tests, but there should also be ones large enough to be representative of "real" runs. We should consider going as large as a low-end workstation's memory might permit, e.g. 8 GB RAM usage.

rummelsworth commented 4 years ago

I found a FOSS (MIT) library that checks almost all the boxes for replacing CalculateEvaluate:

https://github.com/sklose/NCalc2

[x] Supports numeric, string, date, and bool types out-of-the-box.
[x] Restricted grammar via ANTLR-generated parser. (Lighter-weight than full C# compilation. Prevents arbitrary C# injection, e.g. (() => { System.IO.Directory.Delete("C:\\", true); return 0; })() as a calculation equation.)
[x] Emits dynamic IL. (No assembly generation & loading overhead. Co-performant with statically compiled IL.)
[x] For "plain" evaluation (no IL generation), supports both custom functions and dynamic parameters.
[x] For generated IL, supports either custom functions or dynamic parameters (post-compilation).
- [ ] Does not support both at the same time for generated IL without some effort in contributing a (seemingly small) modification back to the project.
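
To illustrate the restricted-grammar point above, here is a minimal sketch of an evaluator that accepts only a whitelisted expression syntax and rejects everything else. It is not NCalc's or ANTLR's API; it uses Python's standard `ast` module as a stand-in for a generated parser, and the `evaluate` function and its `parameters` argument are hypothetical names chosen for the example.

```python
import ast
import operator

# Whitelisted operators; any syntax not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expression, parameters=None):
    """Evaluate arithmetic over numbers and named parameters only (hypothetical helper)."""
    parameters = parameters or {}

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in parameters:
            return parameters[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        # Calls, lambdas, attribute access, etc. all end up here.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

With this sketch, `evaluate("2 * (x + 3)", {"x": 4})` returns 14, while an injection attempt such as `evaluate("(lambda: 0)()")` raises `ValueError` because call and lambda nodes are not in the whitelist.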

rummelsworth commented 4 years ago

A small update here, especially regarding the previous comment: NCalc was ultimately not appropriate for our use case. However, an ANTLR/LINQ-based rewrite was, and the benchmark so far shows a roughly 400x speedup over the legacy CodeDom-based CalculateEvaluate: about 120 microseconds per expression with the rewrite versus about 50 milliseconds per expression with the legacy module. Extrapolating from this particular benchmark, a "cold start" analysis with 1 million equations could compile everything from scratch in about 2 minutes instead of about 14 hours. Also, just to be clear, all the above checkboxes are checked by this solution (type support, restricted grammar, compiled to IL, custom functions, dynamic parameters).
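
The extrapolation above can be checked with a few lines of arithmetic. This is only a sketch that reuses the approximate figures quoted in this comment; the variable names are made up for the example.

```python
# Approximate per-expression compile times quoted in the comment above.
new_seconds_per_expression = 120e-6     # about 120 microseconds
legacy_seconds_per_expression = 50e-3   # about 50 milliseconds
equation_count = 1_000_000              # "cold start" analysis size

speedup = legacy_seconds_per_expression / new_seconds_per_expression
new_total_minutes = new_seconds_per_expression * equation_count / 60
legacy_total_hours = legacy_seconds_per_expression * equation_count / 3600

print(f"speedup: about {speedup:.0f}x")
print(f"new module: {new_total_minutes:.1f} minutes")
print(f"legacy module: {legacy_total_hours:.1f} hours")
```

This gives a ratio of about 417x, 2.0 minutes for the new module, and 13.9 hours for the legacy one, consistent with the rounded figures above.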

The primary goal of this rewrite was to clean up the CalculateEvaluate API for use within the refactored analysis module. However, in light of the unexpectedly large performance gain, Chad has asked me to spend a few hours investigating the level of effort involved in replacing the legacy CodeDom-based CalculateEvaluate module with the rewritten CalculateEvaluate module before the new analysis module is integrated.

rummelsworth commented 4 years ago

As of a couple of weeks ago, the refactored analysis code is nominally complete. Per the update given at the last iteration review meeting, I have spent the past couple of weeks on integration testing, specifically writing a configurable test program that draws analysis inputs from an existing sample database and verifies that the implementation runs to completion, so that any structural bugs are sussed out. This is ongoing.

After it's verified to (a) run to completion and (b) produce the expected output structure, there will need to be a review meeting (with Gregg, Chad, and possibly Jake) to answer about a dozen (so far) outstanding questions on how the analysis handles certain edge cases. The answers to these questions will need to be accounted for in the implementation, with subsequent re-verification.

After the answers are accounted for, the new output results will need to be verified against expected results from the legacy analysis implementation.

Per a message from @jakedw7 this morning, my work on this project is to fully stop immediately, to resume no earlier than July.

ARA-Trans / iAM

Investigate architectural refactoring of domain area: analysis #632