INET-Complexity / Core-ESL

Open-source, distributed, economic simulation libraries for Java and Scala

Dependency Injection Framework #6

Open davidrpugh opened 7 years ago

davidrpugh commented 7 years ago

@EconomicSL/core-devs I am opening this thread in order to capture our discussion on various dependency injection frameworks.

There are many possibilities, so I think it is useful to think about what features we want/need for our use cases in order to help constrain the discussion. For example, I feel strongly that we need a framework that supports compile-time DI (possibly in addition to run-time DI).

rafabap commented 7 years ago

Hey @davidrpugh I totally agree that this is a very important part of basically any simulation.

I'm not sure I understand how sophisticated we could get about it...

At the moment in my case study, the way I assume I'm doing dependency injection is by having a Parameters class in which all the parameters are public static fields. Whenever a parameter is needed anywhere in the code, the relevant class looks up the corresponding field of Parameters. When a new simulation is run, the relevant fields are changed to new values, and as long as the new simulation is initialised the same way, the classes will pick up the new parameters.
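In code it looks roughly like this (the names below are just placeholders, and in Scala terms the "public static fields" live on a singleton object):

object Parameters {
  // global, mutable values that every class reads directly
  var numberOfAgents: Int = 1000
  var savingsRate: Double = 0.05
}

class Household {
  // the parameter is looked up from Parameters on every call
  def save(income: Double): Double = income * Parameters.savingsRate
}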

The downsides are: public static fields are effectively global variables, which is bad; and every use of a parameter is a read from Parameters, so an equation that is called many times re-reads the same parameter over and over, which is not good.

Is what I'm doing a form of dependency injection, and if so, do we want anything similar?

davidrpugh commented 7 years ago

@rafabap Injecting parameters is only part of the story. Dependency Injection (DI) is also about injecting the behavioral modules (or decision rules) at compile and/or run time, rather than hard-coding them into the agent itself. Using the DI pattern forces a developer to impose a clean separation between non-behavioral agent logic and behavioral agent logic.

In your case, I would recommend moving the model parameters into a separate config file (written in JSON or YAML) and then creating an instance of the Parameters class by reading in the relevant parts of the config file. The benefit of this approach is that parameters are injected at run time and the code will not need to be recompiled every time a parameter changes.

I will need to think a bit about whether this solution also addresses your second point. Typically config files are hierarchical and can be composed with one another. I think it should be possible to do something like...

class MyDecisionRule(config: MyDecisionRuleConfig) {
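  // reading the parameter into a val means the config lookup happens once,
  // when the rule is constructed, rather than every time the rule is used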

  val importantParameter: Double = config.getDouble("path-to-important-parameter-value-in-config-file")

}

I think something like the above would avoid the value of importantParameter being read from the config over and over again, but I am not a hundred percent confident.
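As a rough sketch of what I have in mind, assuming we used something like Typesafe Config (HOCON) as the config library (the keys and class names below are made up):

import com.typesafe.config.{Config, ConfigFactory}

// application.conf (made-up keys):
//   my-decision-rule {
//     important-parameter = 0.5
//   }

class MyDecisionRule(config: Config) {
  // the lookup happens once, when the rule is constructed
  val importantParameter: Double = config.getDouble("my-decision-rule.important-parameter")
}

object Main extends App {
  // parameters are injected at run time; editing application.conf (or passing
  // -Dconfig.file=...) changes them without recompiling the model
  val rule = new MyDecisionRule(ConfigFactory.load())
}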

KG-ethc commented 7 years ago

I only glanced at the source that CRISIS used for their Java simulations; it looks like they used Guice for dependency injection. They also had some Spring components, but I am not sure whether they used Spring's DI framework.

Have you guys considered using Dagger2? It's the most current/advanced lightweight DI framework, managed by Google. In some cases you might want to use Guice, which is still an active project managed by Google.

I'm no expert, but I would suggest detailing your use cases and asking Greg Kick or someone else on the Google team who would be able to suggest which solution to use in which cases. Another good person to ask would be Jesse Wilson at Square; he helped design Guice and Dagger1. Based on what little I know, consider using Dagger2 for most DI, and Guice only where you absolutely need it. If you can avoid using Guice completely, that would probably speed up development and make all your API users' lives much easier.

davidrpugh commented 7 years ago

@KG-ethc Thanks for the pointers! CRISIS did use Guice for DI. I am aware that other projects, such as those by @phelps-sg, have used Spring's DI framework. For DI in Scala I have played around a bit with MacWire.

For our use cases I think we want something lightweight, where much of the DI is done at compile time rather than at run time.

phelps-sg commented 7 years ago

One advantage of run-time DI for simulation modeling is that it facilitates model-selection, calibration and robustness-checking by allowing the modeler to sweep a space of parameters and/or behavioral rules. For example, the same model can be easily instantiated (realized) very many times with different parameter settings and/or rules. Personally, I haven't tried compile-time DI, but on the surface it would seem overly restrictive, and any practical benefits are not immediately obvious to me.
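For example, something along these lines (all the names here are hypothetical, just to illustrate the idea):

// sweep both a parameter and a behavioural rule at run time, re-using the
// same compiled model code for every realisation
trait ForecastRule
class NaiveForecastRule extends ForecastRule
class AdaptiveForecastRule extends ForecastRule

class HouseholdModel(savingsRate: Double, forecastRule: ForecastRule) {
  def run(steps: Int): Double = savingsRate * steps  // stand-in for a real simulation
}

object ParameterSweep extends App {
  val realisations =
    for {
      rate <- Seq(0.01, 0.05, 0.10)                                // parameter sweep
      rule <- Seq(new NaiveForecastRule, new AdaptiveForecastRule) // rule sweep
    } yield new HouseholdModel(rate, rule).run(steps = 1000)
  println(realisations)
}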

davidrpugh commented 7 years ago

@phelps-sg You raise some good points in favor of run-time DI. A practical benefit of compile-time DI is that models using it would be logically validated by the compiler, which would eliminate expensive run-time errors: attempting to inject an invalid dependency generates a compile-time error instead of a run-time error.
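For instance, with MacWire (just a sketch with made-up component names) the wiring itself is checked when the model is compiled:

import com.softwaremill.macwire._

// made-up components
class MarketConfig
class PricingRule(config: MarketConfig)
class Trader(rule: PricingRule)

trait SimulationModule {
  lazy val config: MarketConfig = new MarketConfig
  // wire[] fills in constructor arguments from values in scope; if the
  // MarketConfig value above were removed, wire[PricingRule] would fail to
  // compile rather than blowing up in the middle of a simulation run
  lazy val pricingRule: PricingRule = wire[PricingRule]
  lazy val trader: Trader = wire[Trader]
}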

Run-time errors can be very expensive. Imagine a scenario in which you were doing model selection, calibration, or parameter sweeps, and your runs come crashing down because of a run-time error generated by injecting an invalid dependency into some agent in the middle of a simulation run.

I do not know to what extent we should remain agnostic about the DI framework. Can we allow users to pick their poison (so to speak)? Or should we be opinionated on this issue?

phelps-sg commented 7 years ago

I can see that this would be an issue for production consumer-oriented software, but I don't think it applies to simulation modelling.
Firstly, compile-time DI would make parameter sweeps impractical, because for a large model the cost of recompiling the model for each run would be prohibitive. Secondly, when using DI for parameter sweeps these kinds of errors usually become apparent very quickly, since they are usually type errors, and it is quite difficult to define a range of values containing incompatible types, whether accidentally or otherwise.


davidrpugh commented 7 years ago

@phelps-sg Clearly we can rule out anything that makes parameter sweeps impractical. Perhaps it is possible to have the best of both worlds: compile-time DI for behavioral components or decision rules, for which logical consistency and type safety are useful, and run-time DI for parameter values (and other things one would like to vary quickly).
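Something like the following is what I have in mind, again assuming MacWire for the compile-time wiring and Typesafe Config for the run-time parameters (the component names are made up):

import com.softwaremill.macwire._
import com.typesafe.config.{Config, ConfigFactory}

// behavioural component: its wiring is checked at compile time, while its
// parameter values come from the config at run time
class ConsumptionRule(config: Config) {
  val propensityToConsume: Double = config.getDouble("consumption-rule.propensity")
}

class HouseholdAgent(rule: ConsumptionRule)

trait ModelModule {
  // swap application.conf (or pass -Dconfig.file=...) to change parameters
  // without recompiling anything
  lazy val config: Config = ConfigFactory.load()
  // the compiler verifies that every constructor dependency can be satisfied
  lazy val consumptionRule: ConsumptionRule = wire[ConsumptionRule]
  lazy val household: HouseholdAgent = wire[HouseholdAgent]
}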

I would hope that changing a few parameters would not require the entire model source to be recompiled; surely only those components that had changed would need to be recompiled?

bherd-rb commented 7 years ago

@davidrpugh I also feel we need different levels of flexibility, so I like the idea of mixing compile-time and runtime DI. Is there a framework for Scala/Java that supports such hybrid DI? I would be keen to have a look at it. I think you mentioned something last week ...

davidrpugh commented 7 years ago

@bherd-rb Everything I know about DI in Scala comes from having read this guide on the subject by @adamw. I had not come across Dagger before it was mentioned in the thread above.

KG-ethc commented 7 years ago

@phelps-sg Happy to see you here, I liked your JABM.

Before we decide on which kind of DI to use, it would be helpful to know what kind of DI we need in which scope.

I read the planning.pdf, but it's a very high-level overview of the project; do you have any design docs to review? Also, we should take time to define the different simulation scales and their intended use cases/users. If we define a small simulation to mean something that runs on a newish Intel quad-core with 32GB RAM, a 256GB SSD, and a 1/2TB HD, then the design will have a lot fewer constraints. At the other end, what is a small cloud implementation, and how do you plan on making that easy to install for researchers with smaller budgets?

Ignoring hardware requirements, another key consideration is how the simulations will interact with the graph database. I'm more familiar with OrientDB than Neo4j, so using Orient as an example: its node (vertex) types are treated as classes that support inheritance. Assuming that each simulation will be able to create its own classes/vertices/edges, do you also plan on allowing learning algorithms to create their own new types of fields (columns), or their own queries? If so, some analysis is needed of what the database will support, when, and at what speed. I imagine that using run-time DI to manipulate a database could cause all kinds of errors that are very difficult to test for.

If someone is running a large simulation, they are surely using an IDE to generate their own logic. If you want to allow people to write algorithms in a GUI or config file, that's a whole other beast (outside the scope of the core API?) and on the level of writing a new Java-based language like NetLogo. It can be done, but I strongly suggest deprioritizing it until you have more resources. Have you thought about linking up with MIT's Observatory of Economic Complexity to discuss GUI? Long term, to make this user friendly, you might want to consider using Electron (Node.js, Chromium) for desktop GUIs that connect with cloud instances. Connect Electron with the D3 Plus (also JavaScript) visualization suite, and the results could be fantastic.

davidrpugh commented 7 years ago

@KG-ethc I like the enthusiasm! You have raised quite a few issues, some of which probably deserve their own discussion threads; I have opened a few. Perhaps you could split out your thoughts on code generation for large simulations using IDEs and on GUI design into separate threads?