MaximeBouton commented 6 years ago

This package contains a lot of useful tools but they are very different and poorly documented which makes it hard to maintain. It would be good to split the POMDPToolbox module into several smaller modules:

Policies : would contain useful policy types such as alpha-vector policy, epsilon-greedy ...
BeliefUpdater: would contain useful belief updaters and belief types such as discrete belief, previous observation ...
Simulators: simulation helpers such as history recorder and rollout

The existing POMDPToolbox would contain the remaining functions that cannot be obviously separated into their own module.

As a first step for transitioning into the new code structure we could add deprecation warnings on the existing functions exported by POMDPToolbox.

63

zsunberg commented 6 years ago

I have been thinking more about our decision to break this package up, and, for my sake, we need to rehash the reasons for it. First I will go over my understanding of the reasons for the splitting up, and then I will outline my current thoughts.

Reasons and Challenges

Original Reasons for Splitting Up

Hard to keep track of all of the tools
It has too many dependencies if you only want to use a small subset of the tools

Difficulties encountered

a) Delays in registering on METADATA b) Every package depends on policies and simulators for testing (Integration testing is really important when we have many tools that need to work together).

Potential future problems

c) It will be more confusing to users and more difficult to find tools if there are many packages

My Thoughts

I think reason (1) can be addressed by better documentation using Documenter.jl and some cleanup of the code. (2) is actually not that big of a problem in my opinion - the only important dependencies are ProgressMeter and DataFrames, and these do not seem too problematic.

If I had foreseen (a) and (b), I definitely would have recommended not to split up. I think we would have been done with the whole process a week ago; At present, we are less than half way through (we have not started on documentation).

Today I have been considering the idea that, given (b) and (c), it would actually be better to reverse course and go back to a single POMDPToolbox package with good documentation. I think a single big package is actually easier to maintain with a high level of quality in this case.

What are the thoughts of everyone else (especially @MaximeBouton since you will play the biggest role maintaining it)? If there are better reasons for splitting up, can we outline them here? Thanks for your patience with me.

rejuvyesh commented 6 years ago

Reasons to split up

General Julia philosophy is to have small manageable packages. Moreover see the split up of standard lib. Kitchen sink packages get hard to maintain overtime.
Integration testing should ideally be done in a separate package anyway. It's just us being lazy :stuck_out_tongue: . See VIsualRegressionTests.jl for example.
The METADATA thing is because of transitioning issues. I'm pretty sure it's gonna get automated soon as it is for other programming languages.

It will be more confusing to users and more difficult to find tools if there are many packages

We just need more documentation and examples. Having obviously named packages for obvious things is good.

zsunberg commented 6 years ago

Ok, thanks Jayesh. Where should this documentation reside? In each package? Then it will be harder to find things

rejuvyesh commented 6 years ago

POMDPs.jl should have all the integration documentation: examples and walkthroughs. Individual packages only need the API docs.

MaximeBouton commented 6 years ago

Regarding the difficulties encountered:

The delays in registration on METADATA should not be a problem in the long run, now that the packages are registered it should be easy for people to checkout the master branch if they need a feature that has not been released yet. The delay is around 24h in normal time (I think it is long due to the transition to 1.0 right now) which is still reasonable.
The integration testing is challenging. I do not have a good answer on the correct way to organize it.

Documentation: This is most likely the real problem and why it is confusing to find tools. Splitting up the packages will not make it better nor worse to my mind as long as there is some documentation. The minimum documentation that we should provide is:

An overview of the integration between the different packages with minimum examples in the POMDPs.jl documentation. Ideally, the individual docs of each packages should refer the user to this as well.
A detailed description of all the tools in each individual packages

zsunberg commented 6 years ago

@rejuvyesh I don't think VisualRegressionTests does what you're describing. It just has functions for testing plotting packages. The actual tests are still in Plots.jl.

Can you outline the reasons (or point to someone else's writing) that kitchen sink packages are harder to maintain?

I really don't know what the right decision is now.

zsunberg commented 6 years ago

If the julia leadership really does think smaller packages are better, why do people seem to be pushing back against it (e.g. https://discourse.julialang.org/t/registering-a-new-package-in-general/13900)

rejuvyesh commented 6 years ago

If the julia leadership really does think smaller packages are better, why do people seem to be pushing back against it (e.g. https://discourse.julialang.org/t/registering-a-new-package-in-general/13900)

No one seems to be pushing against it. Someone is stating their personal opinion also basing on the fact that we used to use Submodules which we didn't because hardly anyone seems to use submodules in their projects.

I'll put up a longer explanation for other things soon.

Don't despair :smile:

rejuvyesh commented 6 years ago

So the main question is, why modularity?

As a user, I understand the gripe. I don't like writing using Printf just because I used @sprinf somewhere once in the package either. That's why kitchen sink packages can be pretty good and easy to use. Just drop in the entire thing and things just work (or seem to at least).

Maybe starting off things are all good. It's a small function that you added that lets you do a thing. Later on you have an idea to tweak, may be add a feature here or there. But slowly things start to accumulate. Now you have a giant kitchen sink library full of a bunch of features. Because you were almost developing the package on your own (awesome developer that you are :blush: ), you didn't have to care about modularity. You could keep everything in your head. Not just that, some things were more efficient to write because you never had to define a proper API for the interaction of different components. All of it still makes sense to you because you wrote it and had reasons for the choices made.

However, there's a flip side to it. Packages need to be maintained. Open source software is a weird thing. It commingles the idea of a user and a contributor. Ideally we want our users to graduate into contributors and maintainer themselves. It's possible that a particular use case might require certain things to work in a different manner. Unfortunately when they look into the staggering code base they have no idea how to proceed and can just give up.

Breaking the codebase into modules therefore helps. It forces us to think about how these different modules are going to interact. The tests for individual components should not depend on other packages. It's our fault that they do right now, partly because everything was in the toolbox. We want the APIs to be flexible enough that if someone wants to swap out our simulators for their own, things should still work.

The only reason we are trying different packages is because Julia projects don't seem to prefer submodules which could have been okay too.

JuliaPOMDP / POMDPToolbox.jl

Split into separate packages #64

63

Reasons and Challenges

Original Reasons for Splitting Up

Difficulties encountered

Potential future problems

My Thoughts

Reasons to split up