[OUTDATED] Major python refactoring

jmhvandoorn commented 2 years ago

I want to propose a somewhat bigger refactoring to the python files regarding:

[x] Improve logic in file and function names
[x] Improve import consistency
[x] Prepare for adding/testing additional strategies
[x] Remove redundant benchmark files
[ ] Get rid of having the static solver in four different places.

Since it might be quite a change, I am very curious what you think about it.

jmhvandoorn commented 2 years ago

The commit 8958291, that moves the solver into a function that returns the result, raised some issues. Possibly related to #95

https://github.com/N-Wouda/Euro-NeurIPS-2022/blob/386f07e1859988c1d52f20a5360152267e4f8420/benchmark.py#L42-L45

"solve_dynamic.py", line 29, in solve_epoch
    routes = [route for route in best.get_routes() if route]
RuntimeError: Could not allocate list object!

Somehow it seems that we cannot always access this value anymore after the function containing the solver gets out of scope.

jmhvandoorn commented 2 years ago

I will continue on this refactoring later today or tomorrow. Made the pull request mainly because I hoped getting help on why this code snippet does not work:

import tools

hgspy = tools.get_hgspy_module()

def solve_static(
    instance,
    *,
    seed=1,
    max_runtime=None,
    max_iterations=None,
    initial_solutions=(),
    excl_ops=(),
    solver_config=None,
    **kwargs
):
    config = hgspy.Config(
        seed=seed, **solver_config if solver_config is not None else {}
    )
    params = hgspy.Params(config, **tools.inst_to_vars(instance))

    rng = hgspy.XorShift128(seed=seed)
    pop = hgspy.Population(params, rng)

    for sol in initial_solutions:
        pop.add_individual(hgspy.Individual(params, sol))

    ls = hgspy.LocalSearch(params, rng)

    node_ops = [
        hgspy.operators.Exchange10(params),
        hgspy.operators.Exchange11(params),
        hgspy.operators.Exchange20(params),
        hgspy.operators.MoveTwoClientsReversed(params),
        hgspy.operators.Exchange21(params),
        hgspy.operators.Exchange22(params),
        hgspy.operators.TwoOpt(params),
    ]

    for op in node_ops:
        if not any(isinstance(op, excl) for excl in excl_ops):
            ls.add_node_operator(op)

    route_ops = [
        hgspy.operators.RelocateStar(params),
        hgspy.operators.SwapStar(params),
    ]

    for op in route_ops:
        if not any(isinstance(op, excl) for excl in excl_ops):
            ls.add_route_operator(op)

    algo = hgspy.GeneticAlgorithm(params, rng, pop, ls)

    crossover_ops = [
        hgspy.crossover.broken_pairs_exchange,
        hgspy.crossover.selective_route_exchange,
    ]

    for op in crossover_ops:
        if op not in excl_ops:
            algo.add_crossover_operator(op)

    if max_runtime is not None:
        stop = hgspy.stop.MaxRuntime(max_runtime)
    else:
        stop = hgspy.stop.MaxIterations(max_iterations)

    return algo.run(stop)

instance = tools.read_vrplib("instances/ORTEC-VRPTW-ASYM-5fa16580-d1-n329-k25.txt")
res = solve_static(instance, max_iterations=25)
best = res.get_best_found()
routes = [route for route in best.get_routes() if route]
cost = best.cost()

print(cost, routes)

And raises a memory allocation error for routes (see prev comment)

leonlan commented 2 years ago

FYI: flake8 is now also added to the CI. So in addition to running black, it's now required to run flake8 on the codebase and to address these errors to get a pass on the CI.

jmhvandoorn commented 2 years ago

This is a much needed re-organization of the code base. Overall I think the proposed changes are in the right direction. I think one particular point that we need to discuss in more detail is how to manage all the different solver-configs and operators, so that it can also be tuned more easily.

Agreed. I have some suggestion for that. I will elaborate on that later. To summarize, I would suggest a config file for each of the three cases you mention. + the possibility to pass a different config file name as argument for testing / tuning

N-Wouda commented 2 years ago

The commit 8958291, that moves the solver into a function that returns the result, raised some issues. Possibly related to #95

https://github.com/N-Wouda/Euro-NeurIPS-2022/blob/386f07e1859988c1d52f20a5360152267e4f8420/benchmark.py#L42-L45
"solve_dynamic.py", line 29, in solve_epoch
    routes = [route for route in best.get_routes() if route]
RuntimeError: Could not allocate list object!
Somehow it seems that we cannot always access this value anymore after the function containing the solver gets out of scope.

The good news is that this is unrelated to #95. This is just how cpp object lifetimes work. Result takes a reference to the best individual. That reference is by definition not owning - the individual remains owned by Population. When the Population goes out of scope, it deletes all its owned individuals, causing the reference to become invalid. Thus accessing data from that now-meaningless individual does not work anymore.

We cannot transfer ownership, so the only way around this is to either copy the individual into Result, or keep the population in scope when accessing the solver results. I'm leaning towards copying, because that's easy to do.

N-Wouda commented 2 years ago

Since it might be quite a change, I am very curious what you think about it.

My initial take on this is that I'm happy we're removing most of the other baseline strategies. I'm unhappy about the lack of cohesion in this PR - it's doing too many things at once. From what I can tell, those things include (at least): removing baselines, changing config management, and removing various duplications. All those should have been their own PRs.

I will continue on this refactoring later today or tomorrow.

This PR is big as-is. Can we work towards getting this thing merged in (possibly in reduced form), rather than adding additional changes here? A sequence of more focused PRs is much more manageable for me (and, I think, Leon) to think about and review.

leonlan commented 2 years ago

I agree with @N-Wouda. I suggest to split this PR into the parts that Niels suggested:

removing baselines, changing config management, and removing various duplications.

In general, it would also be very helpful if you make issues for each of the proposed changes before you open a PR. That way we can all discuss about what needs to be done and have more effective reviews. Ideally, the issues should be the place for discussions about what and how to change, and then in the PR we can focus on specific implementation details.

jmhvandoorn commented 2 years ago

I agree with @N-Wouda. I suggest to split this PR into the parts that Niels suggested:

removing baselines, changing config management, and removing various duplications.

In general, it would also be very helpful if you make issues for each of the proposed changes before you open a PR. That way we can all discuss about what needs to be done and have more effective reviews. Ideally, the issues should be the place for discussions about what and how to change, and then in the PR we can focus on specific implementation details.

Agreed, sorry for that. Has something todo with this being my first ever project using the full scope of possibilities in github 😅

I would propose the following for now:

Keep this PR for the discussions until now.
I will make an issue for the things we need to discuss further.
I will make seperate branches & pr's to get them in one by one.

Do you agree that is the right way forward now?

leonlan commented 2 years ago

Do you agree that is the right way forward now?

Sounds good! :-)

N-Wouda / Euro-NeurIPS-2022

[OUTDATED] Major python refactoring #108