Add the Python instance generator from Gasse et al. 2019

ds4dm / ecole

Extensible Combinatorial Optimization Learning Environments

https://www.ecole.ai

BSD 3-Clause "New" or "Revised" License

326 stars 68 forks source link

Add the Python instance generator from Gasse et al. 2019 #82

Closed AntoinePrv closed 4 years ago

AntoinePrv commented 4 years ago

Describe the problem or improvement suggested

We want to have the instance generator from Exact combinatorial optimization with graph convolutional neural networks Gasse et al. 2019.

Describe the solution you would like

The first implementation will be Python only in order to megre the generators quickly. This is a first implementation, independent from the rest of Ecole, so there are little constraints on the API.

Describe alternatives you have considered

The API is subject to change when merging the generators to C++ in #83

Additional context

Moving Python code to Ecole will give us visibility on the final API for instance generators.

jdumouchelle commented 4 years ago

Here is an initial proposal of the inclusion of the instance generators into ecole.

Proposed Implementation

A parent class InstanceGenerator with a child class for each type of instance generation (setcover/cauction/...). The following methods would be included:

generate_instance: abstract method which needs to be implemented for each problem type (setcover/cauction/...). This would either write to a file or return a model.
next: iterator which calls generate_instance and returns an ecole model. This would also include functionality to save the instance so it does not need to be done for every instance generator.
seed: seeds random number generator.

In the child classes we would only need to implement the constructor and generate_instance method so it should allow users to easily add more generators if they desire. This should also provide an easy transition to the c++ implementation.

For the usage in python we would have the following.

In the case of not saving instances:

instances = ecole.InstanceGenerators.CauctionsGenerator(n_items=100, n_bids=100, add_item_prob=0.7)
model = next(instances)
env.reset(model)

In the case of saving instances:

instances = ecole.InstanceGenerators.CauctionsGenerator(n_items=100, n_bids=100, add_item_prob=0.7, save_dir="./cauction_instances/")
model = next(instances)
env.reset(model)

Considerations

Yesterday we discussed writing the LP to a file or creating a pyscipopt model and using add constraint/var. If we go with either of these then it would not change what the user of ecole does, however it would change how we implement generate_instance as we would either need to write a .lp file or create a pyscipopt model. Overall, my preference is towards a writing the .lp file since it is already implemented by Maxime and will likely be easier when transitioning to the c++ implementation. It could be the case that it is faster to use the pyscipopt model and add constraints, rather than writing them all to a file and reading from it, so looking into the difference may be worth it.

Another consideration to make is regarding how we want the files to be saved. In the above description we would simply take the directory and enumerate files each time next is called, however this does not provide a lot of flexibility to the users. Alternatively, we could allow the user to have more control and specify the path to the LP when calling next, however this would no longer be an iterator as it is parameterized, so the python interface would change to something like:

instances = ecole.InstanceGenerators.CauctionsGenerator(n_items=100, n_bids=100, add_item_prob=0.7)
model = instances.next("./cauction_instances/instance0.lp")
env.reset(model)

AntoinePrv commented 4 years ago

Thanks for taking the time to share the plan of your implementation. Sounds good overall. I have a few comments though.

A parent class InstanceGenerator It seems to me this class does nothing more than create a namepace ecole.InstanceGenerators.[...], correct? In that case, I would simplify it further by having a (folder) module ecole/instances, in which we would have one class per type of instance generator. We could still have InstanceGenerators as a pure Abstract Base Class abc.ABC
Regarding writing instances to files. I am strongly opposed to using this as the default option. IMHO, even regardless of speed considerations, it introduce unnecessary complexity. It means that by simply running Ecole has sides effects on the user system. It may leave undeleted files. We have to make sure that files are properly deleted if the program is interrupted abruptly... It is also not clear that it will always work. We don't know how users are going to run Ecole: In Docker? In Compute Canada? There are lot of non trivial cases where this could be a problem, and not the type that users will have a good time debugging. It will also be harder to move to C++. While Python might have a tempfile that manages many aspects for us, I don't believe we have such nicety in C++.

However, it is still nice option to be able to log generated files. To me it would be an option like:
```
ecole.InstanceGenerators.CombinatorialAuctions(..., save_instances="some_folder")
```
So to recap, I would:
1. Unconditionally generate instances using a pyscipopt.Model
2. If self.save_instance is not None, save the file under f"{save_instance}/{self.class.__name__}_{i}.lp"
3. Return the pyscipopt.Model as a ecole.Model That way the logging of files is a simple option disentangled from the core implementation.
Generator parameters This is very general and possibly too early to answer, but do we want to constraint generator a given size (e.g. n_item=100)? @gasse @dchetelat Are there cases where we want to to generate instances say between 50 and 150 items? This would introduce much more complexity, because now we would have a probability distribution (the generator) parameterize by other probabilities distribution (the one on the number of items, bids...)

gasse commented 4 years ago

Yes thank you @jdumouchelle for starting the discussion.

I like the general API.

Regarding writing to files, I agree with @AntoinePrv , it might be convenient for us to simply write to LP files, but from a software perspective messing up with the filesystem under the hood is probably not very reasonable. A workaround maybe could be to have our own data structure for MILPs, which we can either write to LP files, or load into a SCIP model. Or we can simply consider that the SCIP storage IS our data structure, that's maybe the most reasonable thing to do. And we can even ask SCIP to write the problem to an LP file for us if we want to. Regarding the fact that the existing code has to be re-written with SCIP calls to add variables/constraints etc, I agree it will involve some work but I believe it is remains reasonable.

About the generator parameters, that's a goods point @AntoinePrv . I would say yes, we want sometimes to generate instances in a range of say 50-150 variables or whatever. What about, in addition to the currently proposed API, allowing the users to feed the generator by using their own parameter generator ? We could provide none of those, just show how that is done in a tutorial example. The generator would produce an **args dictionary objects, which would then be fed to the generate_instance function. Maximum flexibility for the users, minimum effort for us :P

Also, I see an issue coming up. How much of this stuff do we want in the C++ API ? Do we also want instance generators, following the same API ? Do we only want to put the underlying implementation in C++, and only provide the API in Python ? I see the effort that it required to have the Ecole API consistent in C++ and Python, and the tricks that we used (duplication of the Environment class). Would the same approach be worth the effort here ?

AntoinePrv commented 4 years ago

What about, in addition to the currently proposed API, allowing the users to feed the generator by using their own parameter generator ? We could provide none of those, just show how that is done in a tutorial example.

Actually I believe the best thing is to leave it as it is. The hard part is to generate an instance for a given size. Then it is easy to combine multiple generators (we could indeed show it in the doc indeed). However predicting what users want to do is very hard ^^

How much of this stuff do we want in the C++ API

All of it I guess. IMHO the C++ is not only for speed, it is also meant to provide a full API.

Do we only want to put the underlying implementation in C++

I'd say yes.

and the tricks that we used (duplication of the Environment class). Would the same approach be worth the effort here ?

As of now, I do not expect that this would be required here. The environment class is quite complicated in term of customization (if you change the ObservationFunction then it changes the return type of step because the observation changes). I don't see that happening here.

jdumouchelle commented 4 years ago

Thanks for the feedback @AntoinePrv and @gasse. After reading your comments I agree that we should avoid writing to files (unless the user requests log them in a specific directory). I have changed it to directly add the constraints and variables to a pyscipopt.Model.

A parent class InstanceGenerator It seems to me this class does nothing more than create a namepace ecole.InstanceGenerators.[...], correct? In that case, I would simplify it further by having a (folder) module ecole/instances, in which we would have one class per type of instance generator. We could still have InstanceGenerators as a pure Abstract Base Class abc.ABC

In this case I think having a parent class may be somewhat useful. For example the method which handles logging files can be implemented by the parent class, which would allow us to avoid implementing it in each instance generator. Although since logging files is quite easy it could be implemented in each class if you would prefer to not have the parent class.

What about, in addition to the currently proposed API, allowing the users to feed the generator by using their own parameter generator ? We could provide none of those, just show how that is done in a tutorial example.

Actually I believe the best thing is to leave it as it is. The hard part is to generate an instance for a given size. Then it is easy to combine multiple generators (we could indeed show it in the doc indeed). However predicting what users want to do is very hard ^^

I agree with Antoine here. I think the difference between creating multiple generators and feeding a dict to generate_instance would be pretty small on the end of the user.

Anyway, I am mostly finished with the implementation at this point so if we do not have any major disagreements, then I will try to complete it and make a pull request early next week.

dchetelat commented 4 years ago

Sorry for taking so long to respond, I have too many things in parallel and I forgot about it.

I thought about it and I think my favorite API would look at bit like this in python.

# Generator has default values, but they can be overwritten
instances = ecole.instances.CauctionsGenerator(n_items=100, n_bids=100, add_item_prob=0.7) 

instance = next(instances) # Is a python generator

instance.save("cauctions/instance_0.lp") # Optional step

env.reset(instance)

Now, what instance would be of object of, I am not sure. But from the perspective of a user, what I think would be the simplest, cleanest, most natural API, and would make it very natural whether you want to save the instance to file or not.

What about, in addition to the currently proposed API, allowing the users to feed the generator by using their own parameter generator ? We could provide none of those, just show how that is done in a tutorial example. The generator would produce an **args dictionary objects, which would then be fed to the generate_instance function. Maximum flexibility for the users, minimum effort for us :P

Can you give a code sample of what that would look like? I'm not sure I understand what you have in mind.

jdumouchelle commented 4 years ago

# Generator has default values, but they can be overwritten
instances = ecole.instances.CauctionsGenerator(n_items=100, n_bids=100, add_item_prob=0.7) 

instance = next(instances) # Is a python generator

instance.save("cauctions/instance_0.lp") # Optional step

env.reset(instance)

I agree that this is a pretty nice solution since it gives users more flexibility and remains quite intuitive. I think instance would ideally be an ecole model since it would allow users avoiding any additional steps before passing it to the environment (i.e. env.reset(instance)). So if we have a method similar to writeProblem in pyscipopt, then we could achieve this pretty easily. It also might just be convenient in general to have functionality in an ecole model to write to a .lp file so maybe we should consider adding this to the API. Alternatively, the user could just call instance.as_pyscipopt().writeProblem("cauctions/instance_0.lp") , but this seems less intuitive to me.

gasse commented 4 years ago

I like also @dchetelat 's suggested API. As for the parameter generator, I was thinking something like this


class MyParameterGenerator:
    def __init__(self):
        self.rng = np.random.RandomState()

    def __iter__(self):
        return self

    def __next__(self):
        return {
            n_items: self.rng.randint(80, 121),  # uniform integer in [80, 120] interval
            n_bids: self.rng.randint(80, 121),  # uniform integer in [80, 120] interval
            add_item_prob: (0.8-0.6) * self.rng.random_sample() + 0.6}  # uniform float in [0.6, 0.8] interval

    def seed(self, seed):
        self.rng.seed(seed)

# Generator has default values, but they can be overwritten
instances = ecole.instances.CauctionsGenerator(parameter_generator=MyParameterGenerator()) 

instance = next(instances) # Is a python generator

instance.save("cauctions/instance_0.lp") # Optional step

env.reset(instance)

And basically the instance generator will fetch the next tuple of parameters to use for each new instance.

dchetelat commented 4 years ago

Great, I'm glad you guys like it!

As for the parameter generator, I was thinking something like this [...]

Oh yeah, makes total sense! I think that would be very useful to allow for that.

AntoinePrv commented 4 years ago

instance.save("cauctions/instance_0.lp") # Optional step

I personally do not like that bit. Here it looks simple, but in practice you are not writing to the same file every time 😅
So you would need to initialize a counter, increment it... IMHO readable code does not mix different levels of abstraction. The environment, actions, observations, reward Functions... are super high level. Saving and counting instances not the same.

That being said, we have (almost/soon) the machinery to offer that functionality out of the generator. We could have an Event (#71) (bunch of callbacks like a reward function, but without returning a reward) to log the problem file before or after presolve, log the solution, ... And that would work regardless of where the instance came from.

Not a big deal, but I would omit that. We should make it an iterator, or an iterable, but not both

    def __iter__(self):
        return self