FIX: Do not suggest literals without clauses

noraderam commented 8 years ago

Closes #217

@johntyree I'd love to hear your thoughts on this and the corresponding issue. I believe that this solution is correct.

johntyree commented 8 years ago

I'm not a huge fan of the change to _without_assigned because it makes the name somewhat incorrect. We could always fix the name, of course. As a lesser point, I'd also be interested in the performance of this; I think it's called fairly often.

Semantically, what does this do? I suppose the solver will never assign the id, since it has no clauses to cause unit propagation nor will it ever be assumed by the policy. What do we do in that case? Hopefully just not touch the package?

It seems like prepopulating id_to_clauses using from_keys(<pool ids>, []) would be another solution that doesn't change the meaning of anything or change the performance significantly. Is that feasible?

noraderam commented 8 years ago

Semantically, what does this do? I suppose the solver will never assign the id, since it has no clauses to cause unit propagation nor will it ever be assumed by the policy. What do we do in that case? Hopefully just not touch the package?

The solver will completely ignore the package; we do nothing with it. In practice, that probably means leaving it installed, but it depends on what we're actually doing with the output of the solver. This is specifically happening in a case where we did not ask the solver to keep the package installed.

It seems like prepopulating id_to_clauses using from_keys(, []) would be another solution that doesn't change the meaning of anything or change the performance significantly. Is that feasible?

In that case, we still suggest the package as a potential assignment for our solver. That seems wrong to me. We're suggesting a literal for which we have no clauses because of the implementation detail that our policy is given an installed_repository. It presupposes that we must want packages that we already have installed. We shouldn't assume that: that's why we add install jobs to the Request for installed packages.

johntyree commented 8 years ago

The policy's original name was actually "installed first policy." It was changed when we were trying out another strategy that was also "installed first" but implemented differently.

The policy is designed to suggest packages from the installed repository first unless you specifically ask it not to via the prefer installed kwarg. Otherwise every solution it offers will typically involve a lot of uninstallations just because it satisfies the clauses before it happens to suggest the currently installed packages.

We used to accidentally rely on this, prior to https://github.com/enthought/edm/issues/319. Before that patch, the marked packages were mostly maintained because we intentionally try to preserve as much of the current state as possible, which is done by making sure that we first try solutions that keep installed packages. We actually can't emulate that behavior strictly with rules because we don't have a way to express preferences, only hard requirements. The preference information is the behavior of the Policy. That is what the policy is all about.

Does this make sense? I'm on a phone in an airport lounge so it's tricky to put together a good response :)

I'll be able to check back later today.

noraderam commented 8 years ago

The policy is designed to suggest packages from the installed repository first unless you specifically ask it not to via the prefer installed kwarg.

This makes perfect sense to me as long as the packages it suggests have clauses associated with them. I still don't understand why we would want the policy to suggest other unknown packages just because they fell into the "previously installed" bucket. That does us no good in satisfying our problem.

johntyree commented 8 years ago

as long as the packages it suggests have clauses associated with them

What kind of clauses qualify as sensible here? For this to be relevant, we must be talking about clauses that don't force installation in some way, otherwise they will have to be assigned before termination and the solver isn't going to coincidentally finish before that happens. So we're only talking about clauses that specify relationships between packages, such as a conflict or dependency. To me, those kinds of clauses don't make the package any more or less important and their absence shouldn't change whether or not the policy suggests them. So if it makes sense to suggest installed packages first if they have an associated clause, it should be because it makes sense to suggest installed packages first, full stop.

My impression is that you object to something in that paragraph, but I don't follow. Can you elaborate?

noraderam commented 8 years ago

So we're only talking about clauses that specify relationships between packages, such as a conflict or dependency. To me, those kinds of clauses don't make the package any more or less important and their absence shouldn't change whether or not the policy suggests them.

We are attempting to solve a satisfiability problem. We suggest literals and try them out until we have an assignment for everything in our clauses. Suggesting a package which never appears in a clause does not help us; it's effectively a no-op. Why would we suggest it? Am I misunderstanding how this actually works?

johntyree commented 8 years ago

What you're saying is half true in the sense that a generalized solver that is interested in finding any solution as quickly as possible will not do this. However, by manipulating the policy that the solver uses, we steer it towards "good" solutions rather than just fast solutions. In this case, the rationale for suggesting these packages is exactly the same as for the manual marked packages except for the fact that for auto marked packages we do not require that the solution includes them, we simply prefer it. Be using a policy that suggests these packages first, we are ensuring that solutions which include already installed packages are tried first, but not failing entirely if there isn't one

johntyree commented 8 years ago

Keep in mind that it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists. By doing it this way, we do a much better job of finding solutions that are as similar as possible to the current state (at the cost of not pushing everything to the latest version all the time.)

noraderam commented 8 years ago

@johntyree Do you know of any situation in which your proposal and mine result in different Transactions? They seem to be equivalent in their results. While in one case we propose installing a package that is already installed (and which was unrelated to the set of clauses), the Transaction ignores that package (https://github.com/enthought/sat-solver/blob/master/simplesat/transaction.py#L112-L114).

noraderam commented 8 years ago

it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists.

I'm not totally sure what you mean by this. I'm not proposing changing the general practice of preferring installed packages; I'm only looking at the case where a package is unrelated to the set of clauses we're attempting to resolve. It's worth noting that if installing such a package would impact the rest of the solution because of dependencies or conflicts, we would have generated a rule/clause for it.

johntyree commented 8 years ago

I think they are likely equivalent and agree with your reasoning in the last point. Personally, I'd prefer the simpler policy of just suggesting installed packages first, regardless of whether or not they have complex relationships with other packages. To me it breaks down like this: we want to suggest installed packages first for the reasons mentioned before. We have the entire list of those packages available ahead of time. This bug comes from the fact that we are reverse engineering that list from the assumption that every package is in at least one clause, which is false. Your proposed solution seems to be to check each package and make sure it doesn't violate that assumption before suggesting it. I'm proposing that we ditch the assumption entirely and just build the complete dict up front. On May 25, 2016 21:54, "Nora Deram" notifications@github.com wrote:

it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists.

I'm not totally sure what you mean by this. I'm not proposing changing the general practice of preferring installed packages; I'm only looking at the case where a package is unrelated to the set of clauses we're attempting to resolve. It's worth noting that if installing such a package would impact the rest of the solution because of dependencies or conflicts, we would have generated a rule/clause for it.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/enthought/sat-solver/pull/218#issuecomment-221601893

noraderam commented 8 years ago

I'm proposing that we ditch the assumption entirely and just build the complete dict up front.

Minor point, but we would need to build that into the dict each time we make a new assignment, when we call _build_id_to_clauses.

To me it breaks down like this: we want to suggest installed packages first for the reasons mentioned before.

This is still the point of my confusion. Suggesting a package which does not contribute to the solution does not help us. While it doesn't actually impact the resulting Transaction it still feels wrong to me. However, it doesn't hurt. If you think this is is a reasonable way to use the policy and you don't have the same feeling of wrongness, I'm happy to trust your experience in the code.

johntyree commented 8 years ago

If we are careful to ensure that unassigned packages are not touched at all, then I think we are OK. However, to me, that feels like we're inventing a special case for no good reason. We really honestly don't care which packages are involved in complex relationships other than to ensure that they are satisfied. What we want is to find solutions with minimal changes to installed packages. Suggesting installed packages first achieves that as directly as we currently can in a way that I think is least surprising.

If we want to change that, then I suggest renaming the reexport called InstalledFirsrPolicy since there will be installed packages that are not suggested first. I don't think either approach is too unreasonable. You make the call. You're the one that needs to understand and be comfortable with these details now! :)

noraderam commented 8 years ago

I've updated this based on @johntyree's suggestion. This seems to be how the policy was conceived to work and will not cause any improper behavior. @cournape, can you review?

cournape commented 8 years ago

One question

cournape commented 8 years ago

:+1:

enthought / sat-solver

FIX: Do not suggest literals without clauses #218