Closed noraderam closed 8 years ago
I'm not a huge fan of the change to _without_assigned
because it makes the name somewhat incorrect. We could always fix the name, of course. As a lesser point, I'd also be interested in the performance of this; I think it's called fairly often.
Semantically, what does this do? I suppose the solver will never assign the id, since it has no clauses to cause unit propagation nor will it ever be assumed by the policy. What do we do in that case? Hopefully just not touch the package?
It seems like prepopulating id_to_clauses
using from_keys(<pool ids>, [])
would be another solution that doesn't change the meaning of anything or change the performance significantly. Is that feasible?
Semantically, what does this do? I suppose the solver will never assign the id, since it has no clauses to cause unit propagation nor will it ever be assumed by the policy. What do we do in that case? Hopefully just not touch the package?
The solver will completely ignore the package; we do nothing with it. In practice, that probably means leaving it installed, but it depends on what we're actually doing with the output of the solver. This is specifically happening in a case where we did not ask the solver to keep the package installed.
It seems like prepopulating id_to_clauses using from_keys(
, []) would be another solution that doesn't change the meaning of anything or change the performance significantly. Is that feasible?
In that case, we still suggest the package as a potential assignment for our solver. That seems wrong to me. We're suggesting a literal for which we have no clauses because of the implementation detail that our policy
is given an installed_repository
. It presupposes that we must want packages that we already have installed. We shouldn't assume that: that's why we add install
jobs to the Request
for installed packages.
The policy's original name was actually "installed first policy." It was changed when we were trying out another strategy that was also "installed first" but implemented differently.
The policy is designed to suggest packages from the installed repository first unless you specifically ask it not to via the prefer installed kwarg. Otherwise every solution it offers will typically involve a lot of uninstallations just because it satisfies the clauses before it happens to suggest the currently installed packages.
We used to accidentally rely on this, prior to https://github.com/enthought/edm/issues/319. Before that patch, the marked packages were mostly maintained because we intentionally try to preserve as much of the current state as possible, which is done by making sure that we first try solutions that keep installed packages. We actually can't emulate that behavior strictly with rules because we don't have a way to express preferences, only hard requirements. The preference information is the behavior of the Policy. That is what the policy is all about.
Does this make sense? I'm on a phone in an airport lounge so it's tricky to put together a good response :)
I'll be able to check back later today.
The policy is designed to suggest packages from the installed repository first unless you specifically ask it not to via the prefer installed kwarg.
This makes perfect sense to me as long as the packages it suggests have clauses associated with them. I still don't understand why we would want the policy to suggest other unknown packages just because they fell into the "previously installed" bucket. That does us no good in satisfying our problem.
as long as the packages it suggests have clauses associated with them
What kind of clauses qualify as sensible here? For this to be relevant, we must be talking about clauses that don't force installation in some way, otherwise they will have to be assigned before termination and the solver isn't going to coincidentally finish before that happens. So we're only talking about clauses that specify relationships between packages, such as a conflict or dependency. To me, those kinds of clauses don't make the package any more or less important and their absence shouldn't change whether or not the policy suggests them. So if it makes sense to suggest installed packages first if they have an associated clause, it should be because it makes sense to suggest installed packages first, full stop.
My impression is that you object to something in that paragraph, but I don't follow. Can you elaborate?
So we're only talking about clauses that specify relationships between packages, such as a conflict or dependency. To me, those kinds of clauses don't make the package any more or less important and their absence shouldn't change whether or not the policy suggests them.
We are attempting to solve a satisfiability problem. We suggest literals and try them out until we have an assignment for everything in our clauses. Suggesting a package which never appears in a clause does not help us; it's effectively a no-op. Why would we suggest it? Am I misunderstanding how this actually works?
What you're saying is half true in the sense that a generalized solver that is interested in finding any solution as quickly as possible will not do this. However, by manipulating the policy that the solver uses, we steer it towards "good" solutions rather than just fast solutions. In this case, the rationale for suggesting these packages is exactly the same as for the manual marked packages except for the fact that for auto marked packages we do not require that the solution includes them, we simply prefer it. Be using a policy that suggests these packages first, we are ensuring that solutions which include already installed packages are tried first, but not failing entirely if there isn't one
Keep in mind that it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists. By doing it this way, we do a much better job of finding solutions that are as similar as possible to the current state (at the cost of not pushing everything to the latest version all the time.)
@johntyree Do you know of any situation in which your proposal and mine result in different Transactions
? They seem to be equivalent in their results. While in one case we propose installing a package that is already installed (and which was unrelated to the set of clauses), the Transaction
ignores that package (https://github.com/enthought/sat-solver/blob/master/simplesat/transaction.py#L112-L114).
it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists.
I'm not totally sure what you mean by this. I'm not proposing changing the general practice of preferring installed packages; I'm only looking at the case where a package is unrelated to the set of clauses we're attempting to resolve. It's worth noting that if installing such a package would impact the rest of the solution because of dependencies or conflicts, we would have generated a rule/clause for it.
I think they are likely equivalent and agree with your reasoning in the last point. Personally, I'd prefer the simpler policy of just suggesting installed packages first, regardless of whether or not they have complex relationships with other packages. To me it breaks down like this: we want to suggest installed packages first for the reasons mentioned before. We have the entire list of those packages available ahead of time. This bug comes from the fact that we are reverse engineering that list from the assumption that every package is in at least one clause, which is false. Your proposed solution seems to be to check each package and make sure it doesn't violate that assumption before suggesting it. I'm proposing that we ditch the assumption entirely and just build the complete dict up front. On May 25, 2016 21:54, "Nora Deram" notifications@github.com wrote:
it's not enough to just let the solver run and ignore packages that don't get assignments because solutions will be found that require uninstalling packages, even when another solution exists.
I'm not totally sure what you mean by this. I'm not proposing changing the general practice of preferring installed packages; I'm only looking at the case where a package is unrelated to the set of clauses we're attempting to resolve. It's worth noting that if installing such a package would impact the rest of the solution because of dependencies or conflicts, we would have generated a rule/clause for it.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/enthought/sat-solver/pull/218#issuecomment-221601893
I'm proposing that we ditch the assumption entirely and just build the complete dict up front.
Minor point, but we would need to build that into the dict each time we make a new assignment, when we call _build_id_to_clauses
.
To me it breaks down like this: we want to suggest installed packages first for the reasons mentioned before.
This is still the point of my confusion. Suggesting a package which does not contribute to the solution does not help us. While it doesn't actually impact the resulting Transaction
it still feels wrong to me. However, it doesn't hurt. If you think this is is a reasonable way to use the policy and you don't have the same feeling of wrongness, I'm happy to trust your experience in the code.
If we are careful to ensure that unassigned packages are not touched at all, then I think we are OK. However, to me, that feels like we're inventing a special case for no good reason. We really honestly don't care which packages are involved in complex relationships other than to ensure that they are satisfied. What we want is to find solutions with minimal changes to installed packages. Suggesting installed packages first achieves that as directly as we currently can in a way that I think is least surprising.
If we want to change that, then I suggest renaming the reexport called InstalledFirsrPolicy
since there will be installed packages that are not suggested first. I don't think either approach is too unreasonable. You make the call. You're the one that needs to understand and be comfortable with these details now! :)
I've updated this based on @johntyree's suggestion. This seems to be how the policy
was conceived to work and will not cause any improper behavior. @cournape, can you review?
One question
:+1:
Closes #217
@johntyree I'd love to hear your thoughts on this and the corresponding issue. I believe that this solution is correct.