wolfv opened 2 years ago
What happens currently is that mamba "randomly" picks cuda 11.1 (even though cuda 11.2 would probably be preferred).
I don't think the issue is here. Mamba should be able to handle this instead.
As long as we're not compromising on the quality of the packaging (i.e. everything works as it should), I don't see why we shouldn't accommodate mamba here. Yes, it would be nicer if it were fixed in mamba, but it's used more and more (not least in CF-CI), so keeping things knowingly suboptimal has to have a commensurate benefit for not doing it (especially as the workaround is really not painful), IMO.
@183amir the problem in this case is that we have a variant package (tensorflow) that restricts the cuda version of downstream packages. But we have no way of choosing which tensorflow package is the best without doing a global optimization.
We could change this in the following ways:
- have only a tensorflow pyXX meta-package (without CUDA, or even without Python!); then the tensorflow package depends on tensorflow-base 2.6 (pyXX) and variant selection would work fine
- add a direct dependency on the same cudatoolkit from the build string; then variant selection would also work fine

Global optimization, with the amount of variant packages that exist in conda-forge, takes a looong time (because many branches have to be explored). I think it's good to make it as simple as possible for the solver to find the best solution.
I tried to write down how the sorting and solving works here: https://mamba.readthedocs.io/en/latest/advanced_usage/package_resolution.html
As long as we're not compromising on the quality of the packaging (i.e. everything works as it should), I don't see why we shouldn't accommodate mamba here.
I agree.
have only a tensorflow pyXX meta-package (without CUDA, or even without Python!), then the tensorflow package depends on tensorflow-base 2.6 (pyXX) and variant selection would work fine
I don't think this is a good solution because the exact dependency on tensorflow-base is there to make sure the split packages are all installed from one build. That is, tensorflow, tensorflow-estimator, etc. are installed from one build.
Add a direct dependency to the same cudatoolkit from the build string, then variant selection would also work fine :)
I am not sure what this means. Could you please explain more? Maybe you could show how the recipe would look like.
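Since the question is what such a recipe would look like: here is a hypothetical sketch of the tensorflow output, not the feedstock's actual meta.yaml. It assumes the standard conda-build pin_subpackage helper and the usual conda-forge cuda_compiler_version variant key; the real recipe may differ.

```yaml
# Hypothetical sketch, not the feedstock's real meta.yaml.
outputs:
  - name: tensorflow
    requirements:
      run:
        - python
        # exact pin keeps all split packages from the same build
        - {{ pin_subpackage("tensorflow-base", exact=True) }}
        # the proposed addition: a direct cudatoolkit pin matching the
        # CUDA variant, so mamba can rank the variants by looking only
        # at first-order dependencies
        - cudatoolkit {{ cuda_compiler_version }}*  # [cuda_compiler_version != "None"]
```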
This might be a separate topic, but do we actually need both tensorflow and tensorflow-base packages? Currently the only difference between mamba create -d -n test "tensorflow=*=cuda112*" and mamba create -d -n test "tensorflow-base=*=cuda112*" is that the first one installs tensorflow and tensorflow-estimator. The same is true for "tensorflow-base=*=cuda112*" vs. "tensorflow-base=*=cuda112*". Unless there is a common need to install tensorflow without tensorflow-estimator, that doesn't make much sense to me.
So right now we have the following situation:
tensorflow-py39-cuda112
- depends on python 3.9
- tensorflow-base py39-cuda112 (heavily constrained, can only use a single tensorflow-base!)
tensorflow-py38-cuda112
- depends on python 3.8
- tensorflow-base py38-cuda112
tensorflow-py39-cuda111
- depends on python 3.9
- tensorflow-base py39-cuda111
So here, mamba can correctly decide for tensorflow-py39..., but it cannot select the cuda variant, because it cannot judge that by inspecting the first-order dependencies.
If we add cudatoolkit at this level, it would work fine. Otherwise, we could modify it like this:
tensorflow-py39
- depends on python 3.9
- tensorflow-base py39*
can be resolved to
- tensorflow-base py39-cuda102
- tensorflow-base py39-cuda110
- tensorflow-base py39-cuda111
- tensorflow-base py39-cuda112
Then we could again resolve properly, because the solver can again decide between tensorflow-base variants and select the one that uses the highest cudatoolkit.
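The loosened-pin idea could be sketched roughly like so. This is a hypothetical fragment, again using the standard conda-build pin_subpackage helper rather than the feedstock's actual recipe:

```yaml
# Hypothetical sketch: pin tensorflow-base by version only, not by
# exact build, so the solver may pick any matching tensorflow-base
# build and will prefer the one with the highest cudatoolkit.
outputs:
  - name: tensorflow
    requirements:
      run:
        - python
        - {{ pin_subpackage("tensorflow-base", max_pin="x.x.x") }}
```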
I don't know if this makes sense? It's kinda hard to explain.
Basically, with the current setup we would need to look at all tensorflow packages, and then at all tensorflow-base packages, to figure out which one is best.
What I am proposing is to make the choice decidable either at the level of tensorflow (by adding the cudatoolkit as a dependency) or at the level of tensorflow-base (by removing the exact pin). Eliminating tensorflow-base would also fix this :)
I kinda feel like we should add cudatoolkit to tensorflow. I'm not too sure why tensorflow-base still exists today. I think it was historically there to help with circular dependencies with estimator.
I think it was historically there to help with circular dependencies with estimator
Yes!
I'm sorry, do we want to add cudatoolkit, or cudnn and nccl, to the host section?
cudatoolkit is the one whose version is in the variant, right? So that should be enough to help mamba decide.
I guess I don't see cudatoolkit anywhere in this recipe, or in that of the pytorch recipe. I feel like we should make this recipe a model to follow for other recipes.
I think it's exported from compiler('cuda')?
great. makes sense. let's add that.
It seems like we should be adding it to libtensorflow_cc as well, right?
Ok, I've folded it into the recent migration.
IIUC, we could get rid of tensorflow-base now that the estimator is folded into this feedstock? How about we do this for 2.7.0?
We could do that... if downstream packages are not depending on it. There seem to be quite a few things on GitHub that are now depending on it already, so we kinda have to carry it forward.
If it's still needed, we could still keep it as a compatibility output that just 1:1 installs tensorflow, ideally together with an activation script that warns to change the dependence away from tensorflow-base.
We had tried to keep tensorflow as simply installing tensorflow-base exactly pinned, but we needed to add more constraints to it.
I'm not sure there is an easy answer to this.
If we don't need tensorflow-base anymore to break circular dependencies with tf-estimator, then we can just fully absorb it into tensorflow in the recipe, and then add a new output tensorflow-base as a thin wrapper around tensorflow. I can't see how that wouldn't work...
In your sentence, swap tensorflow-base and tensorflow and that is what we had 2 weeks ago.
I introduced tensorflow-base only to get tensorflow-estimator building inside the feedstock here. Will we still need that, or do we have a workaround for the cyclic dependency?
In your sentence, swap tensorflow-base and tensorflow and that is what we had 2 weeks ago.

You said above:

We had tried to keep tensorflow as simply installing tensorflow-base exactly pinned, but we needed to add more constraints to it.

Why not still add those constraints and then call it tensorflow / tensorflow-base (whichever way the wrapper goes)?
Maybe I'm just confused. It might be just easier if you show me in a PR.
Can we agree on an order of priorities? I suggest:
1. fix #144
2. windows builds
3. clean up the tensorflow-base situation (with 2.7.0)
Can we agree on an order of priorities?
I agree that #144 is priority number 1, and as I said I'm fine to clean up the tf-base situation together with 2.7.0. I don't think we should make these improvements dependent on windows builds, which has a large number of known & unknown unknowns.
IOW, from your list, I suggest 1. -> 3. -> 2.
has this been resolved?
Mamba currently can't properly decide which tensorflow cuda package to prefer, since we don't directly depend on cudatoolkit in the tensorflow package. Only in the tensorflow-base package do we have a cudatoolkit dependency. However, tensorflow-base is connected to tensorflow via an exact dependency.
If we'd also add the cudatoolkit to the tensorflow package, mamba would be able to select the best version. OR we could remove the tensorflow variant completely and just have a single metapackage that depends on tensorflow-base {{ VERSION }} (but not exact). Then mamba could decide the tensorflow-base version more freely.
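For illustration, the single-metapackage alternative from the summary might look like this. This is a hypothetical recipe fragment, where {{ version }} stands for the usual top-level package version variable:

```yaml
# Hypothetical sketch: one CUDA-agnostic tensorflow metapackage.
outputs:
  - name: tensorflow
    requirements:
      run:
        # version-only dependency, no exact build pin: mamba is free
        # to choose among the cuda102/cuda110/cuda111/cuda112
        # tensorflow-base builds and pick the best one
        - tensorflow-base {{ version }}
```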