conda-forge / openmm-feedstock

A conda-smithy repository for openmm.

Build matrix for downstream projects #50

Open jlmaccal opened 3 years ago

jlmaccal commented 3 years ago

I'm working on porting our build/packages process for meld to conda-forge, which is currently in stage-recipes awaiting review.

One thing I am having a hard time understanding is how downstream packages, like meld, should determine their build matrices. Part of this is not fully understanding the machinery of conda and conda-forge.

My understanding is that the jobs in .ci_support are created automatically by conda-smithy upon re-rendering. I can see that for openmm-feedstock the sets of cuda versions, CDTs, and machine images come from conda_build_config.yaml.

What I don't understand is how the build matrix is determined for downstream packages. I've looked at openmm-torch-feedstock and openmm-plumed-feedstock. Neither of these has a conda_build_config.yaml. Furthermore, they aren't built against the same sets of cuda versions. Is the build matrix just determined from openmm's at the time the downstream package is re-rendered? Or am I missing something?

What is the correct way to handle this in meld? I would like to build against the same versions of cuda that openmm is built against.

jchodera commented 3 years ago

@jlmaccal: This is really important for us to sort out, and I'd love to try to come up with some simple documentation for building downstream packages with and without plugins.

@peastman and @jaimergp : Can we tackle some documentation for this?

@mikemhenry: It would be great if you could follow along, since we'll be using what they come up with.

jlmaccal commented 3 years ago

@jchodera: That would be extremely helpful. The documentation for conda-forge kind of sucks for anything complicated.

jaimergp commented 3 years ago

I'll do a short write-up but if the docs at conda-forge.org are insufficient, an issue should be raised at https://github.com/conda-forge/conda-forge.github.io/issues so it can be addressed at some point in the future. Maybe the content is there but in a different section.

Key ideas:

conda-smithy takes the info from all of these files (the recipe, the feedstock's own conda_build_config.yaml if present, and the global conda-forge pinnings) and retrieves only the keys needed from each. This results in a set of YAML files stored in .ci_support, one per job run in the pipelines. Each YAML file stores the config needed to reproduce the build of each built package; in other words, that directory represents your build matrix.
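For illustration, one of those files might look roughly like this, trimmed down to the CUDA-related keys (the file name and values here are just an example, not taken from a real feedstock):

```yaml
# e.g. .ci_support/linux_64_cuda_compiler_version10.2.yaml (illustrative only;
# real files carry many more keys, such as compilers and pinned dependencies)
cuda_compiler_version:
  - '10.2'
docker_image:
  - quay.io/condaforge/linux-anvil-cuda:10.2
cdt_name:
  - cos6
```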

Furthermore, they aren't built against the same sets of cuda versions. Is the build matrix just determined from openmm's at the time the downstream package is re-rendered?

Exactly, the build matrix is determined at re-render time. Re-rendering happens with each version bump or, when something critical changes, via automated PRs submitted by the conda-forge bots. I still have to catch up with the OpenMM plugin packages, which is why you are seeing different matrices right now.

OpenMM is somewhat of an exception to the default CF configuration because we want to provide older CUDA versions too, so we have opted out of the default config and provide an extended conda_build_config.yaml ourselves. This is not needed in downstream packages like plugins, since you might want the latest CUDA builds only.
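To give a rough idea of what that extension looks like (a trimmed, illustrative sketch rather than the actual openmm file), the CUDA version is zipped with the matching Docker image and CDT, so the lists have to line up entry by entry:

```yaml
# Illustrative sketch of an extended recipe/conda_build_config.yaml.
# cuda_compiler_version, docker_image and cdt_name are zipped keys,
# so each list needs the same length and matching order.
cuda_compiler_version:
  - None
  - '10.2'
  - '11.2'
docker_image:
  - quay.io/condaforge/linux-anvil-comp7
  - quay.io/condaforge/linux-anvil-cuda:10.2
  - quay.io/condaforge/linux-anvil-cuda:11.2
cdt_name:
  - cos6
  - cos6
  - cos7
```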

Let me know if you need any further clarification!

jlmaccal commented 3 years ago

@jaimergp Thanks for the detailed reply.

I guess the part that was really confusing is how all of this works when a package is still staged, before being converted into a feedstock.

I started by copying and adapting the contents of the recipe from openmm-feedstock. This contains a conda_build_config.yaml that specifies sets of cuda versions, cdts, and system images. However, this file causes problems at the pre-feedstock stage, as the cuda versions and system images are hardcoded by the staging scripts and .yaml files. At this point, I got a cryptic error about an ambiguous build config. Through trial and error, I eventually reduced the cuda version to 10.2, which is what is hardcoded into the staging scripts. However, at this point I ran into problems with the case where the cuda version was set to none. This confused me, because my conda_build_config.yaml didn't contain an entry for none. Again, this is because that option is hardcoded into the staging scripts. To fix this, I explicitly needed to skip the build when cuda was not set or set to none.
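Concretely, the skip ended up being something along these lines in meta.yaml (a sketch; the exact selector syntax may have differed slightly):

```yaml
build:
  # Skip the variants where cuda is not set at all or is set to None.
  skip: true  # [cuda_compiler_version in (undefined, "None")]
```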

In the end, none of this required major changes, but the differences between the staging environment and a feedstock don't seem to be well documented. When you combine that with the ~5-10 minutes it takes to get feedback after each change, it makes for a frustrating day.

Exactly, the build matrix is determined at re-render time. Re-rendering happens with each version bump or, when something critical changes, via automated PRs submitted by the conda-forge bots. I still have to catch up with the OpenMM plugin packages, which is why you are seeing different matrices right now.

I think I understand, but this statement is confusing me:

This is not needed in downstream packages like plugins, since you might want the latest CUDA builds only.

The intention is that meld be built against the same set of cuda versions that openmm is built against. Like openmm, we would like to support users on HPC systems who have no control over the cuda version.

What I would like is to have meld built with the same matrix as openmm. Can you clarify exactly what is needed for this? My current understanding is that I don't need a conda_build_config.yaml for this. I just need openmm as a dependency in meta.yaml and it will automatically identify all of the variants based on openmm's variants. Is that correct?
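In other words, something like this in meta.yaml (an illustrative sketch of what I mean, not a tested recipe):

```yaml
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    # Only request the CUDA compiler for CUDA-enabled variants.
    - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
  host:
    - python
    - openmm
  run:
    - python
    - openmm
```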

jaimergp commented 3 years ago

What we usually do is submit a minimal working version in staged-recipes (let's say, only one CUDA version) and then work the details out for the full matrix once the submission is approved and you obtain your feedstock. At staged-recipes you should not need any conda_build_config.yaml at all. Link to the PR there and I will provide feedback if you want!

jlmaccal commented 3 years ago

OK, I just removed conda_build_config.yaml. Feedback would be great: Adding meld: modeling employing limited data by jlmaccal · Pull Request #14409 · conda-forge/staged-recipes

jaimergp commented 3 years ago

Looks like it works! You can ping the review team and/or drop a message at https://gitter.im/conda-forge/conda-forge.github.io

isuruf commented 3 years ago

@jlmaccal, since you think conda-forge documentation sucks, please send a PR to improve the docs. If you don't, you are part of the problem.

jlmaccal commented 3 years ago

@isuruf: My apologies. My earlier comment was harsh and undeserved.

I would be happy to contribute to documentation, but at this point, I still don't really understand how things work well enough to do that. As you can see from my previous comment, my main confusion was the way that the staging environment differs from an actual feedstock.

I guess the part that was really confusing is how all of this works when a package is still staged, before being converted into a feedstock.

I started by copying and adapting the contents of the recipe from openmm-feedstock. This contains a conda_build_config.yaml that specifies sets of cuda versions, cdts, and system images. However, this file causes problems at the pre-feedstock stage, as the cuda versions and system images are hardcoded by the staging scripts and .yaml files. At this point, I got a cryptic error about an ambiguous build config. Through trial and error, I eventually reduced the cuda version to 10.2, which is what is hardcoded into the staging scripts. However, at this point I ran into problems with the case where the cuda version was set to none. This confused me, because my conda_build_config.yaml didn't contain an entry for none. Again, this is because that option is hardcoded into the staging scripts. To fix this, I explicitly needed to skip the build when cuda was not set or set to none.

In the end, none of this required major changes, but the differences between the staging environment and a feedstock don't seem to be well documented. When you combine that with the ~5-10 minutes it takes to get feedback after each change, it makes for a frustrating day.

I didn't run across any documentation outlining these differences, which made things very confusing for me initially. I was trying to copy things from conda-forge/openmm-feedstock, which is a functioning feedstock. It was initially hard to understand why this was failing for me when it was working for openmm. The difference, which is now obvious in hindsight, is that one was an actual feedstock, while the other was staged. In some ways, this might be unique to the openmm ecosystem, as we're trying to support a broader range of cuda versions, which adds complexity.

I think some documentation that explains these differences would help a lot.

As I said, I would be happy to help in some way, but I don't think I know enough to write this.

jchodera commented 3 years ago

@jlmaccal, since you think conda-forge documentation sucks, please send a PR to improve the docs. If you don't, you are part of the problem.

@isuruf: I think it's important to stress that we are all striving to improve the onramp for bringing new developers into the conda-forge (and now OpenMM) ecosystem, which ultimately benefits us all by having more developers willing to contribute their time to improving the infrastructure and ecosystem.

The burden for improving documentation really needs to be shared: new developers can help identify specific areas where the current documentation is lacking, unclear, or confusing; at the same time, those of us who understand how things work can help improve the documentation in those areas so it becomes iteratively better. Statements like this, however, are a sure way to dissuade a new developer from becoming a contributor to either the ecosystem or the infrastructure. Similarly, we should help guide new developers to be specific about the ways in which the documentation is lacking, as @jlmaccal has done above, but perhaps in a more tactful manner.

@jaimergp : Since building packages that include OpenMM plugins is operating at the bleeding edge of conda-forge, I was hoping we might be able to provide some additional OpenMM-specific documentation about developing and deploying packages with OpenMM plugins. We hope to foster a vibrant ecosystem of OpenMM-based tools here, so anything we can do to make the on-ramp easier will be incredibly helpful (and reduce the amount of help we need from conda-forge devs in the future).

Thanks for all your help, everybody!

isuruf commented 3 years ago

The burden for improving documentation really needs to be shared

Not really. Since we are all volunteers, we should not be burdened further. If new developers want improved documentation, they should be more respectful of the time that the volunteers have spent improving the code and help out with the documentation instead of ranting.

The difference, which is now obvious in hindsight, is that one was an actual feedstock, while the other was staged.

This is the sort of thing you can help document (i.e., the fact that the cuda matrices are fixed in staged-recipes but not in feedstocks). If you open a PR to add this to the documentation, others will help polish and clarify it.

jlmaccal commented 3 years ago

@isuruf With respect, I don't think I was particularly disrespectful. I made one off-the-cuff comment out of frustration that I apologized for. That comment wasn't directed at you, or anyone else, personally. And I don't think I've ranted at any point. I tried to provide feedback on what I found confusing and why I didn't think I would be able to properly write documentation. It's unfortunate if my comments came off as ranting, as that was not my intent.

I have a lot of respect for open source maintainers and appreciate what you do. I can only imagine the volume of garbage that you have to deal with. I think we're both reasonable people, but we got started on the wrong foot. May I suggest that we put that in the past and move forward?

In terms of documentation, I think the most sensible thing is for us to develop some internal documentation on what is involved in packaging a project that depends on openmm, as suggested by @jchodera. I can help with this, as the parts that were confusing are still fresh in my mind. (Some of them are still confusing!). Once we have a handle on that, we can "backport" to the conda-forge docs where it makes sense.

jlmaccal commented 3 years ago

@jchodera @jaimergp: With some feedback from @isuruf, I now have a conda-forge/meld-feedstock!

Initially, this built for cuda 10.2, which I understand is the current global pin from conda-forge-pinning-feedstock. Over time, I received a series of pull requests that added migrations for cuda 11.0, 11.1, and 11.2. I understand that one or more of these may end up as global pins and that older versions may be removed, as was the case for 9.2. Does this comment still reflect the proposed versions of cuda that will be supported?

One option for downstream packages, like meld, openmm-torch, etc, is simply to use this global build matrix.

A second option would be to mirror the build matrix of openmm-feedstock.

Any thoughts on which approach is preferable? I'm currently leaning towards the first.

Incidentally, I did try the second option using a conda_build_config.yaml derived from openmm-feedstock, but I get the following error:

```
ValueError: variant config in /Users/jlmaccal/Source/meld-feedstock/recipe/conda_build_config.yaml is ambiguous because it
does not fully implement all zipped keys, or specifies a
subspace that is not fully implemented. To be clear:
.. we did not find ['9.2', '10.0', '10.1', '11.0', '11.1', '11.2'] from {'cuda_compiler_version': ['None', '9.2', '10.0', '10.1', '10.2', '11.0', '11.1', '11.2'], 'docker_image': ['quay.io/condaforge/linux-anvil-comp7', 'quay.io/condaforge/linux-anvil-cuda:9.2', 'quay.io/condaforge/linux-anvil-cuda:10.0', 'quay.io/condaforge/linux-anvil-cuda:10.1', 'quay.io/condaforge/linux-anvil-cuda:10.2', 'quay.io/condaforge/linux-anvil-cuda:11.0', 'quay.io/condaforge/linux-anvil-cuda:11.1', 'quay.io/condaforge/linux-anvil-cuda:11.2'], 'cdt_name': ['cos6', 'cos6', 'cos6', 'cos6', 'cos6', 'cos7', 'cos7', 'cos7']} in cuda_compiler_version:['None', '10.2']
```
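If I'm reading the error correctly, specifying only a subset of cuda_compiler_version isn't enough on its own; all of the zipped keys would need matching entries, something like the following (my guess, untested):

```yaml
# Hypothetical conda_build_config.yaml restricted to part of the matrix.
# Every zipped key (cuda_compiler_version, docker_image, cdt_name) lists
# the same number of entries, in matching order.
cuda_compiler_version:
  - None
  - '10.2'
docker_image:
  - quay.io/condaforge/linux-anvil-comp7
  - quay.io/condaforge/linux-anvil-cuda:10.2
cdt_name:
  - cos6
  - cos6
```
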
isuruf commented 3 years ago

Sorry about my comment. I was more annoyed at @jchodera's comment than I was at yours. Yes, let's start over.

In terms of documentation, I think the most sensible thing is for us to develop some internal documentation on what is involved in packaging a project that depends on openmm

This is not the correct way. You should develop it for conda-forge first as then you'll have more eyes on it and less chance that it'll be wrong.

no builds for old cuda versions that might still be useful on older hpc installations

You should care about old cuda versions only if there are user requests to do so and you think it's worth your time. Doing it just because it might be useful to someone is a waste of your time.

Incidentally, I did try the second option using a conda_build_config.yaml derived from openmm-feedstock, but I get the following error:

WIP docs are at https://github.com/conda-forge/conda-forge.github.io/pull/1262

jchodera commented 3 years ago

I'd like to stress that I'm 100% supportive of everyone from the OpenMM community helping contribute to improve conda-forge documentation in general!

This issue initially appeared to be a special case of "developer needs to build a package with an OpenMM plugin with CUDA support", and it wasn't clear whether there were general contributions to the conda-forge documentation that could help there. But if there is anything we can do to help improve overall clarity at the conda-forge level, we should absolutely try to make those contributions.