Distribution packages

MarDiehl commented 4 years ago

To gain momentum, it would be important that the library is available to many users without much effort. HPC maintainers can of course compile the library on their own, but for most users this would raise the entry barrier. I guess most Python and C users have never compiled the respective standard library. Therefore, we should try to release distribution packages (Debian/Ubuntu, Fedora, Conda, macOS) as early as possible.

I would volunteer to do that for Arch Linux, but I think Ubuntu and macOS would be more relevant.

In this context, I also believe in the "release early, release often" philosophy. The GIMP developers also state:

2019 was the second year in a row where we shipped updates with new features in the stable branch. Our assumption was that this could change the public’s perception of the ongoing development efforts and shift the balance towards having more contributors. Here is why.

Between 2012 and 2018 (v2.8 and v2.10 releases respectively), we worked hard and added a ton of improvements and new features, we demoed them on social networks, mentioned them in annual reports etc., and yet we kept hearing how GIMP was dead because those changes were not in any stable releases. The same thing was happening before in the four years between v2.6 and v2.8.

Moreover, this was preventing people from contributing as they would have to wait a long time to see their contribution actually used. That wasn’t sparking an interest really.

Hence, after the v2.10 release, we kept adding new features at the same pace and started producing regular updates with those features, and all of a sudden we started hearing how we “picked up the pace”!

So this could be a lesson for other projects: arguing against the irrational is futile. Just don’t keep people waiting. If you did something good, share it!

From https://www.gimp.org/news/2020/01/04/gimp-and-gegl-in-2019/

Therefore, releasing very often during the initial rapid growth period (let's say 4 times a year) would be a good marketing instrument. If people get the updates from their respective package managers, they will also serve as beta testers. Of course, we would need to state clearly that the API might change.

certik commented 4 years ago

@MarDiehl thanks for the feedback.

In the long run, we are hoping our Fortran Package Manager (fpm) will be successful. So the distribution will become very simple on all platforms.

In addition to fpm however, we still want to make regular releases and generate a source tarball that contains the pre-generated files (the git repository requires the fypp pre-processor, but the tarball would not), automatically built by our CI.

And it would be this tarball that would go into distributions.

If you want to help us with this process, that would be really awesome! Thank you.

MarDiehl commented 4 years ago

@certik Sure, I have experience with generating packages for Arch Linux, Ubuntu, and Fedora.

One further advantage of having distribution packages is that we benefit from their CI tools. conda-forge, for example, automatically creates packages for different operating systems.

I have another question regarding the distribution: How will the standard lib be used and versioned? I assume we would have something like

use stdlib

and link with

-lstd

at least on Unix-like operating systems. How would support for different versions work in that case? According to semver, API changes are fine between major versions. That could mean that something like

use stdlib1

would be needed. At least I'm not aware of any other OS-independent mechanism to select the correct Fortran mod file.

certik commented 4 years ago

Regarding versioning, I suggest using fpm in the long run, which will allow using different versions.

Regarding the API, currently we have separate modules such as stdlib_io, or stdlib_linalg, and because we always put new functionality into the "experimental" namespace first, the modules are called stdlib_experimental_io and stdlib_experimental_linalg.

Once we move it from "experimental", it will just become stdlib_io, or stdlib_linalg. Finally, we will have to see if it makes sense to have some flat namespace such as stdlib. The general issue with that, and experience from Python packages is that once you put stuff in stdlib, you cannot take it away as it would break people's codes. So for now we chose not to introduce such a flat namespace, and that always gives us the option to do so in the future, as we gain more experience and usage of stdlib and have a better idea of what would make sense to put there, if anything.

MarDiehl commented 4 years ago

@certik thanks for the clarification. I would still raise awareness for the issue of different versions. Even if fpm is there at some point in the future, Linux distributions will use their own package managers. Giving them the possibility to install stdlib in different versions would be really beneficial. Breaking changes to interfaces will certainly come, and if that breaks existing code, people won't use stdlib again.

If different versions can be installed in parallel, developers can migrate to a new version if they like to. Unmaintained code will still work as before, just at the cost that an additional version of stdlib needs to be installed. This would also relieve us from spending too much time and effort on keeping backward compatibility.

certik commented 4 years ago

@MarDiehl yes, for distributions we definitely want to allow installing any way they like, not force fpm on them.

Regarding the versions, what are the possible paths forward?

1. Do nothing in the API itself and just rely on the package version. A breaking change would require installing an older version.
2. Version the modules, so stdlib_io_v1 would be right now, and in the future, if an incompatible API change has to happen, do stdlib_io_v2.

Are there other ways to handle this?

Note that our goal is not to change the API once it is settled. That is why we have a rigorous process for how the API gets adopted: it first goes into experimental so that people can start using it and we can ensure that the API will work. We haven't reached that point yet, but eventually we'll be ready to move some functionality from experimental to main. At that point, our expectation is that we will be supporting the API essentially forever, just like standard Fortran does not change APIs in a backwards-incompatible manner. If there ever comes a need for a new API, we would have to create a new module, or a function with a new name. So for now, I think we all mostly assumed we would use option 1.

But we are open to discussing this; maybe option 2 or some other alternative is better.

MarDiehl commented 4 years ago

@certik I think a stable API is the ultimate goal one should aim for, but I doubt that it can be reached under all circumstances. At least I often figure out the best way of doing something much later. In such a situation, it would be foolish to use the initial, non-optimal solution instead of the new, better solution.

Also, I think breaking changes are not a bad thing in general. What is bad are unpredictable changes. Semver is designed to handle exactly this situation: an increase in the major version indicates breaking changes. This is transparent to the user. There are many applications that use Python 2, and they happily coexist with Python 3.
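The semver rule mentioned above can be sketched as a small compatibility check. This is only an illustration, not something stdlib or fpm provides: the function names `parse_version` and `is_compatible` are hypothetical, versions are assumed to be plain `MAJOR.MINOR.PATCH` strings, and the special semver treatment of `0.x` releases (where anything may change) is deliberately left out.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(installed, required):
    """Return True if `installed` can stand in for `required`.

    Simplified semver rule (hypothetical helper): the major versions must
    match, and the installed version must not be older than the required one.
    """
    inst = parse_version(installed)
    req = parse_version(required)
    return inst[0] == req[0] and inst >= req


if __name__ == "__main__":
    print(is_compatible("1.4.0", "1.2.3"))  # same major, newer minor
    print(is_compatible("2.0.0", "1.2.3"))  # major bump signals breaking changes
```

Under this rule, a package manager could keep a 1.x and a 2.x release installed side by side and pick whichever one an application was built against.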

Therefore, I would prefer to have a clear plan for handling the case that backward compatibility cannot be maintained. It will come earlier than one hopes. But to be honest, I currently don't have a good idea what plan B should look like. Making the version part of the module name is certainly a possibility. The soname concept with symbolic links on Linux is similar in spirit. Using names like stdlib_io1 (which might be a symlink pointing to stdlib_io1.1) would be explicit and clear. If the version your application needs does not exist, it will not compile.

As a final, personal comment I would like to add that I have the impression that the 'holy grail' of backward compatibility is a reason for the decline of Fortran. Of course there exists a lot of valuable old code, but always considering the interests of the slow ones is annoying for the innovative players. If you don't bother to update your application, that is fine, but that should not stop others from doing so.
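To make the symlink idea above concrete, here is a minimal sketch of how a major-version name could resolve to the newest installed minor release. It only models a directory layout on a system that supports symbolic links; it does not model how a Fortran compiler actually searches for mod files, and the helper names (`install_versions`, `resolve`, `demo`) as well as the layout itself are assumptions made for illustration.

```python
import os
import tempfile


def install_versions(root, name, versions):
    """Create one directory per (major, minor) pair plus a major-version link.

    For each major version, the symlink `<name><major>` points at the
    highest installed `<name><major>.<minor>` directory (hypothetical layout).
    """
    newest = {}
    for major, minor in versions:
        os.mkdir(os.path.join(root, f"{name}{major}.{minor}"))
        newest[major] = max(newest.get(major, minor), minor)
    for major, minor in newest.items():
        # Relative link target, similar in spirit to libfoo.so.1 -> libfoo.so.1.1
        os.symlink(f"{name}{major}.{minor}", os.path.join(root, f"{name}{major}"))


def resolve(root, requested):
    """Return the real directory name behind `requested`, or None if missing."""
    path = os.path.join(root, requested)
    if not os.path.exists(path):
        return None
    return os.path.basename(os.path.realpath(path))


def demo():
    """Install three versions in a temporary directory and resolve three names."""
    with tempfile.TemporaryDirectory() as root:
        install_versions(root, "stdlib_io", [(1, 0), (1, 1), (2, 0)])
        return (
            resolve(root, "stdlib_io1"),
            resolve(root, "stdlib_io2"),
            resolve(root, "stdlib_io3"),
        )


if __name__ == "__main__":
    print(demo())
```

A request for a major version that was never installed resolves to nothing, which mirrors the point that an application needing a missing version would simply fail to build.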

fortran-lang / stdlib

Distribution packages #203