NNPDF / pineappl

PineAPPL is not an extension of APPLgrid
https://nnpdf.github.io/pineappl/
GNU General Public License v3.0

Migrate tasks in `make-release.sh` to CI actions #216

Closed: cschwan closed this issue 1 year ago

cschwan commented 1 year ago

The script make-release.sh has become quite complicated and we should simplify it by breaking it up into smaller tasks, some of which can be CI actions:

alecandido commented 1 year ago

Instead of writing a complex Bash script (or many Bash scripts that run together), we could use Python: it's still a scripting language, but it scales better (we can always break a script into functions, then modules, and eventually a tiny package, while Bash stops at functions).

Moreover, we have more expertise in Python than in Bash, and definitely more than in Perl: https://github.com/NNPDF/pineappl/blob/2c5fee4a7cce4b509e121766fb4ba53d6d2b6731/make-release.sh#L89 (this line alone shows that Bash is not sufficient on its own; and for semver there are existing packages, such as https://github.com/python-semver/python-semver, that would simplify the code).
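For illustration only: the linked Perl line is not reproduced here, but parsing and bumping a semantic version needs nothing beyond the Python standard library. The sketch below assumes versions of the form MAJOR.MINOR.PATCH[-PRE][+BUILD]; the names `Version`, `parse`, `bump` and `render` are hypothetical helpers, not part of make-release.sh or of python-semver, and the regular expression is a simplified form of the one in the SemVer specification.

```python
import re
from typing import NamedTuple, Optional

# Simplified SemVer pattern: MAJOR.MINOR.PATCH, optional "-pre.release", optional "+build".
# MAJOR, MINOR and PATCH must not have leading zeros (so "1.02.3" is rejected).
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str] = None


def parse(text: str) -> Version:
    """Parse a version string, raising ValueError if it is not valid SemVer."""
    match = SEMVER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid semantic version: {text!r}")
    major, minor, patch, prerelease, _build = match.groups()
    return Version(int(major), int(minor), int(patch), prerelease)


def bump(version: Version, part: str) -> Version:
    """Return the next version; pre-release and build metadata are dropped."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown version part: {part!r}")


def render(version: Version) -> str:
    """Format a Version back into a string (build metadata is not kept)."""
    core = f"{version.major}.{version.minor}.{version.patch}"
    return f"{core}-{version.prerelease}" if version.prerelease else core
```

With these helpers, `render(bump(parse("0.5.9"), "minor"))` gives `"0.6.0"`. A dependency such as python-semver would additionally handle precedence rules and edge cases that this sketch ignores.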

I'd also suggest collecting the utility scripts we use in a single folder: not only make-release.sh, but also generate-coverage.sh, update-wheels.sh, and test-ploughshare-conversion.sh (the last one could also go into examples/).

cschwan commented 1 year ago

I don't particularly like this idea, because if we do that we suddenly end up with a huge dependency chain that breaks on some computers (because pip is too old) and takes very long to install. I'm also not convinced the Python scripts would be shorter.

My point is: Python isn't a panacea. It has its uses when Bash scripts become more complicated, but I don't think we've reached that point yet.

alecandido commented 1 year ago

So, you can decide that you don't want to add semver or any other package, and depend only on Python and its standard library. This is a fair point, and you'd avoid dependency management. I fully agree with this.

And I also agree that Python is not a panacea. It is just Bash that is bad. If you want to use some other scripting language, like JS (with Node), Ruby, or something else, there are pros and cons for each of them, but in the end the best choice is simply the one you're most familiar with (though I'd not suggest something like R, since it has a completely different scope).

Bash is great for working with files, as long as your scripts are about 5 lines long and not much more. Beyond a certain threshold you run into all its limits, since it is designed to be a good shell (i.e. for interactive use) rather than a programming language. Most likely the difference between us is that I put that threshold much lower than you do, since I generally prefer short, pure functions (even though I often fail at keeping them short myself; purity is just a matter of trade-offs).

One example: it definitely encourages old-style imperative programming, with near-zero modularity and string manipulation everywhere instead of function calls (it's no coincidence that Python and JS are transitioning to types, since they are useful for maintaining scripts as well).

Moreover, Bash is one more "language" people need to know in order to contribute. It is very well known at a basic level, i.e. using commands and pipes, but when it comes to if conditions, double brackets and so on, it is much more obscure to most people, and not so standard: you're naming scripts <script>.sh, but they would not run with sh <script>.sh, so .bash would be a better extension, or you could drop the extension altogether so that people know to run them as executables, leaving the choice of interpreter to the shebang.

cschwan commented 1 year ago

> So, you can decide that you don't want to add semver or any other package, and depend only on Python and its standard library. This is a fair point, and you'd avoid dependency management. I fully agree with this.

> And I also agree that Python is not a panacea. It is just Bash that is bad. If you want to use some other scripting language, like JS (with Node), Ruby, or something else, there are pros and cons for each of them, but in the end the best choice is simply the one you're most familiar with (though I'd not suggest something like R, since it has a completely different scope).

I disagree with the premise that Bash is bad; I think it's often a very good choice because the scripts are very succinct. But when it comes to choosing a scripting language, I admit that I'm partial to Bash. Many times I've solved a problem with it extremely quickly and efficiently where other languages would probably have been a worse choice. I say that because I think Bash isn't really much of a language on its own, but rather a way to combine many languages together: awk, grep, sed, ... really anything you know.

> Bash is great for working with files, as long as your scripts are about 5 lines long and not much more. Beyond a certain threshold you run into all its limits, since it is designed to be a good shell (i.e. for interactive use) rather than a programming language. Most likely the difference between us is that I put that threshold much lower than you do, since I generally prefer short, pure functions (even though I often fail at keeping them short myself; purity is just a matter of trade-offs).

I agree that I definitely have a much higher threshold :smile:, which I admit isn't entirely good.

> One example: it definitely encourages old-style imperative programming, with near-zero modularity and string manipulation everywhere instead of function calls (it's no coincidence that Python and JS are transitioning to types, since they are useful for maintaining scripts as well).

I really don't care about that when writing Bash scripts; they're supposed to solve a problem, so anything that works is fine. However, you could argue that using pipes is very similar to functional programming, which is considered 'modern' these days.
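To make the comparison between pipes and functional programming concrete, here is a minimal Python sketch in which each pipeline stage is a function from a stream of lines to a stream of lines, and a pipe is simply their left-to-right composition. The helpers `pipe`, `grep`, `sed` and `sort_lines` are hypothetical and only loosely mimic the shell commands they are named after.

```python
import re
from functools import reduce
from typing import Callable, Iterable, List

# A stage maps a stream of lines to a new stream, like one command in a shell pipe.
Stage = Callable[[Iterable[str]], Iterable[str]]


def pipe(*stages: Stage) -> Callable[[Iterable[str]], List[str]]:
    """Compose stages left to right, like `a | b | c`, and collect the result."""

    def run(lines: Iterable[str]) -> List[str]:
        return list(reduce(lambda stream, stage: stage(stream), stages, lines))

    return run


def grep(pattern: str) -> Stage:
    """Keep only lines containing a match for `pattern` (roughly `grep -E`)."""
    regex = re.compile(pattern)
    return lambda lines: (line for line in lines if regex.search(line))


def sed(pattern: str, replacement: str) -> Stage:
    """Replace every match in each line (roughly `sed -E 's/pattern/replacement/g'`)."""
    regex = re.compile(pattern)
    return lambda lines: (regex.sub(replacement, line) for line in lines)


def sort_lines(lines: Iterable[str]) -> Iterable[str]:
    """Sort the whole stream (like `sort`, but using Python's string ordering)."""
    return sorted(lines)
```

For example, `pipe(grep(r"^\w+ ="), sed(r" = ", "="), sort_lines)` behaves roughly like `grep -E '^\w+ =' | sed -E 's/ = /=/g' | sort`: applied to `["b = 2", "# comment", "a = 1"]` it returns `["a=1", "b=2"]`. Each stage is a pure function of its input, which is the property both sides of this discussion are pointing at.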

> Moreover, Bash is one more "language" people need to know in order to contribute. It is very well known at a basic level, i.e. using commands and pipes, but when it comes to if conditions, double brackets and so on, it is much more obscure to most people, and not so standard: you're naming scripts <script>.sh, but they would not run with sh <script>.sh, so .bash would be a better extension, or you could drop the extension altogether so that people know to run them as executables, leaving the choice of interpreter to the shebang.

Bash is one more language, that's true, but so is any other language that would replace it. We're using many programming languages in PineAPPL: Bash, C, C++, Fortran, Python and Rust, and they're all unavoidable, including Bash, which is used inside the CI. Take for instance the ./generate-coverage.sh script, which I used as a playground to generate doctest coverage. If you have a look at https://github.com/NNPDF/pineappl/blob/master/.github/workflows/rust.yml you'll see that towards the end it's almost identical to ./generate-coverage.sh (which doesn't generate .lcov files but rather a much more detailed HTML report). I doubt that using any other language would significantly improve this script. The same holds true, in my opinion, for all the other scripts (except maybe make-release.sh), which all perform file-heavy tasks.

I'd argue that one should treat Bash like any other language, and https://tldp.org/LDP/abs/html/ is a good place to learn it properly. But many things can be learned on the fly, like the difference between single and double brackets, which I often find to be completely irrelevant anyway.

As for the extension, Stack Overflow seems to agree that .sh is a common extension, although some argue that having none would be better.

But all that being said: look at my original comment, where I argue basically the same point as you do: make-release.sh should be improved.

alecandido commented 1 year ago

> I disagree with the premise that Bash is bad; I think it's often a very good choice because the scripts are very succinct. But when it comes to choosing a scripting language, I admit that I'm partial to Bash. Many times I've solved a problem with it extremely quickly and efficiently where other languages would probably have been a worse choice. I say that because I think Bash isn't really much of a language on its own, but rather a way to combine many languages together: awk, grep, sed, ... really anything you know.

So, the statement "Bash is bad" without any context is certainly false, as it would be for any other language or tool (I'm sure even COBOL has its perfect applications...). The problem is that Bash is not a language, exactly as you say, so it is good if you can limit yourself to a few command invocations, but not if you have to manipulate data, because at that point its support is much poorer than in most other languages, even for math and other basic operations. It is simply not made for that purpose.

> I really don't care about that when writing Bash scripts; they're supposed to solve a problem, so anything that works is fine. However, you could argue that using pipes is very similar to functional programming, which is considered 'modern' these days.

I care about every single line of code that is going to survive, since it will have to be understood and maintained. Workflows and utility scripts are particularly delicate, since they tend to need updates whenever something external to them changes (so the usual practice of "it works, don't touch it" rarely applies).

For example, this one: https://github.com/NNPDF/pineappl/blob/ef978722e052224522860439a5a5de55d380f932/.github/workflows/rust.yml#L53-L69 would clearly be much better as a for loop, and this one: https://github.com/NNPDF/pineappl/blob/ef978722e052224522860439a5a5de55d380f932/.github/workflows/rust.yml#L84-L88 would be much clearer if the list of folders were split out on its own and given a telling name.

Pipes are not necessarily functional, but they usually are, and there is a good reason why people now care about functional programming: especially in UIs, a lot of people were happily manipulating things in a single global scope (that is exactly what JS was about), but then debugging was a mess, since the state was incredibly hard to reproduce in complex applications. It doesn't always have to be functional, and strict purism always has its drawbacks (in this sense I'm a great fan of https://peps.python.org/pep-0020/, in this case of "Although practicality beats purity."; still Python, sorry), but functional programming certainly has some advantages (especially in long-lived programs, where manipulating large amounts of data is not trivial).

> Bash is one more language, that's true, but so is any other language that would replace it.

True, PineAPPL is mainly Rust, and I wouldn't script in Rust... My point was more about the people collaborating with this project (i.e. NNPDF), who decided to be Python-centric. Moreover, this is not an isolated choice, since machine learning is predominantly supported in Python, and the scientific community is migrating there (even if C/C++ and Fortran are definitely still relevant, and Julia is also becoming more common).

> including Bash, which is used inside the CI.

This can be limited arbitrarily by invoking external scripts (that is usually what you do in Bash anyway, and you do want a simple CI workflow, which is not the ideal place to define complex tasks).

> which all perform file-heavy tasks.

Bash is good at manipulating files, but it is not the only tool that is. Python has been used for system scripting for a long time, and it also keeps improving.

alecandido commented 1 year ago

However, getting more to the point:

Python pros:

All the sed calls and the like can be done in Python as well, using just the standard library.
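As a rough illustration of this point, an in-place `sed -i` substitution can be written with `re` and `pathlib` alone. This is a sketch under assumptions: the function name `sed_in_place` is hypothetical, the pattern is applied to the whole file with `re.MULTILINE` (so `^` and `$` match at line boundaries, close to but not exactly sed's line-by-line model), and files are assumed to be UTF-8.

```python
import re
from pathlib import Path


def sed_in_place(path: Path, pattern: str, replacement: str, count: int = 0) -> int:
    """Roughly `sed -i -E 's/pattern/replacement/g' path`, using only the stdlib.

    `count=0` replaces every match in the file; a positive `count` limits the
    number of replacements. Returns how many substitutions were made.
    """
    text = path.read_text(encoding="utf-8")
    # MULTILINE lets ^ and $ match at the start and end of each line, as in sed.
    new_text, n = re.subn(pattern, replacement, text, count=count, flags=re.MULTILINE)
    if n:
        # Only touch the file when something actually changed.
        path.write_text(new_text, encoding="utf-8")
    return n
```

For instance, `sed_in_place(Path("Cargo.toml"), r'^version = ".*"$', 'version = "X.Y.Z"', count=1)` would rewrite the first `version = ...` line of a Cargo manifest; this is a hypothetical use, not a description of what make-release.sh currently does.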

Bash pros:

I'd add one more candidate:

JS (Node) pros:

The JS proposal is only interesting if you're committed to automating the task and delegating it to GitHub (which would also solve the problem of non-uniform environments and very old computers). If you prefer to do it manually, better to avoid it (but I doubt you do, since the script even pushes and releases automatically).

cschwan commented 1 year ago

The publish commands have been moved into the newly added workflow crates.yml in commit fa0fc65f1b1ffbf0f80059a00f4ece941156a023. This still needs to be tested.

cschwan commented 1 year ago

I've fixed some bugs in commits 00b247fa1bed66200fceb82bf121c50618a1f9fb and 51f84ce3726c348417d9ac58c9ceac220291ed9e, but this still isn't enough, because the container needs to have APPLgrid and fastNLO installed.

cschwan commented 1 year ago

The container has been changed in commit 8cf40dff6414c1541ea761c6b688b9cf5c208195.