DE-RSE / un-deRSE23-breakouts

Workshop on reproducible documents using julia and pythontex #3

Open BeastyBlacksmith opened 1 year ago

BeastyBlacksmith commented 1 year ago

I'd like to offer this workshop at the conference.

This would take 3h - 4.5h. I am also looking for people who'd like to join as a helper.

HeidiSeibold commented 1 year ago

Hi @BeastyBlacksmith, sounds interesting. Would it be possible to make it less focused on physics?

BeastyBlacksmith commented 1 year ago

It definitely is! It's just that the example document is from that field, but everything else holds for other documents as well.

Are we talking about the title or using a different document as a reference?

HeidiSeibold commented 1 year ago

I was just looking at the workshop reference which is focused on physics. But sounds great, thanks!

pancetta commented 1 year ago

Hi @BeastyBlacksmith, very interesting proposal, thanks! As we're (slowly) moving toward more specific submissions, could you please add more details to the proposal? Since you already have a good structure in place, it seems, could you name 1-3 people responsible for the course? We'd also need a short abstract/appetizer later on.

BeastyBlacksmith commented 1 year ago

Creating a reproducible report using pythontex - A Julia example

Simon Christ (I'll add more people here once I have found them)

Writing reports (or any document relying on volatile data) with hardcoded values is a pain. Typically, the data changes several times during the writing process, and keeping the document in sync is tedious and error-prone.

pythontex allows mixing computational elements (in a variety of programming languages) and $\LaTeX$ text in the same document. Data can then be included, computed and visualized as the document is built, which keeps the document always up to date.

In this workshop we are going through the process of creating such a report using the julia programming language (no prior knowledge needed).


Does that work?

pancetta commented 1 year ago

Very nice, thanks! So, how much julia is needed to understand/follow this session? Can this be transferred to other languages?

BeastyBlacksmith commented 1 year ago

No prior knowledge of julia is assumed and you could do the same with any language that pythontex supports (cf. chapter 7 of https://github.com/gpoore/pythontex/blob/master/pythontex/pythontex.pdf).

pancetta commented 1 year ago

Great. You should add that to the description and maybe consider changing the title. This is more on pythontex than on Julia, the latter being "just" the example language, right?

BeastyBlacksmith commented 1 year ago

> This is more on pythontex than on Julia, the latter being "just" the example language, right?

There are some niceties that I will make use of, but otherwise yes.

Would "Creating a reproducible report using pythontex in combination with julia" raise more realistic expectations?

pancetta commented 1 year ago

Maybe "Creating a reproducible report using pythontex - A Julia example"?

pancetta commented 1 year ago

Hi again! How long do you think the BOS should be? We are planning with 90-minute slots, and the default length is one slot.

BeastyBlacksmith commented 1 year ago

What is a BOS? I'd plan 3h for this.

pancetta commented 1 year ago

Sorry, BOS = break-out session! How flexible are you? What if we have 4.5h, what if we have only 1.5h?

BeastyBlacksmith commented 1 year ago

4.5h would be fine.

Doing it in 1.5h would require that all participants come well prepared (have everything preinstalled etc.).

mmesiti commented 1 year ago

I have a question/comment regarding this.

I am interested in something which is quite different but at the same time it could be based on similar technologies, namely: tooling and techniques to make sure that the examples we add in-line to documentation keep working in the same way. Concrete examples of issues that I faced and I would like to tackle:

  1. notebook-based tutorials that, when re-run with later software versions, give different output, sometimes disrupting the planned narrative of the tutorial;
  2. in-line code examples in the documentation for an HPC system that no longer work after a system upgrade.

These kinds of issues can greatly reduce the usefulness of the documentation because they destroy reproducibility. So, my questions are:

  1. Does this workshop also cover techniques that can be used to test documents for regression or mismatches between documentation and the system/code they try to describe?
  2. If not, would it be interesting to add a 15-20 minute sub-session specifically on this? I am thinking of a discussion about the biggest difficulties in testing your documents and how to work with other tools. I could prepare some thought-provoking questions and suggestions and collect the outcome in a document.

Alternatively, not to hijack this session (but I am not sure there will be enough interest, or if it is too late now), I could propose a mini hackathon/discussion session where we try in groups to create a test harness for some pieces of documentation that we would like to test, and discuss our progress and challenges encountered at the end. I could try and produce a small library of python functions that could be used for that - I could also try with julia, although I am far from an expert with it (I could even try something in bash for maximum portability but also maximum pain).
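As a rough illustration of the kind of test harness meant here, the following is a minimal Python sketch built on the standard-library `doctest` module. The helper name `check_doc_examples` and the two sample documents are hypothetical and only serve the example; a real harness would read the documentation from files.

```python
import doctest


def check_doc_examples(doc_text, name="docs"):
    """Run every `>>>` example found in doc_text.

    Returns doctest's TestResults, which exposes `.failed` and `.attempted`.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(doc_text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; callers only inspect the counts.
    return runner.run(test, out=lambda message: None)


# Hypothetical documentation snippets, used only for illustration.
GOOD_DOC = """
Adding two numbers:

>>> 1 + 1
2
"""

STALE_DOC = """
This output no longer matches what the code produces:

>>> sorted({"b": 1, "a": 2})
['b', 'a']
"""

if __name__ == "__main__":
    for label, text in [("good", GOOD_DOC), ("stale", STALE_DOC)]:
        result = check_doc_examples(text, name=label)
        print(f"{label}: {result.failed} of {result.attempted} examples failed")
```

Run in CI against each new software version, a check like this would flag the examples whose recorded output has drifted, without having to rebuild the documentation.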

BeastyBlacksmith commented 1 year ago

I can only comment on the julia side of things, but I would tackle these issues with the following:

  1. Lock package versions for notebooks. Julia environments contain a Manifest.toml for this purpose, which you can instantiate at the beginning of the notebook to get a reproducible environment. Environments will be touched on during the workshop, and I could expand on that.
  2. Documenter.jl has doctest functionality that tests whether marked documentation snippets (incl. in-line docstrings) behave as expected. This can also be integrated into CI pipelines without actually building the documentation. I wouldn't talk about that in the workshop, but depending on how much time we get, I can also imagine an "ask-me-anything" part after the main course.

mmesiti commented 1 year ago

Locking package versions is exactly what I would like not to do: I would like instead to check what breaks or becomes incorrect with newer versions of the packages, or when a system is upgraded and the configuration changes (I can't version control the hardware of a whole HPC cluster...). In this regard I see that my proposal is quite orthogonal to the plan of this BOS.

Regarding doctests, it was actually the inspiration for this, with its incarnations in various languages.
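One way to picture "checking instead of locking" is to store the versions a document was last verified against and report any drift, so that the document's examples get re-tested. The Python sketch below only uses the standard-library `importlib.metadata`. The function names and the package names and versions in the usage example are hypothetical, and it covers Python packages only, not system configuration.

```python
from importlib import metadata


def installed_versions(packages):
    """Look up installed versions; None marks a package that is not installed."""
    found = {}
    for package in packages:
        try:
            found[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            found[package] = None
    return found


def version_drift(recorded, installed):
    """Return {package: (recorded, installed)} for every package that differs."""
    return {
        package: (old, installed.get(package))
        for package, old in recorded.items()
        if installed.get(package) != old
    }


if __name__ == "__main__":
    # Hypothetical versions stored alongside the tutorial when it was last checked.
    recorded = {"numpy": "1.24.0", "requests": "2.31.0"}
    drift = version_drift(recorded, installed_versions(recorded))
    for package, (old, new) in drift.items():
        print(f"{package}: documented with {old}, now {new}; re-run the doc tests")
```

A non-empty result does not mean the documentation is wrong, only that its examples should be run again against the current environment.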

BeastyBlacksmith commented 1 year ago

> Alternatively, not to hijack this session (but I am not sure there will be enough interest, or if it is too late now), I could propose a mini hackathon/discussion session where we try in groups to create a test harness for some pieces of documentation that we would like to test, and discuss our progress and challenges encountered at the end.

Today is actually the last day to open a new issue for that in time, so you are just in time. The more, the merrier :)

HeidiSeibold commented 1 year ago

I added a label to this break-out. Can you check if you feel it is appropriate and change it if not? Let me know if you have any questions.

HeidiSeibold commented 1 year ago

Could you please answer the following questions for me? Sorry if you already have this somewhere.

Who could be people interested in collaborating on this?

(feel free to tag them with their GitHub username if they have one)

I assume Simon Christ, @mmesiti and @BeastyBlacksmith? Anyone else?

How much time do you need for this?

(90 minutes or multiples thereof)

2*90=180 minutes

Abstract

(Can be short)

BeastyBlacksmith commented 1 year ago

Who could be people interested in collaborating on this?

@BeastyBlacksmith (Simon Christ), @felixcremer, @jpthiele

How much time do you need for this?

360 minutes

Abstract

Creating a reproducible report using pythontex - A Julia example

Writing reports (or any document relying on volatile data) with hardcoded values is a pain. Typically, the data changes several times during the writing process, and keeping the document in sync is tedious and error-prone.

pythontex allows mixing computational elements (in a variety of programming languages) and $\LaTeX$ text in the same document. Data can then be included, computed and visualized as the document is built, which keeps the document always up to date.

In this workshop we are going through the process of creating such a report using the julia programming language (no prior knowledge needed).
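The idea of computing values at build time instead of hardcoding them can be sketched in plain Python. This is not pythontex's own mechanism, which embeds the code in the $\LaTeX$ source itself. The template text, the function name `render_report` and the sample data below are hypothetical.

```python
from statistics import mean
from string import Template

# A LaTeX sentence with placeholders instead of hardcoded numbers.
REPORT = Template(
    r"We recorded \textbf{$count} samples with a mean runtime of $mean\,ms."
)


def render_report(samples):
    """Fill the template from the data, so the text cannot drift from it."""
    return REPORT.substitute(count=len(samples), mean=f"{mean(samples):.2f}")


if __name__ == "__main__":
    # Hypothetical measurements; changing them changes the rendered sentence.
    print(render_report([1.0, 2.0, 3.0]))
    print(render_report([1.0, 2.0, 3.0, 10.0]))
```

When the data changes, re-rendering updates every derived number in the sentence, which is the property the abstract describes for the full document.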

pancetta commented 1 year ago

Hi! I just added the "Accepted" label to this BOS. Welcome on board! https://un-derse23.sciencesconf.org/program

pancetta commented 10 months ago

Hi all, the unconference is only 3 weeks away now! On day 1 there will be a breakout blitz where all session organizers should advertise their sessions. 1 minute, 1 slide, let people know what you intend to do. Please prepare this slide in advance and add it right here (PDF please), by September 20.

BeastyBlacksmith commented 10 months ago

Here is my slide: UnRSE2023-Christ.pdf

pancetta commented 9 months ago

Here is the main hub for taking notes: https://pad.gwdg.de/FkFJTslFQhq-UF3Es6q4rw#

pancetta commented 9 months ago

Have fun with the session(s)! Please add the pad you're using also here for people to see what you did.

If possible, please prepare a 1 minute wrap up of your session for the farewell session on Thursday afternoon! What did you do in the session, how would you like to continue, how can people contribute after the unconference etc. We'll go through the blitz slides again one by one as in the blitz session.

BeastyBlacksmith commented 9 months ago

The pad for this session is the following: https://pad.gwdg.de/z-73H69rQWC5pNVxglUiTw