CaptainSifff / paper_teaching-learning-RSE

The teachingRSE project: "Teaching and Learning Research Software Engineering"
Creative Commons Attribution 4.0 International

Ask people of their opinion, what a beginning RSE should learn at the start of their career. #229

Open CaptainSifff opened 3 months ago

CaptainSifff commented 3 months ago
5: What are our teachers made of, and how can we find them
6: Work on the paper.
jngrad commented 3 months ago

We currently have a list of competencies that have been assigned to different RSE career stages (section 5.2) and team structures (Tables 4, 5, 6). Here are a few ideas of what we could do:

jngrad commented 3 months ago

The time for the breakout session would be about 10 minutes, and the workshop topic is "Institutionalized Organisation of RSE Education". I would be more inclined towards designing a curriculum, using our survey of teaching resources, to create a roadmap for early-career RSEs whose home institution doesn't offer training in RSE. Designing a university curriculum would be a separate topic, for which there is already a dedicated breakout session (see #228).

We could use existing curricula from M.Sc. programs as a starting point, and decide which skills would be worth learning at different stages of the academic progression, e.g. at the B.Sc. level (version control, scripting, continuous integration), at the M.Sc. level (archiving, licensing, FAIR, reproducibility), and at the Ph.D. level (containerization, packaging, funding programmes). Here is an example of an RSE course: University of Stuttgart – Simulation Software Engineering.

CaptainSifff commented 2 months ago

Do you also consider the question the other way around? For example, what should a person in their mid-20s know when they join your group?

mhagdorn commented 2 months ago

During the call on March 1st, we were wondering whether this part should be about specific items that early-career RSEs should learn, to avoid repeating the general discussion that we covered in the foundations.

jngrad commented 2 months ago

The results are in this pad. @braunms and I will fix grammar/abbreviations, then summarize the info in a draft PR.

jngrad commented 1 month ago

What a beginning RSE should learn at the start of their career

Several participants pointed out that RSEs should have training in mathematics, numerical methods and statistics, because there are classes of equations (e.g. differential equations) and methods (e.g. regression analysis) that are commonplace in otherwise unrelated scientific disciplines, and the numerical/statistical/simulation methods used to investigate them are identical. RSEs have a capacity to abstract away concrete problems and translate domain-specific problems to software. Training should cover good coding practices, such as version control, testing, documentation, modern software engineering, linters, formatters, licensing and library re-use, i.e. topics that are well-established and are unlikely to change in the future.

Yet RSEs may also need to know about the current state of affairs, i.e. requirements from funding agencies, such as FAIR data and FAIR software (DFG, 2023). It is also unclear how to teach certain topics like linters, formatters and testing frameworks without also showing specific tools (ruff, pylint, black, unittest, pytest, etc.), which may no longer be available or relevant in a few years' time, or may be too tied to a specific programming language whose market share varies widely across domains (e.g. Python vs. Julia vs. R, or C++ vs. Java vs. Rust). Making training language-agnostic can be challenging. The HPC ecosystem sometimes designs courses where algorithms are presented in a language-agnostic way in a few slides, followed by a single slide that shows the corresponding function call in Fortran, C, C++ and Python (see for example slides 163 to 165 in Rabenseifner, Introduction to the Message Passing Interface, 2023).

RSEs should have a basic introduction to computer science, e.g. know about computational complexity and data structures (linked lists vs. arrays vs. hash maps). They should also be good communicators with domain scientists. One option would be for the course to include practical experience by sending students for a few hours a week to a domain-specific laboratory of the university where open source software is being developed as part of a domain-specific research project (in Germany: Ausbildung, Forschungspraktika); this would also help students build their GitHub or GitLab profile with contributions to real-world software.
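To make the point about computational complexity and data structures concrete, here is a minimal sketch of the kind of exercise such an introduction might contain. It was not discussed in the session; the function name `linear_search_steps` and the data are made up for illustration. It counts how many elements a sequential search through an array-like Python list has to inspect, and contrasts that with a hash map (`dict`), which finds the entry without scanning.

```python
def linear_search_steps(items, target):
    """Count how many elements are inspected by a sequential search.

    Returns len(items) when the target is absent, since every
    element has to be checked before giving up.
    """
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


n = 1000
values = list(range(n))                         # array-like container
index = {v: i for i, v in enumerate(values)}    # hash map: value -> position

# Worst case for the list: the target is the last element,
# so the search inspects all n elements (O(n)).
worst_case_steps = linear_search_steps(values, n - 1)

# The hash map retrieves the same position with a single lookup
# (O(1) on average), independent of n.
position = index[n - 1]

print(worst_case_steps, position)
```

Counting steps rather than measuring wall-clock time keeps the exercise deterministic and independent of the machine it runs on.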

braunms commented 1 month ago

Many thanks for this nice summary!

An overarching challenge that emerged was how to define the right balance between general and area-specific skills and knowledge.

jngrad commented 1 month ago

I attended "What are the future skills needed for HPC? How to embrace the latest technological innovations?" from EuroHPC Summit 2024 this morning via their livestream. They are trying to answer similar questions in terms of curriculum design and how to integrate their specialty in academic programmes.

One panelist talked about training HPC practitioners more effectively by dividing the B.Sc. and M.Sc. curriculum into three stages: 1 year of domain-specific training in a STEM field, 2 years of computer science, and 2 years of HPC specialisation. Another panelist mentioned that RSE education provides an excellent foundation for an HPC specialisation. Another panelist opined that digital competencies should be taught early in the curriculum (B.Sc. is too late). Koen De Bosschere made a distinction between education ("a university educates people for a whole career, not for a first job") and upskilling (at the job site and through seminars/workshops as part of continuous learning).

Julian Kunkel (HPC Certification Forum) explained that creating clearly defined digital skills and organising them was challenging, and that "EU projects for training and mapping out competences are valuable". Thor Wikfeldt (ENCCS) talked about the need for a training infrastructure with a dynamic and flexible skill tree, based on a curriculum featuring a common set of transferable skills and a branching mechanism to teach domain-specific skills to practitioners of that domain (suitable for "upskilling, reskilling, newskilling").

Cristina Silvano presented EUMaster4HPC, the pan-European Master for HPC. One panelist mentioned the EuroHPC virtual training academy call, which aims to develop a "competence and qualification framework based on a modular skills tree of competences and learning objectives, addressing the gap between basic digital skills and domain specific specialist knowledge", as well as the necessary training material and IT infrastructure to serve the online lectures.

Eugenia Kypriotis presented the LEADS Advanced Digital Skills project (deliverables: D1.2 First Draft of Advanced Digital Skills Demand and Forecast Report, D2.2 Leads Gap Analysis, D3.1. First Guidelines Generated) and mentioned the May 16 Advanced Digital Skills Summit in Madrid. María S. Pérez-Hernández talked about an initiative to discover, label and categorize non-formal training in data science (still in an early phase), and their first call for microcredentials of formal training in big data science.

CaptainSifff commented 1 month ago

https://compeau.cbd.cmu.edu/establishing_a

jngrad commented 1 month ago

Summary from today's Zoom call. We discussed how numerical methods are already taught early at the Bachelor level. If we want to teach RSE as a Master's programme, we could make math modules from the Bachelor level hard requirements for enrolling in the RSE course. The domain-specific knowledge is already available if the Bachelor was in STEM.

Regarding the teaching format, we may be reaching the limits of what can be achieved in the traditional slideshow format. HPC slideshows with multiple programming languages per slide can become quite confusing for the more advanced algorithms (example: slide 455). There is an example of a language-agnostic, non-linear course in bioinformatics (Compeau 2019, doi:10.1371/journal.pcbi.1006764) where only the algorithms are presented and students have to develop implementations in their preferred programming language. The exercises provide input data and the result of the calculation must be submitted within a few minutes, after which the input data changes. This encourages students to develop efficient implementations of their software (I used Python when taking this course online and had to explore multithreading and Cython to solve some of the exercises under the time limit). The corresponding blog post has a figure which shows how the latest iteration of the course has a core set of mandatory topics (central horizontal line of squares connected with black arrows) and optional topics (loops of squares). This way students may skip the optional parts they are already familiar with and spend more time on the optional parts where they learn something new. This also prevents students from getting stuck if they do not have a strong background in e.g. advanced CS topics.
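The core-plus-optional-loops structure described above could be modelled roughly as follows. This is only a sketch of the idea, not how the course platform is actually implemented; the `Topic` class, the `learning_path` function and all topic names are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A mandatory core topic with an optional loop of side topics."""
    name: str
    optional_loop: list[str] = field(default_factory=list)


def learning_path(core, already_known):
    """Walk the mandatory core in order and visit each optional loop,
    skipping the optional topics the student already knows."""
    path = []
    for topic in core:
        path.append(topic.name)  # core topics are never skipped
        path.extend(t for t in topic.optional_loop if t not in already_known)
    return path


# Hypothetical course layout: three core topics, two of them with optional loops.
core = [
    Topic("core topic 1", optional_loop=["optional 1a", "optional 1b"]),
    Topic("core topic 2"),
    Topic("core topic 3", optional_loop=["optional 3a"]),
]

# A student who is already familiar with "optional 1a" skips it.
path = learning_path(core, already_known={"optional 1a"})
print(path)
```

Keeping the core as an ordered list guarantees every student passes through the same mandatory sequence, while the optional loops only change how long the path is.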