HSF / PyHEP.dev-workshops

PyHEP Developer workshops
https://indico.cern.ch/e/PyHEP2023.dev
BSD 3-Clause "New" or "Revised" License

Teaching, training, documentation, and coordinating documentation #7

Closed jpivarski closed 10 months ago

jpivarski commented 1 year ago

Helping users find and understand the software they need, keeping documentation up to date, and documenting procedures that cut across multiple packages.

jpivarski commented 1 year ago

From @alexander-held in https://github.com/jpivarski/PyHEP.dev/issues/9#issuecomment-1612636447:

how do users get help for things they cannot find in documentation?

That's an important point that I didn't mention anywhere, in any group descriptions. It sounds to me like it should go in #7 (here). Having places to ask questions accounts for the fact that teaching, training, and documentation won't be all-inclusive; there will always be something that isn't clear. But then, knowing where those places are is also something to be learned from the training or documentation.

I'd say that discussions about real-time chat (Gitter, Discord, etc.) and help message boards (Discourse, GitHub Discussions, StackOverflow, etc.) should be included in discussions about teaching/training/documentation, and therefore be in this group.

amangoel185 commented 1 year ago

I'd be interested in this group!

Might have an indirect overlap with #6 too (as is naturally the case with a lot of the topics) - how open source work is delegated, how much time can be officially dedicated to training, documentation etc.

btovar commented 1 year ago

+1

JMolinaHN commented 1 year ago

Very interesting topic. I am interested in participating.

mattbellis commented 1 year ago

+1

ph-ilten commented 1 year ago

+1 Interested in coordinating tutorials and in the best practices people are using for them.

ianna commented 1 year ago

+1

klieret commented 1 year ago

+1 interested.

Another topic that wasn't directly mentioned yet: How do we deal with Binder having reduced resources? Can Codespaces replace Binder (at HSF Training we tested it successfully recently with ROOT and Scikit-HEP)? What other options do we have?

mattbellis commented 1 year ago

@klieret I've never tried it with ROOT, but I've had good luck with Google Colab and installing uproot and awkward. Here's a sample that we've used to show people how to access some CMS open data files, converted to nanoAOD+, that we are hosting on Google Cloud Storage, as a test case.

https://colab.research.google.com/drive/16XiPu8W_1RQox-B6VeEcDuCmBfDngVqH?usp=sharing

It doesn't solve all issues, and like many things Google-related, who knows how long it will be available. :) But from an educator's viewpoint, it's a great tool.

mdsokoloff commented 1 year ago

@klieret I've never tried it with ROOT, but I've had good luck with Google Colab and installing uproot and awkward. Here's a sample that we've used to show people how to access some CMS open data files, converted to nanoAOD+, that we are hosting on Google Cloud Storage, as a test case.

https://colab.research.google.com/drive/16XiPu8W_1RQox-B6VeEcDuCmBfDngVqH?usp=sharing

It doesn't solve all issues, and like many things Google-related, who knows how long it will be available. :) But from an educator's viewpoint, it's a great tool.

I have been working with high school interns doing LHCb analysis. We started using Jupyter notebooks on local servers. They prefer to work in Google Colab (and are doing so). They have been using uproot, awkward, and iMinuit with .root files on servers. I still prefer Jupyter notebooks on a private server (or personal computer), but Colab lets them work together more easily.

clelange commented 1 year ago

I'd be interested in discussing deprecating documentation and ensuring the latest and greatest is the actual entry point. A lot of people find outdated documentation and examples and waste their time trying to understand why it doesn't work for them...

agoose77 commented 1 year ago

@nsmith- here's a few links pertaining to MyST Markdown:

MyST is a spec for a Markdown flavour, and also a brand for open-source tools (https://mystmd.org/guide/quickstart-jupyter-lab-myst)

Jupyter Book builds web "books" using Sphinx, and can read MyST, execute and render notebooks, and integrate with existing Sphinx projects (often by dropping the top-level Jupyter Book CLI and using the Sphinx components). You can add tags to cells to make them drop-downs, just as you can add tags to MyST admonitions to do the same for non-cells.

henryiii commented 1 year ago

+1

See https://learn.scientific-python.org/development/guides/docs/

mattbellis commented 1 year ago

I'd be interested in discussing deprecating documentation and ensuring the latest and greatest is the actual entry point. A lot of people find outdated documentation and examples and waste their time trying to understand why it doesn't work for them...

+1000 to this @clelange

klieret commented 1 year ago

Live notes from the discussion from Tue

Tuesday session: User experience for physics data analysis tools/Documentation/Training

Present: Matthew Bellis, Angus, Oksana, Mason, Aman, Zoe, Kilian, Ben, Remco, Clemens, Josue, Benjamin, Juraj

gh issue

Questions & discussion

  • Documentation
    • How to connect the different resources? Scikit-HEP is decentralized development, but central vision.
  • Relation between different kinds of material: Diataxis: https://diataxis.fr/ - lots of work but can be used as an overall guide
  • Training
    • How to adapt to different prerequisite knowledges?
    • How to make training discoverable?
      • HSF Training Center
        • Have Plausible web analytics around all of our training material to investigate discoverability/user behavior
        • Had GSoC proposal to rebuild this in a more dynamic website able to list more and filter it by need (and some alpha-versions were made that we could start from)
        • Need to find good balance between too narrow in focus and too wide
        • Negative example for "too wide": "Awesome lists" that keep on expanding and stop being useful
    • Should we provide a central Binder service?
  • Contribution
    • Should we have a mechanism for third-parties to contribute documentation to specific repositories from a centralised source?
    • How can we get users (especially new users) to write documentation?
    • How can we break down hurdles:
      • Lean more heavily on Web-IDE (GitHub codespaces etc.) for PRs
      • Essential to give users the option to give feedback quickly (without having to create issues/pull requests). Ideal: feedback/comment button. Example sphinx-comments
      • Could people earn "Karma"? For example, become official contributors to the project if they report a documentation issue.
      • Angus: Let's come up with a "social ethos" to help build a community of users. How to build a community?
      • Matthew: People are very shy/apprehensive about doing things "in public". Could there be a sandbox for it? Or having private tickets (also think of GDPR)? Philip: Pythia does this.
      • Matthew: People may start to prefer video resources rather than reading material. One problem is that it is harder to keep video up-to-date.
      • Kilian: There is already some prototype video documentation for some HSF Training Modules (like Docker etc.).
      • Juraj: Tutorials ready from CI.
      • Angus: create stubs:
        • Users can see which topics are already identified
        • New developers can see where to start contributing
        • Long-term developers can contribute in free time
  • User feedback:
    • How can we hear back from the students?
  • Other ideas/suggestions:
    • Benjamin: Workshop about "what to do if there's no documentation?"
    • Philip: This discussion sounds similar to HEP Forge: Find packages, link them together on a page. Do we repeat history? So what can we learn from that? Reasons HEP Forge failed:
      • ran out of funding
      • overambitious: wanted to be more than just an organizing/discoverability project but wanted to solve versioning (and failed)
        • switch from SVN to Phabricator/git (and lost people in the switch)
  • Running code:
    • Angus: Should we have central "Binder" service?
  • Making our lives easier:
    • Have more things as prerequisites to even out level of participants and to make sure we don't lose time with "trivial things"
    • Oksana: How can we train developers so that they engage users and write proper documentation?
      • How to write:

Conclusions

  • Discoverability: Need to link resources together and make them discoverable:
    • Documentation/training material should be interlinked
    • HSF Training Center can be expanded to make tutorials discoverable. Must strike balance between notable/maintained and inclusive. Considerations:
      • Resources have to be curated. Set minimum standards for quality and notability.
      • Be clear about scope, don't be HEP forge or one of the awesome-xyz lists
    • Plausible can be used to understand where users come from/where they go
  • Contributions: Making it easier to contribute/give comments (rather than opening PRs): Options include:
    • teaching people about GitHub Web IDEs for simple PRs
    • include more feedback/comment buttons (like sphinx-comments), ideally also anonymous.
  • Prerequisites: Having prerequisites for workshops can "even out" experience levels of people and avoid "trivial questions"
  • Maintainability: guaranteeing that code examples work (see also CI remark; documentation from Jupyter notebooks) and that interlinking is correct (e.g. linkcheck).
  • Hacking away & other ideas:
    • Regular "documentation day"
    • Half-day workshop as part of PyHEP to help developers write good (or any at all) user guides/documentation. Could also use that to get everyone to interlink things.
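
One way to approach the maintainability point above (guaranteeing that code examples work) is to execute the examples embedded in documentation as part of CI. The following is a minimal sketch using Python's standard `doctest` module; the helper `merge_pairs` and the wrapper `check_docstring_examples` are hypothetical names chosen for illustration, not something proposed in the session.

```python
import doctest


def merge_pairs(counts):
    """Sum adjacent pairs of bin counts (hypothetical documented helper).

    The example below is documentation and a test at the same time:

    >>> merge_pairs([1, 2, 3, 4])
    [3, 7]
    """
    return [counts[i] + counts[i + 1] for i in range(0, len(counts) - 1, 2)]


def check_docstring_examples(func):
    """Run the examples in ``func``'s docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted
```

A CI job could call `check_docstring_examples` for each documented function and fail the build when the first number is non-zero, so an example that drifts out of date is noticed before a user runs into it.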

To be continued on Thursday

klieret commented 1 year ago

Final notes (restructured and including material from plenary session on Thu):

Teaching, training, documentation and coordination

Present Tuesday session: Matthew Bellis, Angus, Oksana, Mason, Aman, Zoe, Kilian, Ben, Remco, Clemens, Josue, Benjamin, Juraj

gh issue

Questions & discussion

  • Documentation interlinking:
    • How to connect the different resources (how to guides, training)? Scikit-HEP is decentralized development, but central vision. Possible solution: Analysis gallery
  • Analysis Gallery: Should there be a central "analysis" gallery?
    • LHCb StarterKit Analysis Essentials has an example analysis for beginners that brings everything together
    • Astropy also has learn.astropy.org, a collection of notebooks that serve as howto-guides
      • Matthew: Does the complexity of our "real analyses" map onto such simple examples?
      • Matt: Also astropy is just a single package (that has majority of fields behind it)
      • Matthew: ROOT has gallery of examples. Could convert these. (https://root.cern/doc/master/group__Tutorials.html)
    • Alex: How to curate? What's the threshold for "too simple" to "too hard"
    • Aman: Could have an interactive "map" that can be clickable and links to the documentation
    • Jim: Might host training and documentation together
    • Clemens: Could have "learning paths"
  • Discoverability
    • How to make training discoverable?
      • HSF Training Center
        • Have Plausible web analytics around all of our training material to investigate discoverability/user behavior
        • Had GSoC proposal to rebuild this in a more dynamic website able to list more and filter it by need (and some alpha-versions were made that we could start from)
        • Need to find good balance between too narrow in focus and too wide
        • Negative example for "too wide": "Awesome lists" that keep on expanding and stop being useful
    • How to interlink different documentations/trainings
  • Relation between different kinds of material: Diataxis: https://diataxis.fr/ - lots of work but can be used as an overall guide
  • Getting people to contribute & getting user feedback
    • Should we have a mechanism for third-parties to contribute documentation to specific repositories from a centralised source?
    • How can we get users (especially new users) to write documentation?
    • How to add a "comment" box for notebooks:
      • Jim: Could have link to gh issue/hackmd/etc.
        • How to give notification to developer
      • Could do "hypothesis"
    • How can we break down hurdles:
      • Lean more heavily on Web-IDE (GitHub codespaces etc.) for PRs
      • Essential to give users the option to give feedback quickly (without having to create issues/pull requests). Ideal: feedback/comment button. Example sphinx-comments, sphinx-disqus (from this FAQ on RTD)
      • Could people earn "Karma"? For example, become official contributors to the project if they report a documentation issue.
      • Angus: Let's come up with a "social ethos" to help build a community of users. How to build a community?
      • Matthew: People are very shy/apprehensive about doing things "in public". Could there be a sandbox for it? Or having private tickets (also think of GDPR)? Philip: Pythia does this.
      • Juraj: Tutorials ready from CI.
      • Angus: create stubs:
        • Users can see which topics are already identified
        • New developers can see where to start contributing
        • Long-term developers can contribute in free time
  • Training paradigms:
    • Matthew: People may start to prefer video resources rather than reading material. One problem is that it is harder to keep video up-to-date.
      • Kilian: There is already some prototype video documentation for some HSF Training Modules (like Docker etc.).
    • Having more things as prerequisites to even out level of participants and to make sure we don't lose time with "trivial things"
  • Platforms for running code for training:
    • Angus: Should we have central "Binder" service?
  • Forum & chat
    • similar to ROOT forum?
    • Both chat and forum have merit
    • Been over a year that we talked about this.
    • what to use for chat? discord?
    • How to balance chat vs forum?
      • Jerry: Bot in chat that generates discourse post, initially invisible, for posterity.
    • Forum ~ stackoverflow-ish
  • Other ideas/suggestions:
    • Benjamin: Workshop about "what to do if there's no documentation?"
    • Philip: This discussion sounds similar to HEP Forge: Find packages, link them together on a page. Do we repeat history? So what can we learn from that? Reasons HEP Forge failed:
      • ran out of funding
      • overambitious: wanted to be more than just an organizing/discoverability project but wanted to solve versioning (and failed)
        • switch from SVN to Phabricator/git (and lost people in the switch)
    • Oksana: How can we train developers so that they engage users and write proper documentation?
    • Office hours similar to scipy? - Repeating new contributors meeting

✨ Conclusions & actionable items ✨

  • Discoverability: Need to link resources together and make them discoverable:
    • Documentation/training material should be interlinked
    • HSF Training Center can be expanded to make tutorials discoverable. Must strike balance between notable/maintained and inclusive. Considerations:
      • Resources have to be curated. Set minimum standards for quality and notability.
      • Be clear about scope, don't be HEP forge or one of the awesome-xyz lists
    • Plausible can be used to understand where users come from/where they go
  • Contributions: Making it easier to contribute/give comments (rather than opening PRs): Options include:
    • teaching people about GitHub Web IDEs for simple PRs
    • include more feedback/comment buttons (like sphinx-comments), ideally also anonymous.
    • include stubs for things that are missing in docs and should be filled in
  • Prerequisites: Having prerequisites for workshops can "even out" experience levels of people and avoid "trivial questions"
  • Feedback: Jim recommended directly implementing feedback buttons into notebooks (e.g., via Slido)
  • Maintainability: guaranteeing that code examples work (see also CI remark; documentation from Jupyter notebooks) and that interlinking is correct (e.g. linkcheck).
  • Hacking away & other ideas:
    • Regular "documentation day"
    • Half-day workshop as part of PyHEP to help developers write good (or any at all) user guides/documentation. Could also use that to get everyone to interlink things.
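
One of the options raised in the discussion for a notebook "comment" box was a plain link to a GitHub issue. A minimal sketch of that idea, assuming the target repository accepts issues: GitHub's new-issue form can be pre-filled through the `title`, `body`, and `labels` query parameters. The repository name, page path, and label below are placeholders, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def feedback_link(repo, page, label="documentation"):
    """Build a URL that opens GitHub's new-issue form pre-filled for ``page``.

    ``repo`` is "owner/name"; it and ``label`` are placeholders to adapt.
    """
    query = urlencode(
        {
            "title": f"Feedback on {page}",
            "body": f"Page: {page}\n\nWhat was unclear or broken?\n",
            "labels": label,
        }
    )
    return f"https://github.com/{repo}/issues/new?{query}"


def feedback_markdown(repo, page):
    """Markdown snippet that could go in the last cell of a tutorial notebook."""
    return f"[Give feedback on this page]({feedback_link(repo, page)})"
```

This still requires a GitHub account and is public, so it does not cover the anonymous feedback mentioned above; it only lowers the effort of opening an issue from inside a notebook.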

Actionable items

  • Fork (or take inspiration from) https://learn.astropy.org
    • hsf training has around 3-5 alternative "from scratch" implementations/PoCs of similar platforms that we could consider/start from. Might also rope in some of the GSoC candidates with JS knowledge.
    • Purpose is to provide a minimal (basis?) set of tutorials that provide a starting point for self-learning(?).
    • Do we split tutorial from guides at the top-level, or as a filter criterion?
  • Design an updated pipeline/map for AS (maybe even interactive) - https://iris-hep.org/as.html
    • Can be non-linear, to provide a better visual overview of how different packages integrate/can be used at different stages
    • Adding a place for videos (for the ones that have been presented live), and a place for time estimate: "This tutorial will take 10 minutes."
    • Easy pipeline for contributors to add new ones.
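
To make the "top-level split vs. filter criterion" question above more concrete, here is a small sketch of the filter variant: each resource carries its kind (tutorial or guide), a time estimate, and an optional video link, and the site filters on those fields. All field names and catalog entries are made up for illustration and do not describe any existing HSF Training implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One gallery entry; every field name here is a hypothetical choice."""
    title: str
    kind: str              # "tutorial" or "guide"
    packages: tuple        # packages the material touches
    minutes: int           # "This tutorial will take N minutes."
    video_url: str = ""    # optional recording of a live presentation


def select(resources, kind=None, package=None, max_minutes=None):
    """Return the entries matching every filter that was given."""
    matches = []
    for r in resources:
        if kind is not None and r.kind != kind:
            continue
        if package is not None and package not in r.packages:
            continue
        if max_minutes is not None and r.minutes > max_minutes:
            continue
        matches.append(r)
    return matches


# Made-up entries, for illustration only.
CATALOG = [
    Resource("Reading a ROOT file", "tutorial", ("uproot",), 10),
    Resource("Rebinning histograms", "guide", ("hist", "uproot"), 20),
    Resource("Jagged arrays in practice", "tutorial", ("awkward",), 45),
]
```

With the kind stored as a field, `select(CATALOG, kind="tutorial")` reproduces a top-level split as one particular filter, so the filter variant does not rule out presenting tutorials and guides on separate pages later.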

eduardo-rodrigues commented 1 year ago

Hello @klieret, all. Thank you for making this executive summary. Very handy for people not present and in general. Very instructive also 👍 .

You say above

How to connect the different resources (how to guides, training)? Scikit-HEP is decentralized development, but central vision. Possible solution: Analysis gallery

What do you mean by decentralised? After all we have always been community-driven/-oriented.

Note also that the idea of an analysis gallery has been with us for a while, see https://github.com/scikit-hep/scikit-hep.github.io/issues/108. It would be excellent if anyone would be willing to push the idea over the creation threshold.

agoose77 commented 1 year ago

I can elaborate slightly based upon my recollection.

The challenge with getting started in our ecosystem is that we operate in a very decentralised fashion, crudely on a per-package basis, with some developers maintaining a subset of packages with greater overview. But, users aren't likely to think about their problems in terms of Python packages when e.g. asking the question "how do you read a ROOT file from some CMS experiment and perform this analysis?". It would be nice if we had a central hub that gave users a starting point to

  1. understand how the community is structured (packages, people, etc)
  2. understand which tools they might need (tutorials, guides etc)

An analysis gallery is one aspect to this, but we also discussed a user forum.

pfackeldey commented 1 year ago

Hi @agoose77,

I'd like to add a comment/idea to your post which came to my mind after this discussion at PyHEP.dev.

We talked about how important workflow systems are because they encapsulate logical steps in an analysis and connect them in a graph (e.g. Luigi/law). This results in the pattern you describe: one logical step does not necessarily involve only one package of the ecosystem, and users (at least me) think in these logical steps. One exemplary step I have in mind is usually done in a typical (CMS) analysis workflow:

Before writing datacards (for combine) you have a bunch of histograms from your coffea processing step, and now you want to read them, rebin them, potentially apply some modifications (e.g. smoothing), and write them back into ROOT TH1. The steps are:

  1. Read histograms from the coffea output (e.g. pickle) [uproot (reading)]
  2. Calculate a new binning (+ e.g. smoothing) [numpy / numba]
  3. Apply a new binning [boost-histogram / hist]
  4. Write the rebinned histograms to ROOT TH1 [uproot (writing)]

I've added the packages necessary for this logical unit of an analysis in square brackets behind each step (at least that's what we used in our analysis). While each package has wonderful documentation about its own API and usage, there is little or no documentation about such a whole logical unit of an analysis.
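
As an illustration of the kind of example such "logical unit" documentation might contain, here is a minimal pure-Python stand-in for steps 2 and 3 (computing and applying a coarser binning). It is only a sketch: it assumes a 1D histogram given as plain lists of bin edges, counts, and variances, merges a fixed number of adjacent bins, and leaves out reading and writing entirely. In a real analysis the packages named in the list would do this work; the function name `rebin` is hypothetical.

```python
def rebin(edges, counts, variances, factor):
    """Merge every ``factor`` adjacent bins of a 1D histogram.

    Returns (new_edges, new_counts, new_variances). Counts are summed, and
    so are variances, which assumes the bins are statistically independent.
    """
    if len(edges) != len(counts) + 1 or len(variances) != len(counts):
        raise ValueError("inconsistent histogram arrays")
    if factor < 1 or len(counts) % factor != 0:
        raise ValueError("number of bins must be a multiple of factor")

    starts = range(0, len(counts), factor)
    new_edges = list(edges[::factor])  # keeps the first and the last edge
    new_counts = [sum(counts[i:i + factor]) for i in starts]
    new_variances = [sum(variances[i:i + factor]) for i in starts]
    return new_edges, new_counts, new_variances


# Four unit-width bins merged pairwise into two bins.
edges, counts, variances = rebin([0, 1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], 2)
```

A gallery entry for this logical unit would then surround a step like this with the actual reading and writing code, which is exactly the cross-package glue that per-package documentation tends not to cover.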

Thinking very far in the future now: It might be cool to have a sketch like this:

but rather arranged in typical logical units of a full analysis, where users can click on a logical unit and see (real world) examples how a multitude of packages from the ecosystem can be used to solve these common steps.

At the same time this arrangement/sketch might be encouraging for users to use and think in workflows for their analysis as it is already arranged like this.

Sorry to chime in on the discussion as a user, and please ignore the noise if this has already been discussed.

Best, Peter

klieret commented 1 year ago

@eduardo-rodrigues @agoose77 Sorry for the late reply! We'll have a dedicated meeting today at 5pm CERN time to discuss how to combine the "analysis gallery" ideas together with a revamp of our training center.

What do you mean by decentralised? After all we have always been community-driven/-oriented

Ah, probably 'decentralized' wasn't the right word (perhaps 'modularized' would have been more accurate). This was meant in comparison to ROOT, where everything is a single package (which has certain advantages for new learners).

agoose77 commented 1 year ago

Chat log from Zoom: https://gist.github.com/agoose77/01a22f4c2a3e33c424815ab68ffa2731

eduardo-rodrigues commented 1 year ago

Hello all. Many thanks for your clarifications and comments. Makes very much sense.

Unfortunately you caught me about to go on hols and I'm now catching up; almost done. I will for sure follow what I can.