----------------------- REVIEW 1 ---------------------
It’s great to see new training materials being developed for and by the community and for these to be shared openly. This helps to address sustainability and equality in bioinformatics training.
Your submission provides an informative description of the features of your newly developed materials on pangenomics. If not accepted as a proceedings paper, I encourage you to resubmit as an oral or poster abstract for ISMB to help raise awareness of this new resource.
[ ] The more interesting part of the paper is the story of how the materials and subsequent workshops came to be but this is also the shortest part of the paper. It would be wonderful to see this expanded to become the feature of the paper (and perhaps the earlier descriptive parts shortened). Exploring and describing this process in more depth would serve as a practical example to other communities of how contributing to a collective (training) goal can also promote community building and inclusivity. I understand that this would be a significant amount of work to change the paper in this way but it would vastly improve its value to the bioinformatics community.
Some other questions you might want to consider:
[x] You mention “our community of practice” but don’t expand further. What is this community and who is in it?
[x] The Carpentries has an extensive library of data science training materials. Were these reused in the basic Bash and Python lessons?
[x] How did the community work together to define the topics to be covered and develop the content of the materials?
[x] It’s great to see the positive outcomes of the post-workshop satisfaction surveys. Have you, or do you plan to, follow up with participants to see how it has impacted their work in the long term?
----------------------- REVIEW 2 ---------------------
The main issue is that the results, in addition to being only declarative, are rather vague:
[x] For Figure 2, the exact questions being asked should be provided to the reader.
[x] For Figure 3, it would be good to have information about how results depend on the background of participants or whether the results depend on lessons for example (i.e. +/- TDA).
Minor remarks:
[x] section 2.3, for lesson 3 the authors should explain why only Python is used whereas they insist (in abstract and introduction) on multiple programming languages being important.
Haydeé: As mentioned earlier, it is important to be proficient in multiple programming languages. However, the majority of libraries that work with TDA are in Python. In particular, we will be working with a TDA library called Gudhi, which is implemented in Python. Therefore, we will now explore introductory material on Python.
[x] section 3 mentions six lessons given in 16 hours. It is unclear how this maps to the 4 lessons and 24 episodes mentioned in Figure 1.
[ ] section 3.0.2, the last paragraph is probably not backed up by any data and should therefore probably be toned down.
----------------------- REVIEW 3 ---------------------
The authors describe their workshop teaching materials and strategy for pangenomics and topological data analysis. The paper is well written and easy to follow. The workshop was modeled on an existing "Carpentries" approach, providing a formal framework to design such a teaching resource. The workshop has been run a number of times, and participant feedback was collected and presented. It was refined by having it taught by different teachers than those who designed it. I find the paper interesting overall, but it is somewhat unclear what the actual contribution/focus is (other than letting others know about the existence of this laudable effort).
More precisely, I also had some questions that came up during reading and that may warrant some comment in the draft:
[x] How balanced is the topic coverage between the 4 topics? It seemed like some are rather small learning elements (e.g. the Bash part) whereas others are significantly more involved (e.g. the topology part).
[x] How central is the Topological Data Analysis part actually to the topic of pangenomics?
[x] #60
[x] Note that the provided link to the workshop repository is not functional (a Google search did, however, yield the proper link).
Minor: correct Sort Format Workshop to "Short.."
----------------------- REVIEW 4 ---------------------
The idea of having a pangenome workshop seems really good, and I am glad that the organizers have been running this successfully in multiple countries as a service to the community. Basing the format of the workshop on the model provided by the Carpentries is also good to see to ensure that the workshop is using a successful framework.
One of the challenges of running this workshop is the diverse range of background knowledge in its attendees; a natural thing to do might be to split the workshop into differing tracks, or to offer some of the material as preparatory material in order to allow the workshop to dig into greater details.
[x] #59
[x] Second, although Figure 3 is good, the improvement in command-line skills is not as significant as what I would expect to see from an in-depth workshop focusing on command-line skills. Furthermore, it seems appropriate to provide a greater number of figures showing survey data to demonstrate how attendee skills increased as a result of the workshop (notably in the understanding of pangenome analysis).
----------------------- REVIEW 5 ---------------------
The authors have developed a workshop aimed at biologists, mathematicians, and data scientists (with/without previous coding skills) to build pangenomes and analyze them with TDA.
This is an exciting paper, as this type of resource is sorely needed in our field (and in others).
My main issue is that I was unable to access these open source lessons.
[x] I checked the Supplemental Material and only found "Figure 1: Survey ppre-workshop"
and when I clicked on the Availability link below the abstract:
Availability: https://czirion.github.io/pangenomics-workshop/
I get an error accessing that page:
"404 There isn't a GitHub Pages site here."
Sadly, the paper is useless without access to a permanent webpage. I sincerely hope that this can easily be fixed!
"Abstract Motivation: Motivation:"