SORTEE-Github-Hackathon / manuscript

This repository implements an automated system to write our collaborative manuscript, while tracking changes and contributions.
https://sortee-github-hackathon.github.io/manuscript/v/latest/index.html

Concern: Missing application of automated workflows in GitHub for research in ecology and evolution #261

Closed pedrohbraga closed 2 years ago

pedrohbraga commented 2 years ago

Hi, again!

In this issue, I would like to comment on the absence of the application of automated workflows in the use cases of our manuscript.

Concern

Although we had a section dedicated to this in the hackathon and although we use GitHub Actions to automate the production of our manuscript, we have not discussed automated workflows as a use case.

The usage of automated workflows within GitHub (including GitHub Actions) in ecological data‐model integration has been recommended by Fer et al. (2020, Glob. Change Biology; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7756391/).

GitHub-integrated automated workflows to decrease the time and effort required by researchers have been implemented in a series of projects.

One example is the Portal Project, a long-term study of a Chihuahuan desert ecosystem, which has been described in Yenni et al. (2019, PLOS Biology; https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000125).

Another example is CAN-SAR, a database of Canadian species at risk information. Its automation workflow using GitHub Actions is described in Naujokaitis-Lewis et al. (2022, Scientific Data; https://www.nature.com/articles/s41597-022-01381-8).

I feel that automatic testing of data and code and continuous integration and deployment (CI and CD) workflows are becoming a very strong part of data synthesis in ecology and evolution and that the manuscript and the readership would benefit much from a dedicated section for this.
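To make "automatic testing of data" concrete, here is a minimal sketch of the kind of check a CI workflow could run on every commit that touches a data file. The field names (`species`, `plot`, `weight_g`) and the weight bound are hypothetical and chosen only for illustration; they are not taken from any of the projects cited above.

```python
from typing import Dict, List

# Hypothetical required fields and plausibility bound, for illustration only.
REQUIRED_FIELDS = ("species", "plot", "weight_g")
MAX_WEIGHT_G = 500.0


def validate_records(records: List[Dict[str, object]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems: List[str] = []
    for i, row in enumerate(records):
        # Every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        # Numeric weights must fall inside the assumed plausible range.
        weight = row.get("weight_g")
        if isinstance(weight, (int, float)) and not 0 < weight <= MAX_WEIGHT_G:
            problems.append(f"row {i}: implausible weight_g {weight}")
    return problems
```

A CI job could call `validate_records` on newly added rows and fail the build whenever the returned list is not empty, so that problematic data never reaches the main branch unnoticed.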

Proposed solution

The proposition that I have is to add a new use case related to automated workflows, where a short description and applications of automation using GitHub Actions or GitHub-integrated CI and CD are provided.

I can work on creating the first draft of the section and adding it to the text. However, if this section is considered a new use case, it would also require that our figure is adjusted to include this part.

Please let me know your comments about this concern and the proposed solution!

DrMattG commented 2 years ago

I agree GitHub Actions is important to mention, but we have the constraint of the word limit. Is there a way of adding a more detailed appendix on GitHub Actions without adding too many words to the main manuscript? Is continuous integration a component of some of the 12 themes? Could we include one or more examples with reference to one or more of the 12 themes?

robcrystalornelas commented 2 years ago

Actions are an important feature for GitHub, but we already mention them in the project continuity section and the manuscript writing section.

Rather than add Actions as its own use case, which would be quite technical compared to the rest of our use cases, I propose adding something like "to learn more about GitHub Actions in EEB research see [cite, cite]" in either the project continuity or manuscript writing section.

That being said, if we're rejected from Nature E&E and our word limit changes for the next journal, and we go back to the drawing board a bit and can re-tune the figure, update the manuscript so we have 13 use cases, etc. etc. then I think that could be a good time to add in the actions use case.

Aariq commented 2 years ago

I think it is important that our readers are made aware of the possible uses of CI for EEB research, but I agree that we should avoid getting into technical detail.

robcrystalornelas commented 2 years ago

@Aariq How to address this though? Separate section which brings us to 13 ways to use GitHub, or short sentence or two indicating other resources for learning about actions?

pedrohbraga commented 2 years ago

Hi, I do not think that we need to get into technical details. None of our sections appear to be technical. However, continuous integration and continuous deployment are powerful tools and practices that are widely used outside and that have been increasingly adopted within academia.

The practice of making small changes and running automated tests frequently (CI) has been shown to decrease the extent and the financial cost of errors in code, as well as the time developers take to correct issues (see Elazhary et al. 2018, IEEE Xplore; https://ieeexplore.ieee.org/document/9374092, and references therein). The benefits of adopting these practices range from catching problems with code early and avoiding noisy or invalid output that undermines scientific results (see Soergel, 2015, F1000Res; 10.12688/f1000research.5930.2) to ultimately avoiding retractions due to errors in code (e.g., the apparent cause of Chang et al. 2006, Science; 10.1126/science.314.5807.1875b). Automation can also be applied to specific cases that accelerate research and improve the readiness and the integration of new data into projects, such as the ones that I mentioned above.
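As a small illustration of testing analysis code (as opposed to data), the sketch below pairs a function with a test that a CI service could run on every push. The function and test names are hypothetical, and the Shannon index is used only as a familiar ecological example; a test runner such as pytest would be assumed to pick up the `test_` function.

```python
import math
from typing import Sequence


def shannon_diversity(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over species with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    # Species with zero individuals contribute nothing to the index.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def test_shannon_diversity() -> None:
    # A single species carries no uncertainty, so H' is 0.
    assert math.isclose(shannon_diversity([10]), 0.0, abs_tol=1e-12)
    # Four equally abundant species give H' = ln(4).
    assert math.isclose(shannon_diversity([5, 5, 5, 5]), math.log(4))
    # Zero counts are ignored, leaving two equally abundant species.
    assert math.isclose(shannon_diversity([5, 0, 5]), math.log(2))
```

If a later change accidentally dropped the minus sign or altered the handling of zero counts, this test would fail at the next push, well before results reach a manuscript.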

Automation is frequently used in evolutionary biology (within sequencing and assembly pipelines, for example), and there have been recent calls to adopt these practices within ecology, such as in predictive ecology [see McIntire et al. 2022; Ecology Letters; Fer et al. 2020; Glob. Change Biology; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7756391/; Scheller et al. 2010, Frontiers in Ecology and the Environment; 10.1890/080141], and near-term forecasting (Dietze et al. 2018, PNAS; 10.1073/pnas.1710231115). Most of these studies include the mention of GitHub and the CI tools available within, such as GitHub Actions, or the others that can be integrated within the platform.

GitHub Actions is a readily available powerful tool that can be applied in these contexts. Moreover, GitHub Actions marketplace and other user repositories provide workflows that are not too complicated to implement.

I share the concern about the word limit (but see the other thread, where there was not a very strong concern about this point). Still, I think that this topic has strong potential to interest the readership, and it has not been widely explored or discussed in our field, which would add novelty to our study. Given the potential I described above, I am not very convinced that this should be added only if the manuscript gets rejected, or solely as a passing sentence.

robcrystalornelas commented 2 years ago

OK @pedrohbraga. You mentioned being willing to draft a version of this new section. Can you do this next week?

Remember to ping @LunaSare about updating the figure with this new use case. You may have to advise a bit on the degree of collaboration in our figure below:

[Image: the manuscript figure]

pedrohbraga commented 2 years ago

Yes! I can work on a pull request by Tuesday evening (Eastern Canada). I will be sure to tag some of us to review it. I will also communicate with Luna when I am done.

pedrohbraga commented 2 years ago

Thanks everyone for the help! This has been solved in #265! Almost there!