JuliaCon / proceedings-review

6 stars 1 forks source link

[REVIEW]: Julia for HPC: In Situ data Analysis with Julia for Climate Simulations at Large Scale #134

Closed whedon closed 2 days ago

whedon commented 8 months ago

Submitting author: !--author-handle-->@ltang85<!--end-author-handle-- (Li Tang) Repository: https://github.com/ltang85/In-Situ-data-Analysis-with-Julia-for-Climate-Simulations-at-Large-Scale Branch with paper.md (empty if default branch): Version: v1.0 Editor: !--editor-->@luraess<!--end-editor-- Reviewers: @simonbyrne, @williamfgc Archive: 10.5281/zenodo.12104097

Status

status

Status badge code:

HTML: <a href="https://proceedings.juliacon.org/papers/d9888b7336413fb1ab432e8d864fd6cb"><img src="https://proceedings.juliacon.org/papers/d9888b7336413fb1ab432e8d864fd6cb/status.svg"></a>
Markdown: [![status](https://proceedings.juliacon.org/papers/d9888b7336413fb1ab432e8d864fd6cb/status.svg)](https://proceedings.juliacon.org/papers/d9888b7336413fb1ab432e8d864fd6cb)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@simonbyrne & @williamfgc, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @luraess know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @simonbyrne

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Paper format

Content

Code of Conduct

General checks

Functionality

Documentation

Paper format

Content

whedon commented 8 months ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @simonbyrne, @williamfgc it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/JuliaCon/proceedings-review) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/JuliaCon/proceedings-review:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 8 months ago

Failed to discover a Statement of need section in paper

whedon commented 8 months ago

Wordcount for paper.tex is 6545

whedon commented 8 months ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.02 s (709.2 files/s, 331315.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
TeX                              9            425           1744           4676
Ruby                             1              8              4             45
YAML                             1              1              0             43
Markdown                         1             12              0             29
TOML                             1              2              0              8
Julia                            2              3              1              7
-------------------------------------------------------------------------------
SUM:                            15            451           1749           4808
-------------------------------------------------------------------------------

Statistical information for the repository '6508ac24f7eee7b98ab151a5' was
gathered on 2023/10/12.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Li Tang - LANL                   3           114             57          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Li Tang - LANL               57           50.0          0.0                7.02
whedon commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 8 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1137/141000671 is OK
- 10.1287/opre.38.5.911 is OK
- 10.1175/JCLI3996.1 is OK
- 10.5194/gmd-9-1937-2016 is OK
- 10.1175/JCLI-D-16-0465.1 is OK
- 10.1145/3144769.3144778 is OK
- 10.1109/LDAV.2011.6092322 is OK
- 10.2312/EGPGV/EGPGV11/101-109 is OK
- 10.1145/1383529.1383533 is OK
- 10.1177/1094342020935991 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.1111/j.1467-8659.2011.01964.x is OK
- 10.1109/CLUSTER.2018.00036 is OK
- 10.1109/TVCG.2014.2346458 is OK
- 10.1175/JCLI-D-16-0465.1 is OK
- 10.1109/TVCG.2015.2467411 is OK

MISSING DOIs

- 10.1029/2018jd028927 may be a valid DOI for title: Parametric sensitivity and uncertainty quantification in the version 1 of E3SM atmosphere model based on short perturbed parameter ensemble simulations
- 10.1109/drbsd754563.2021.00009 may be a valid DOI for title: TributaryPCA: Distributed, Streaming PCA for In Situ Dimension Reduction with Application to Space Weather Simulations
- 10.1036/1097-8542.757451 may be a valid DOI for title: Middle atmosphere dynamics
- 10.1109/smartgridcomm.2016.7778841 may be a valid DOI for title: Smartsim: A device-accurate smart home simulator for energy analytics
- 10.1145/3426462.3426465 may be a valid DOI for title: Chimbuko: A workflow-level scalable performance trace analysis tool
- 10.1175/jcli-d-15-0835.1 may be a valid DOI for title: Detection and attribution of changes in extreme temperatures at regional scale
- 10.1007/978-3-540-30218-6_19 may be a valid DOI for title: Open MPI: Goals, concept, and design of a next generation MPI implementation
- 10.2139/ssrn.3287769 may be a valid DOI for title: The Decline of Computers as a General Purpose Technology: Why Deep Learning and the End of Moore’s Law are Fragmenting Computing
- 10.1145/3490138.3490142 may be a valid DOI for title: In Situ Climate Modeling for Analyzing Extreme Weather Events
- 10.1175/1520-0442(2001)014<1959:codaet>2.0.co;2 may be a valid DOI for title: Characteristics of daily and extreme temperatures over Canada
- 10.5194/ascmo-2-79-2016 may be a valid DOI for title: Estimating changes in temperature extremes from millennial-scale climate simulations using generalized extreme value (GEV) distributions
- 10.1137/080731992 may be a valid DOI for title: Communication-optimal parallel and sequential QR and LU factorizations

INVALID DOIs

- https://doi.org/10.1029/2018MS001603 is INVALID because of 'https://doi.org/' prefix
- https://doi.org/10.1029/2020RG000708 is INVALID because of 'https://doi.org/' prefix
simonbyrne commented 8 months ago

@luraess the links to the conflict of interest policy and code of conduct don't work.

luraess commented 8 months ago

@simonbyrne thanks for flagging, while we work on a fix, you can access those here https://juliacon.github.io/proceedings-guide/reviewer/#review_guidelines_and_process

whedon commented 8 months ago

:wave: @simonbyrne, please update us on how your review is going (this is an automated reminder).

whedon commented 8 months ago

:wave: @williamfgc, please update us on how your review is going (this is an automated reminder).

luraess commented 8 months ago

@simonbyrne @williamfgc Human asking about the status of the revision? Ideally, we can try to get this done in the coming weeks (2 ideally). Thanks so much for your contribution!

simonbyrne commented 7 months ago

I'm on it, will update later this week.

williamfgc commented 7 months ago

Same here, I did a quick pass, but I need to go through it more carefully.

luraess commented 7 months ago

@simonbyrne @williamfgc also let me know if you still have issue checking tick-boxes.

simonbyrne commented 7 months ago

This was an interesting application of embedding Julia inside another program for in situ analysis. Although I am not able to assess the scientific validity of the analysis (which seems incomplete), from a technical perspective, this is an important problem: as simulations reach higher resolutions, it becomes more expensive and difficult to store all the data for post-processing analysis, thus it is helpful if such analysis can be done while the simulation is running ("in situ").

Overall, I thought the paper was reasonable, but the exposition could be improved in a few places.

There is no linked code, (other than the TributaryPCA.jl package, and this doesn't meet most of the above criteria, such as documentation), so I can't really check off those relevant boxes. While the code may not strictly be required, it is a shame as I think this could be of interest to others who are attempting something similar. I would encourage the authors to make those available if possible.

Specific comments

3.3 C Interface

It is also important to conduct garbage collection (GC) in this C interface for every time step. This is because arr and p_arr in the code snippet are allocated and then deallocated in E3SM for every time step, and the Julia runtime is not aware of this and hence needs explicit GC

This is not what the C code in code 2 is doing: as described in https://docs.julialang.org/en/v1/manual/embedding/#Memory-Management, the JL_GC_PUSH3 macro “roots” the variables so they are not collected, and JL_GC_POP simply removes that root (so they can be cleaned up by the garbage collector). The Julia garbage collector runs automatically (unless it is disabled, but this is not described in the paper)

3.5 Noninvasive Compilation

Some more details could be useful here: how did you load the Julia code? (e.g. did you make use of a precompiled system image?) How did you switch between (or switch off) the in situ analysis codes? Again, having the actual code available could make this easier.

4.1 Sudden Stratospheric Warming (SSW)

I'm not sure Algorithm 1 is correct: shouldn't it reset the counter to 0 if "global zonal mean zonal wind >= 0"? If it is negative for 20 days, does that count as 2 events?

There isn't much detail on the actual GEV fitting, and I found it difficult to understand. Is it fitting a different GEV distribution at every point? What is the prior distribution used for the GEV parameters? Some more details on this would be helpful (as well as linking to the code).

4.2 TributaryPCA

While I understand the problem it is trying to solve, I found it difficult to understand algorithm, and the Code 4 block is somewhat unclear:

(I don't have access to the TributaryPCA paper, so couldn't look it up there either).

I would suggest either having a mathematical exposition of the algorithm, and/or making the code a bit more readable.

Is the data centered (i.e. do you subtract the mean)? The lack of red in Fig 3, PC1 suggests that it isn't, but that isn't clear. If you aren't then the interpretation changes slightly.

5.2 In Situ SSW Detection and Characterization Results

I don't quite understand this: it doesn't actually analyse the SSW case? It just seems to be an illustration of how one might do an analysis.

5.3 PCA Results

The details are a bit lacking. Was the data collected every time step? Or is it based on daily snapshots or averages? How long was the simulation?

I disagree with the interpretation of figure 3: you can't really assign "warmer vs colder" to principal components (the sign of eigenvectors doesn't matter). My interpretation of the figures is that:

While it's useful that you can capture this, it's not particularly scientifically interesting.

5.4 Performance

The SSW seems to have negligible overhead (which is not surprising since it is evaluated infrequently), so there is not much to say about the analysis.

It could be helpful if you could add the number of MPI ranks to Table 1?

I'd be interested to know more about the performance bottlenecks of the PCA code? Which parts were the most expensive? Did the Julia garbage collector cause problems with scaling?

Minor comments

williamfgc commented 7 months ago

Review: Julia for HPC In Situ Data Analysis with Julia for Climate Simulations at Large Scale

The work present the implementation of an “in-situ” framework with components written in Julia coupled with the well-known E3SM framework model. Two main post-processing tasks are evaluated: Sudden Stratospheric Warming (SSW) and principal component analysis (PCA). SSW is a smaller component while PCA is more computationally intensive due to the required Cholesky decomposition execution. I must admit I am not a doing expert in E3SM, but I can appreciate the computational effort in plugging Julia into an existing framework for a novel application.

Computational experiments show that in-situ provides reasonable overheads for a larger number of processing elements (PEs) usage per node.

Overall, it’s a straight-forward application of how Julia can be added to an existing framework and add value for post-processing tasks.

Major points:

  1. Make the abstract more to the point for this work. Write in 250 words what was done for this paper and is unique. It reads more like an introduction. Please see your references for examples for self-contained abstracts
  2. The introduction needs fewer technical details as they can go in the background section. Instead, explain why the “challlenges” in the second paragraph are actual challenges. It needs a more high-level explanation of why Julia is chosen over current alternatives earlier in the introduction. Provide the motivation, hypothesis, etc. Also, describe the outline of the paper: Section 2 describes X, Section 3 introduces Y, etc. That way the reader has a good idea what to expect.
  3. Explain the motivation for in-situ analysis: is it data bottlenecks? Is it the only way this could be achieved? What are the potential trade-offs? JIT, back end support, ecosystem maturity etc.
  4. Overhead in Figures 4 and 5 results is unclear or not noticeable, perhaps use a different scale or plot style
  5. Please mention efforts in SCORPIO, as it’s been an effort to add more I/O backend to E3SM. It would be good to compare with existing in-situ methods.
  6. One major question is not described, does the current data consumption pattern is independent of the simulation output? Meaning there is some buffer (near real-time) or this is not required as the post-processing is almost negligible.
  7. One thing it’s not clear is what’s the quantified advantage over just storing data and doing post-processing. Is this only possible using “in-situ”?
  8. Also, add codes available for reproducibility purposes.
  9. The motivation for SSW is unclear. It's mostly "there is no overhead" type of case, but how does it compare with using disk directly and not in-situ?

Minor points:

  1. Spell out acronyms on first use (e.g. R&D, EAM) starting in the introduction section.
  2. LLVM is no longer low-level virtual machine
  3. The phrase: “Julia can couple with C/C+++/Fortran applications that are naturally supported by LLVM.” I think it is not completely accurate. Julia can interoperate with C and Fortran libraries not necessarily compiled or “supported” by LLVM
  4. ADIOS2 is mentioned but the cited reference is for an earlier ADIOS work. Please cite the appropriate ADIOS2 reference.
  5. Related to 4. I’d highly encourage citing papers related to packages used (e.g. MPI.jl)
  6. If possible, add a table to describe the hardware characteristics
  7. Improve Table 1 caption PE Layout is also the 1st column, maybe add MPI information
luraess commented 7 months ago

Thanks for handing in your revision @simonbyrne @williamfgc . IMPORTANT Please go through the tick-box list and check all points that are addressed leaving the remaining as "to-dos". All tick-boxes will have to be checked upon publication.

luraess commented 7 months ago

@ltang85 now that the 2 reviewers gave you feedback, please address their comments as soon as possible. Also, please take a look at https://github.com/JuliaCon/proceedings-review/issues/134#issuecomment-1759604442 as you'd need to fix those references as well.

luraess commented 6 months ago

@ltang85 just checking in about the status of the revisions, given that you've got feedback since one month ago. Please try to address the issues raised by the reviewers in the coming weeks, and do not hesitate to reach out if there is anything we can help you with. Thanks.

ltang85 commented 6 months ago

@luraess I am still working on it and should be able to address all the comments in 2 weeks. Thanks.

luraess commented 6 months ago

@luraess I am still working on it and should be able to address all the comments in 2 weeks. Thanks.

Thanks - take your time. Was just checking in about progress.

luraess commented 3 months ago

@ltang85 seems there was no activity update for a while here. Can we get the process to a final stage in the coming week?

ltang85 commented 3 months ago

@luraess sorry for the delay, I will try to finish it by next week.

luraess commented 3 months ago

Excellent. We would like to wrap up the previous submissions ideally in the coming 1-2 weeks and then open the system for JuliaCon23.

ltang85 commented 2 months ago

@luraess Hi, here is the update: we have addressed most of the reviewers' comments but will need a few more days for the rest of the comments.

luraess commented 2 months ago

Great, thanks @ltang85 for the update and looking forward to finalising your submission!

ltang85 commented 2 months ago

@whedon generate pdf

editorialbot commented 2 months ago

My name is now @editorialbot

ltang85 commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:warning: An error happened when generating the pdf.

ltang85 commented 2 months ago

@luraess I have addressed all the comments and uploaded the revised version. However, I failed to generate the pdf here. Will fix it tomorrow or Tuesday.

luraess commented 2 months ago

Thanks! Seems some figure is missing (https://github.com/JuliaCon/proceedings-papers/actions/runs/8683268959/job/23809007037#step:3:543).

ltang85 commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:warning: An error happened when generating the pdf.

ltang85 commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ltang85 commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

ltang85 commented 2 months ago

@luraess I have fixed the pdf issues and it is working now. I have also uploaded a response letter adressing all the reviewers' comments in the paper folder. Thanks.

@simonbyrne @williamfgc Thank you very much for reviewing this paper and provding the comments.

luraess commented 2 months ago

@ltang85 Thanks for submitting the revised version of the manuscript. I am checking it now and will let you know if any further changes are needed before we can accept the manuscript.

@simonbyrne @williamfgc If you find some time, I'd like to get your general feedback on whether the author addressed your suggestions in a way you are happy with. Also, please take some time to thick the boxes in here https://github.com/JuliaCon/proceedings-review/issues/134#issue-1939993021 .

Thanks!

luraess commented 1 month ago

Bump @simonbyrne @williamfgc - Thanks for swiftly verifying ☝️ such that we can finalise the publication! 🙏

williamfgc commented 1 month ago

@luraess I couldn't check the boxes in #134 (comment)

luraess commented 1 month ago

@luraess I couldn't check the boxes in #134 (comment)

I recall now this issue we had already at the beginning - let's leave that out. Could you however briefly assess whether the revised version of the manuscript draft adresses your comments and suggestions @williamfgc ? Thanks!

williamfgc commented 1 month ago

I just spotted typos: e.g. consqeuent...please proof-read. Other than that, my comments were addressed. Thanks!

ltang85 commented 1 month ago

@williamfgc @luraess

Thanks for the comments. We just proof-read it again and have fixed the typo you mentioned and a few other typos.

luraess commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

luraess commented 1 month ago

@ltang85 Thanks for addressing the proof reading. While in the process of accepting your submission, I bumped on the README you link to from your paper regarding the steps to reproduce your research https://github.com/ltang85/In-Situ-data-Analysis-with-Julia-for-Climate-Simulations-at-Large-Scale/tree/main/src .

Please format this README using standard markdown syntax, in order to clearly separate code blocks from plain text. Use heading hierarchy to structure the content as well, and please remove the redundant horizontal lines. Once this last bit is fixed, we could proceed with final publication. Thanks!

simonbyrne commented 1 month ago

Sorry, it has taken me a while to get to this.

@ltang85 Are you able to quickly summarize your changes, and how they addressed the comments?

ltang85 commented 1 month ago

@luraess @simonbyrne Sorry for my late reply.

@luraess I have cleaned up the README.md file using the standard markdown syntax. Please check and let me know if you have any comments.

@simonbyrne We have (1) added the code and instructions for reproducibility, (2) corrected the mistakes in the SSW algorithm, (3) added more description texts for SSW and TributaryPCA, (4) made edits to address your other comments such as results, tables, and garbage collection. More details can be found in our response letter to your comments: https://github.com/ltang85/In-Situ-data-Analysis-with-Julia-for-Climate-Simulations-at-Large-Scale/blob/main/paper/Response%20Letter.pdf