DSCI-310-2024 / data-analysis-review-2024


Submission: Group 16: Stellar Classification Predictor #16

ttimbers opened this issue 6 months ago

ttimbers commented 6 months ago

Submitting authors: Aron Bahram, Olivia Lam, Lucy Liu, and Viet Ngo

Repository: https://github.com/DSCI-310-2024/DSCI310-Group16-Stellar_Classification/releases/tag/v0.1.5

Abstract/executive summary:

Our project looks towards the skies to classify stars into their spectral types according to their electromagnetic radiation magnitudes. Our goal is to expand our understanding of stars through their five radiation band types, and to explore how data analysis can further our knowledge beyond our galaxy through the study of photometry, the dynamics of celestial bodies, and stellar interactions. Our research draws on a data set of planetary systems from NASA’s Exoplanet Archive. Our simple categorization of stars may seem small, but it contributes to the bigger pursuit of celestial research and perhaps even planetary exploration.

Editor: @ttimbers

Reviewers: Anshnoor Kaur, Oliver Gullery, An Zhou, Xander Dawson

anshnoorkaur commented 6 months ago

Data analysis review checklist

Reviewer: anshnoorkaur

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

  1. Many of the figure and table references in the rendered PDF report show up as “Figure ??” and “Table ??”. These cross-references need to be fixed so they resolve to the correct figure and table numbers.
  2. I tried to reproduce the analysis using the “Reproducing the results in a docker container” section of the README but was unable to do so. Step 2 of the instructions is missing a “.” at the end, and even after adding it the build still failed on my end. It might just be my machine (macOS), but it is worth looking into. A little more documentation telling the reader to move into the project directory after step 1 would also be useful! Here is a snippet of the error (a possible fix is sketched after this list):

Reading state information... Done

Dockerfile:34

32 | RUN curl -o quarto-linux-amd64.deb -L https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.deb
33 | RUN apt-get install gdebi-core -y
34 | >>> RUN gdebi quarto-linux-amd64.deb --non-interactive
35 | # install TeX for quarto
36 | RUN quarto install tinytex

ERROR: failed to solve: process "/bin/sh -c gdebi quarto-linux-amd64.deb --non-interactive" did not complete successfully: exit code: 1

  3. Some improvements could also be made to the documentation within the functions and tests to ensure a consistent documentation approach: I saw more than one style of docstring formatting across the function and test files, as well as some missing documentation in those same files (a sketch of one consistent style follows this list).
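
For the Docker failure in point 2, a possible fix, offered only as a sketch (and assuming the gdebi step is failing because it cannot resolve the .deb's dependencies inside the build layer, which is not a confirmed diagnosis), is to refresh the package lists and let apt install the .deb directly, since apt resolves dependencies in the same step:

    # Sketch of a possible Dockerfile fix (untested assumption, not a
    # confirmed diagnosis). The curl step is the project's existing one;
    # only the install line changes.
    RUN curl -o quarto-linux-amd64.deb -L https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.deb
    RUN apt-get update && apt-get install -y ./quarto-linux-amd64.deb
    RUN quarto install tinytex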
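
For the documentation consistency in point 3, one option is to pick a single docstring convention, such as the numpydoc style, and apply it to both the function and test files. A minimal sketch with a hypothetical function (the name and column are illustrative, not taken from the repository):

    import pandas as pd

    def filter_by_spectral_type(df: pd.DataFrame, spectral_type: str) -> pd.DataFrame:
        """Keep only the rows matching a given spectral type.

        Parameters
        ----------
        df : pandas.DataFrame
            Stellar data with a 'spectral_type' column.
        spectral_type : str
            The spectral class to keep (e.g., 'A' or 'G').

        Returns
        -------
        pandas.DataFrame
            The filtered rows.
        """
        # Hypothetical helper for illustration; not taken from the repo.
        return df[df["spectral_type"] == spectral_type]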

These are just a few improvement points. Overall, great job!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

brico12 commented 5 months ago

Data analysis review checklist

Reviewer: brico12

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

  1. For the function documentation, the functions are clearly shown and documented in the code. However, what the analysis's functions can do could be presented more intuitively to users, for example by adding a new section to the README file (a sketch follows this list).
  2. The report is written in concise, comprehensible language and offers the authors' insights into the results of the analysis. However, the structure and content of the analysis do not quite satisfy all the requirements. The discussion section would benefit from more in-depth analysis: as it stands, it leans too heavily on describing the results rather than presenting the authors' own voice. A conclusion section is also missing.
  3. The authors did a fantastic job with code readability. They make it easy for readers to understand the purpose of the analytical code by clearly commenting the code's functions. This is very helpful for achieving a trustworthy workflow.
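
On point 1, that new README section might be as simple as a bulleted overview of what each function does. The sketch below uses purely hypothetical names to show the shape; they are not the repository's actual functions:

    Functions

    - load_data(path): read the raw NASA Exoplanet Archive data set
    - clean_data(df): drop missing values and encode the spectral types
    - plot_radiation_bands(df): produce EDA plots of the five radiation band magnitudes
    - fit_classifier(X, y): train and evaluate the spectral-type classifier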

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Ollie-Gullery commented 5 months ago

Data analysis review checklist

Reviewer: ollie-gullery

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

  1. Firstly, the report is really well done; the language is concise and easy to follow, and it breaks the topic down to help readers understand it. The definitions section in particular helped me understand the purpose of the report much better, as I was initially confused by some of the topics.
  2. The formatting of the report is really strong overall. However, some of the numbers and headings in the tables are long enough that they overlap, which makes those tables difficult to follow (Tables 3 and 5 of the PDF specifically). One way to address this, sketched after this list, would be to round every number in the tables to four significant figures; that would improve consistency and reduce the overlapping!
  3. The code was really well written; in particular, the comments above the different sections of the code help significantly with readability. This was done especially well in the data_eda.py file, where docstrings explain the functionality of each function in addition to the section comments. However, most of the test files do not replicate this level of depth. Adding docstrings and comments there would really help the overall readability of the tests and make it clearer why you chose the tests you did (see the pytest sketch after this list)!
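
On the significant-figures suggestion in point 2, here is a minimal sketch of one way to do it, assuming the tables come out of pandas (the column names and values are made up for illustration):

    import pandas as pd

    # Made-up table for illustration; not the report's actual values.
    results = pd.DataFrame({
        "mean_magnitude": [12.3456789, 0.000123456],
        "std_magnitude": [1.23456789, 0.0000654321],
    })

    # '{:.4g}' rounds each float to 4 significant figures, keeping the
    # printed values short and consistent so they are less likely to
    # overlap the column headers.
    print(results.to_string(float_format=lambda x: f"{x:.4g}"))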
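
And on point 3, a hypothetical pytest sketch of what that extra documentation could look like in the test files (the function under test and the data are illustrative, not from the repository):

    import pandas as pd

    # Hypothetical function under test, included so the sketch runs on its own.
    def filter_by_spectral_type(df, spectral_type):
        """Keep only the rows matching a given spectral type."""
        return df[df["spectral_type"] == spectral_type]

    def test_filter_keeps_only_requested_spectral_type():
        """The result should contain the requested class and nothing else."""
        df = pd.DataFrame({"spectral_type": ["A", "G", "A"],
                           "magnitude": [1.0, 2.0, 3.0]})
        result = filter_by_spectral_type(df, "A")
        # Both 'A' rows survive and the 'G' row is dropped.
        assert set(result["spectral_type"]) == {"A"}
        assert len(result) == 2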

Regarding the “N” I gave for automation: the docker build from the instructions on the GitHub page did not work for me and returned an error!

Great job overall though guys!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

adaws01 commented 5 months ago

Data analysis review checklist

Reviewer: adaws01

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.