DSCI-310-2024 / data-analysis-review-2024


Submission: Group 2: Uncovering the Drivers of Housing Prices in Beijing: The Influence of Location and Time #2

Open ttimbers opened 3 months ago

ttimbers commented 3 months ago

Submitting authors: Prabhjot Singh, Yunxuan Zhang, Chenyi Zhao, Yelia Ye

Repository: https://github.com/Chenyi0309/dsci310-group02-project/releases/tag/Publish02

Abstract/executive summary:

In this study, we investigate the primary factors that influence the cost of homes in Beijing. By analyzing data from Lianjia.com, we explore how the location of a property and the timing of its sale affect its price. This research aims to shed light on the complex dynamics of Beijing's real estate market and provide a clearer picture for individuals looking to understand the value of real estate in this bustling metropolis.

Editor: @ttimbers

Reviewers: Erika Delorme, Kaylan Wallace, Ethan Kenny, Tak Sripratak

Kaylan-W commented 3 months ago

Data analysis review checklist

Reviewer: Kaylan-W

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

  1. I really liked many of your visualizations! They were creative, colourful and informative.
  2. There were some concerns about the files in the repo. Some unneeded files were included, like .ipynb_checkpoints. There’s also count_report.qmd in the reports folder, which was empty.
  3. There were some errors in the instructions provided for getting started with the repo in the README. In the cd command, the name of the directory was missing -project at the end. There were also some errors with creating the environment file and building the docker image. After resolving some dependency issues, the code ran successfully in RStudio, so your scripts work!
  4. The reports folder included a references.bib file, but the final report qmd used manually written citations and had no references section at the end.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

whitecat2021 commented 3 months ago

Data analysis review checklist

Reviewer: whitecat2021

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Although I have a lot of suggestions here, most of them are minor. I believe that the project question is quite interesting and complex. I very much appreciate the detailed and informative visualizations produced in the report!

  1. Organization and naming conventions:
    a. The tests/ folder seems to be missing a testthat/ subfolder, which is where your helper and test files would normally go. I would suggest reorganizing the tests folder to make it easier for users to navigate. Also, because these test scripts have very similar names to the function scripts in your R/ folder, I think renaming them to follow the “test-function_name” convention, instead of having “test” at the end of the script name, would make it clear to users that the scripts in your tests folder are test scripts and not copies of your functions in the R/ folder.
    b. I believe that there are some files in the project root that could be removed. For example, the environment.yml file could be removed, as you now have the Dockerfile to install the dependencies and run your analysis.
    c. I would suggest adding raw/ and processed/ subfolders to the data/ folder, holding the raw and processed data respectively, so that users can easily tell the two types of data apart and have access to both.

  2. Code style:
    a. Your scripts are very detailed, and the way they were produced makes sense! However, I noticed that they don’t exactly follow the R style guideline we learned in class. You could use the docopt package and have a main function that runs all of the necessary code for each script instead. Moreover, it would be nice as a user to have a brief explanation of what each script does in the script itself.

  3. Instructions:
    a. Your README.md has clear written instructions on how to use Docker to run the analysis. However, there is no explicit mention of, or pointer to, the Dockerfile that lists the dependencies. Users can find the Dockerfile if they search, but explicitly directing them to it would be very helpful.
    b. The instructions are clear, but in step 1 of the “Getting started” section of your installation instructions, I think it would be helpful to state explicitly that these commands should be run in the command prompt or terminal. I think most people in the field would understand this, but individuals with less expertise or experience may not know where to run these commands.
    c. Moreover, I believe that the set-up instructions could be updated so that they only cover the Docker-based setup, instead of covering both the environment.yml file and the Dockerfile. Similarly, you could remove step 1 in “Project Execution” so that the instructions relate only to Docker, not the environment.yml file.
    d. I would also suggest adding to your set-up instructions that users need to install Docker on their computer and have it running before pulling the Docker image and running the analysis, because users may not know that.

  4. Report:
    a. The visualizations provided are very detailed and original, making it easier for readers to grasp the data shown! However, I would suggest adding a bit more explanation or a conclusion about the results. Although there are explanations of what each figure or table represents, there is limited explanation of what the results mean for the research question, or of the overall findings and implications of your project.

tak4563 commented 2 months ago

Data analysis review checklist

Reviewer: tak4563

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3

Review Comments:

Ekenny02 commented 2 months ago

Data analysis review checklist

Reviewer: Ekenny02

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1

Review Comments:

  1. The dependency versions could have been explicitly stated in the README installation instructions to reduce room for error and improve reproducibility.
  2. The report could have put more emphasis on the importance of the topic and on the significance of the results, with examples of potential use cases.
  3. Test file naming could have been clearer, using the format test_xx.
  4. Judging from the price-prediction plot, a non-linear model may have been a better fit. The data appears to trend more logarithmically than linearly.
  5. The map plots were a great idea and really helped me better understand the results.
  6. Overall, the processing, analysis, and best practices were well done, with just some minor things to improve.
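
As a rough illustration of point 4, the sketch below compares a straight-line fit with a fit on a log-transformed predictor. It is written in Python for illustration, and it uses synthetic, noise-free data generated from a logarithmic curve, not your Lianjia data; the variable names and the coefficients 10 and 5 are assumptions chosen for the example. The same comparison could be run on your own model to check whether the trend I think I see in the plot is real.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(ys, preds):
    """Coefficient of determination for observed ys and predicted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (assumption): y = 10 + 5 * ln(x), no noise.
xs = [float(x) for x in range(1, 51)]
ys = [10.0 + 5.0 * math.log(x) for x in xs]

# Model A: y ~ x (plain linear regression).
a_lin, b_lin = fit_line(xs, ys)
r2_linear = r_squared(ys, [a_lin + b_lin * x for x in xs])

# Model B: y ~ ln(x) (still linear regression, on a transformed predictor).
log_xs = [math.log(x) for x in xs]
a_log, b_log = fit_line(log_xs, ys)
r2_log = r_squared(ys, [a_log + b_log * lx for lx in log_xs])

print(f"R^2, linear in x:     {r2_linear:.3f}")
print(f"R^2, linear in ln(x): {r2_log:.3f}")
```

If the log-transformed model gives a clearly higher R² on your data as well, that would support switching to it; since it only changes the predictor, the rest of a linear-regression workflow can stay the same.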