COGS118A / Group005-SP23

COGS118A Final Project Group005-SP23 Repository

Project Proposal Feedback #1

Open scott-yj-yang opened 1 year ago

scott-yj-yang commented 1 year ago

Project Proposal Feedback

Score (out of 9)

Score = 9

Feedback:

Quality Reasons
Abstract
Background -0.75
Problem Statement
Data -0.75
Proposed Solution
Evaluation Metrics
Ethics & Privacy
Team expectations
Project Timeline Proposal

Rubric

| | Unsatisfactory | Developing | Proficient | Excellent |
|---|---|---|---|---|
| Abstract | Abstract is confusing or fails to offer important details about the issue, variables, context, or methods of the project. | Abstract lacks relevance or fails to offer pertinent details about the issue, variables, context, or methods of the project. | Abstract is relevant, offering details about the research project. | Abstract is informative, succinct, and clear. It offers specific details about the educational issue, variables, context, and proposed methods of the study. |
| Problem Statement | Research issue remains unclear. The research purpose, questions, hypotheses, definitions or variables, and controls are still largely undefined, or when they are poorly formed, ambiguous, or not logically connected to the description of the problem. Unclear whether the research problem is quantifiable, measurable, and replicable. | Research issue is identified, but the statement is too broad or fails to establish the importance of the problem. The research purpose, questions, hypotheses, definitions or variables, and controls are poorly formed, ambiguous, or not logically connected to the description of the problem. The limited description of whether the research problem is quantifiable, measurable, and replicable. | Identifies a relevant research issue. Research questions are succinctly stated, connected to the research issue, and supported by the literature. Variables and controls have been identified and described. Clear reasoning and description on that the research problem is quantifiable, measurable, and replicable | Presents a significant research problem. Articulates clear, reasonable research questions given the purpose, design, and methods of the project. All variables and controls have been appropriately defined. Clear and significant reasoning on the quantifiability, measurability, and replicability of the research problem. All elements are mutually supportive. |
| Background | Did not have at least 2 reliable and relevant sources. Or relevant sources were not used in relevant ways | A key component was not connected to the research literature. Selected literature was from unreliable sources. Literary supports were vague or ambiguous. | Key research components were connected to relevant, reliable theoretical and research literature. | Narrative integrates critical and logical details from the peer-reviewed theoretical and research literature. Each key research component is grounded in the literature. Attention is given to different perspectives, threats to validity, and opinion vs. evidence. |
| Proposed Solution | Lacks most details; vague or interpretable in different ways. Or seems completely unrealistic and inapplicable to the project domain. | Limited descriptions of the rationales and theories behind the solution provided and on how the solution will be tested. Limited relevance to the input dataset and problem to be solved. | Sufficient details on algorithmic description or theoretical properties; clear definition of how the solution will be tested, reproduced, and on the benchmark used. | Highly clear and succinct description of the rationales and theories behind the solution; thorough and comprehensive consideration of how the solution will be applied and tested; valid approach on how to reproduce the solution and effective benchmark to test the solution; a strong connection to the problem proposed. |
| Data | Did not have references to relevant data sources for this problem. Did not describe the data obtained at those sources | A key data source was not referenced or described in a satisfactory level of detail | All relevant data sources were referenced and described in terms of their key variables and size | Multiple data sources for each aspect of the project, All data sources are fully described and referenced. The details of the descriptions also make it clear how they support the needs of the project. |
| Evaluation Metrics | Did not propose any metric for evaluating the model or very little effort in this section. | Evaluation metrics proposed with limited relevance or inappropriate metrics; ambiguous description of the metrics to be used. | Thoughtful and meaningful evaluation metrics with sufficient considerations and descriptions of the model to be evaluated. | Effective and comprehensive evaluation metrics with thorough and detailed descriptions. |
| Ethics | No effort or just says we have no ethical concerns | Minimal ethical section; probably just talks about data privacy and no unintended consequences discussion. Ethical concerns raised seem irrelevant. | Ethical concerns described are appropriate and described sufficiently | Ethical concerns are described clearly and succinctly. This was clearly a thoughtful and nuanced approach to the issues |
| Team expectations | Lack of expectations | The list of expectations feels incomplete and perfunctory | It feels like the list of expectations is complete and seems appropriate | The list clearly was the subject of a thoughtful approach and already indicates a well-working team |
| Timeline | Lack of timeline. Or the timeline is completely unrealistic | The timeline feels incomplete and perfunctory. The timeline feels either too fast or too slow for the progress you expect a group can make | It feels like the timeline is complete and appropriate. it can likely be completed as is in the available amount of time | The timeline was clearly the subject of a thoughtful approach and indicates that the team has a detailed plan that seems appropriate and completable in the allotted time. |

Scoring: Out of 9 points

If students address the detailed feedback in a future checkpoint, they will earn these points back.

Comments

In the background section, you discuss how the real estate market is largely influenced by the economy and government policies, but neither factor is used in your proposed solution. I therefore suggest not emphasizing these two factors in the background; instead, discuss why you are investigating only demographic information. You can still mention the impact of these two factors in your results analysis.

Another concern is that the dataset you picked is too large. More data usually yields more accurate and meaningful results, but an extremely large dataset poses technical challenges even for very simple models. Consider reducing the scope of the problem, for example by focusing on real estate in a single city or even a district (and explain why you picked that city or district).

arth-shukla commented 1 year ago

Hi @scott-yj-yang ,

Thanks for the feedback!

To address the issues with our background and also the size of our data, we decided to focus on Moscow. Moscow is the capital of Russia and is a hot real estate market. It also cuts down our sample to about 1/10th the size of the original dataset.

We also describe drawing a random subset of the Moscow data, since random sampling would avoid breaking assumptions for models like OLS while reducing our computation time.
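The filter-then-sample step described above can be sketched as follows. This is a minimal illustration in plain Python, not the group's actual code: the `city` and `price` field names, the toy listings, and the helper `sample_city_rows` are all hypothetical, and a fixed seed is assumed so the subset is reproducible.

```python
import random


def sample_city_rows(rows, city, n, seed=0, city_key="city"):
    """Keep rows for one city, then draw n of them uniformly without replacement."""
    city_rows = [row for row in rows if row.get(city_key) == city]
    if n > len(city_rows):
        raise ValueError(f"requested {n} rows but only {len(city_rows)} match {city!r}")
    # A dedicated Random instance keeps the draw reproducible without
    # touching the global random state.
    return random.Random(seed).sample(city_rows, n)


# Toy, made-up listings (hypothetical field names and values).
listings = [{"city": "Moscow", "price": 100 + i} for i in range(50)]
listings += [{"city": "Kazan", "price": 80 + i} for i in range(50)]

subset = sample_city_rows(listings, "Moscow", n=10, seed=42)
```

With pandas, the same idea would be a boolean filter followed by `DataFrame.sample(n=..., random_state=...)`.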

Do you think these updates are a good way to address the problems you mentioned?


In addition, we've performed EDA and are starting to implement models. Regarding this, we have two main questions:

  1. Is the checkpoint due tomorrow (Wednesday)?
  2. Since sklearn doesn't support GPU acceleration, we were looking at RAPIDS, a package with an sklearn-like API that allows CUDA acceleration. Do you think it's a good idea to pursue this, both to train our models more efficiently and to show 'above and beyond' commitment/work?

Thank you in advance for your advice!

eleeeysh commented 1 year ago

Background +0.75; Data +0.75

eleeeysh commented 1 year ago

Oh, so sorry, I didn't see your comment last week. Yes, a smaller dataset sampled from Moscow's data definitely sounds much better. Feel free to try any techniques to accelerate training and testing!
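For the GPU question raised in this thread, one low-risk pattern is to prefer a GPU estimator when it is installed and fall back to sklearn otherwise. The sketch below assumes that cuML is the relevant RAPIDS component and that its `cuml.linear_model.LinearRegression` is used as a stand-in for sklearn's class of the same name. The helper `load_linear_regression` is hypothetical, and nothing in the thread confirms which models the group actually trained.

```python
import importlib
import importlib.util


def load_linear_regression(preferred=("cuml", "sklearn")):
    """Return (backend_name, LinearRegression class) for the first usable backend.

    Returns (None, None) when no listed backend can be imported.
    """
    for backend in preferred:
        # find_spec returns None when the top-level package is not installed.
        if importlib.util.find_spec(backend) is None:
            continue
        try:
            module = importlib.import_module(f"{backend}.linear_model")
        except Exception:
            # For example, a GPU package that is installed but cannot
            # initialize on this machine.
            continue
        return backend, module.LinearRegression
    return None, None


backend, LinearRegression = load_linear_regression()
```

Because both classes expose `fit` and `predict`, the rest of a training script can stay the same whichever backend is selected. Input types and some keyword arguments can differ between the two libraries.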