alexeygrigorev / data-science-interviews

Data science interview questions and answers
https://ds-interviews.org
Creative Commons Attribution 4.0 International

Suggest answer for theoretical question #168

Open junjunbaby123 opened 7 months ago

junjunbaby123 commented 7 months ago
Q: What if we want to build a model for predicting prices? Are prices distributed normally? Do we need to do any pre-processing for prices?

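A: Common regression models such as linear regression do not require the independent or dependent variables to be normally distributed. Where a normality assumption is made (for example, in classical linear regression, mainly so that t-tests and confidence intervals are valid), it is on the error terms: we assume that, after fitting the model, the errors are i.i.d. N(0, σ²).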
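A minimal sketch of why the assumption is about the errors rather than the target. It assumes a simulated, hypothetical dataset: a right-skewed predictor (exponential), normally distributed errors, and a simple least-squares line fitted in plain Python. The target y inherits the skew of x, yet the residuals of the fitted model stay roughly symmetric. The helper names `skewness` and `fit_simple_ols` are illustrative, not from any library.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns the intercept a and slope b."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(0)
n = 5000
# Hypothetical data: skewed predictor, linear relationship, N(0, 1) errors.
x = [rng.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

a, b = fit_simple_ols(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# The target is clearly skewed; the residuals are close to symmetric.
print(f"skew(y) = {skewness(y):.2f}, skew(residuals) = {skewness(residuals):.2f}")
```

In practice one would inspect the residuals (histogram, Q-Q plot) after fitting rather than checking whether the raw prices look normal.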

Predicting price can be framed either as a time series forecast or as a cross-sectional forecast. In a time series forecast, we need to pre-process the price so that the series is stationary. We can test for stationarity with the augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root (is non-stationary). If the series is not stationary, we take first differences, then second differences, and so on, until it is. It is also common to work with the log of price, because the first difference of log price approximates the growth rate.

In a cross-sectional forecast (such as forecasting house prices), we can apply multiple regression to predict price. Here we can also take the log of price, which reduces the right skew that prices usually have, and we also need feature engineering and feature selection.

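A minimal sketch of these pre-processing steps in plain Python, using hypothetical data: repeated differencing (a quadratic trend needs two rounds), log differences as growth rates, and a log transform of skewed cross-sectional prices. The helpers `difference`, `log_returns`, and `skewness` are illustrative names, not from any library. The sketch does not run the ADF test itself; in practice that test is available, for example, as `adfuller` in `statsmodels.tsa.stattools`, which returns the test statistic and p-value among other values.

```python
import math
import random
import statistics


def difference(series, order=1):
    """Apply first differencing `order` times (order=2 gives second differences)."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def log_returns(prices):
    """First difference of log price; approximates the per-period growth rate."""
    return difference([math.log(p) for p in prices])


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Time series (hypothetical): a quadratic trend needs two rounds of differencing.
trend = [5.0 + 0.5 * t + 0.2 * t ** 2 for t in range(10)]
first = difference(trend, order=1)   # about 0.7, 1.1, 1.5, ...: still trending
second = difference(trend, order=2)  # constant 0.4, up to float rounding

# Time series (hypothetical): prices growing 2% per period.
prices = [100.0 * 1.02 ** t for t in range(6)]
growth = log_returns(prices)  # each value is log(1.02), roughly 0.0198

# Cross-section (hypothetical): log-normal prices are right-skewed; their logs are not.
rng = random.Random(0)
house_prices = [rng.lognormvariate(12.0, 0.5) for _ in range(5000)]
log_prices = [math.log(p) for p in house_prices]

print(first[:3], second[:3])
print([round(g, 4) for g in growth])
print(f"skew(price) = {skewness(house_prices):.2f}, skew(log price) = {skewness(log_prices):.2f}")
```

The deterministic quadratic trend is used only so the effect of differencing is easy to see; real price series are noisy, which is why a formal test such as ADF is used to decide when to stop differencing.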
Harshupadhyay221 commented 5 months ago

When building a model for predicting prices, it's important to consider several factors regarding the distribution of prices and any necessary preprocessing steps:
- Distribution of Prices:
- Preprocessing for Prices:
- Feature Engineering:
- Evaluation Metrics: