ClimbsRocks / data-formatter

Takes raw csv input and formats it to be ready for neural networks

handle dates #34

Open ClimbsRocks opened 9 years ago

ClimbsRocks commented 9 years ago

We should:

A) create a category for "dayOfWeek" that is just "M", "T", "W", etc. There has to be some library or module that does this for us easily.

B) create a category for weekend or weekday?

C) maybe create a new column that is "daysSinceFirstDay". we'd transform the first day in the dataset to a 0, and then count upwards from there. that would let us see any time series info that might be going on.

D) post-MVP, when we want to get fancy, we could also take in a flag for country. This could be either a single flag for the initial dataset or, if we have many countries represented, a new column called countryForHolidayCalculating in its dataDescription row, which we'd use for each row specifically. We'd then create a categorical column that specifies what holidays occurred on that date, if any. We could then let feature selection figure out for us if Indigenous Peoples Day is an important holiday for this dataset or not. We could also, ourselves, create another column for isFederalHoliday or isHolidayButNotFederal.

E) leave in the original date as a category, and let feature selection prune away all but the important ones. For example, Black Friday and Cyber Monday are generally not considered holidays, but likely do have an impact on outcomes for some datasets.

F) transform into month, week, and year columns as well. again, let feature selection drop the less-useful features.
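
A rough sketch of what A, B, C, and F could look like, using only Python's standard `datetime` module. Everything here is an assumption for illustration: the function name `date_features`, the ISO `YYYY-MM-DD` input format, the two-letter labels for Thursday, Saturday, and Sunday, and the choice of ISO week numbering are not decided anywhere in this issue.

```python
from datetime import date

# Short labels indexed by date.weekday(), where Monday is 0.
# "Th", "Sa", "Su" are assumed labels to keep the categories distinct.
DAY_NAMES = ["M", "T", "W", "Th", "F", "Sa", "Su"]


def date_features(iso_dates):
    """Turn a list of 'YYYY-MM-DD' strings into one feature dict per row.

    Hypothetical helper, not part of data-formatter.
    """
    parsed = [date.fromisoformat(s) for s in iso_dates]
    first_day = min(parsed)  # earliest date in the dataset becomes day 0

    rows = []
    for d in parsed:
        # isocalendar() gives (ISO year, ISO week, ISO weekday).
        # The ISO week for early January can belong to the previous ISO year.
        _, iso_week, _ = d.isocalendar()
        rows.append({
            "dayOfWeek": DAY_NAMES[d.weekday()],         # A
            "isWeekend": d.weekday() >= 5,               # B
            "daysSinceFirstDay": (d - first_day).days,   # C
            "month": d.month,                            # F
            "week": iso_week,                            # F
            "year": d.year,                              # F (calendar year)
        })
    return rows


if __name__ == "__main__":
    for row in date_features(["2016-01-01", "2016-01-02", "2016-01-04"]):
        print(row)
```

The categorical outputs (dayOfWeek, month, week) would still need to go through whatever one-hot encoding the rest of the pipeline uses, and feature selection can then drop the columns that turn out not to matter.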