chriskuz / intro_to_python_project


Summarizing findings for cleanup and observations of the data #11

Open chriskuz opened 1 week ago

chriskuz commented 1 week ago

I'll be adding highlights to this thread periodically to better track some of the work that went into understanding the data.

chriskuz commented 1 week ago

Nulls:

chriskuz commented 1 week ago

Labeling errors in segmentsEquipmentDescription:

One idea is to use a more consistent column that correctly uses the || delimiter to fill in the gaps wherever this column's information is missing. However, this would likely require careful use of regex, which is really annoying.
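A minimal sketch of the gap-filling idea. It assumes a hypothetical reference column whose ||-separated values reliably contain one entry per flight leg. The function names `split_segments` and `align_equipment` are illustrative and do not come from the project. The sketch splits on || with a regex that tolerates stray whitespace, marks empty slots as missing, and pads the result so every leg has a slot that a later step could fill.

```python
import re
from typing import List, Optional

# Tolerates whitespace around the "||" delimiter, e.g. "A320 || A321".
DELIMITER = re.compile(r"\s*\|\|\s*")


def split_segments(value: Optional[str]) -> List[str]:
    """Split a '||'-delimited segment string into stripped parts."""
    if value is None:
        return []
    return [part.strip() for part in DELIMITER.split(value)]


def align_equipment(equipment: Optional[str], reference: str) -> List[Optional[str]]:
    """Return one equipment slot per leg in `reference`, with None marking gaps.

    `reference` is a hypothetical column assumed to list every leg correctly.
    """
    n_legs = len(split_segments(reference))
    # Empty strings between delimiters are treated as missing values.
    parts: List[Optional[str]] = [p or None for p in split_segments(equipment)]
    # Pad with None (or truncate) so the result lines up with the reference legs.
    return (parts + [None] * n_legs)[:n_legs]


print(align_equipment("Airbus A320||", "B6||B6"))  # ['Airbus A320', None]
print(align_equipment(None, "B6||B6||B6"))         # [None, None, None]
```

The regex only normalizes the delimiter and aligns legs. Deciding what value should go into each None slot is still an open question.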

chriskuz commented 1 week ago

Duplicated Features (segmentsAirlineCode, segmentsAirlineName):

It is likely best for us to remove the impure routes (itineraries that include at least one non-JetBlue segment), which will shorten the table a little. Since this leaves only pure JetBlue routes, we can also consider removing this column entirely, along with the airline code column. The filter must be applied using these helper columns before we consider removing the columns themselves. There may be a case for keeping at least one of these columns, because its delimiters could help a model understand how price correlates with multi-leg routes. No resolution found for now.
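One way to sequence the filter-then-drop step is sketched below. It uses plain dicts rather than the project's actual table. Several assumptions apply:

* JetBlue is assumed to appear under its IATA code `B6` in `segmentsAirlineCode`.
* The sample rows and the `totalFare` field are made up for illustration.
* `numLegs` is a hypothetical replacement column. It keeps the leg count implied by the || delimiters after the airline columns are gone.

```python
from typing import Any, Dict, List

JETBLUE_CODE = "B6"  # assumed value for JetBlue in segmentsAirlineCode
DROP_COLUMNS = {"segmentsAirlineCode", "segmentsAirlineName"}


def is_pure_jetblue(row: Dict[str, Any]) -> bool:
    """True when every '||'-separated leg carries the JetBlue code."""
    codes = row["segmentsAirlineCode"].split("||")
    return all(code.strip() == JETBLUE_CODE for code in codes)


def keep_pure_routes(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter on the helper columns first, and only then drop them."""
    cleaned = []
    for row in rows:
        if not is_pure_jetblue(row):
            continue  # impure route: at least one non-JetBlue leg
        kept = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        # Hypothetical feature that preserves the multi-leg information.
        kept["numLegs"] = len(row["segmentsAirlineCode"].split("||"))
        cleaned.append(kept)
    return cleaned


# Made-up sample rows, for illustration only.
rows = [
    {"segmentsAirlineCode": "B6||B6",
     "segmentsAirlineName": "JetBlue Airways||JetBlue Airways",
     "totalFare": 250.0},
    {"segmentsAirlineCode": "B6||AA",
     "segmentsAirlineName": "JetBlue Airways||American Airlines",
     "totalFare": 300.0},
]
print(keep_pure_routes(rows))  # [{'totalFare': 250.0, 'numLegs': 2}]
```

Deriving a leg count is only one option. Keeping one of the original columns, as suggested above, would preserve the same delimiter information directly.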