Open barbh1307 opened 3 years ago
Is Kaggle an option?
see https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d
QUESTION: which columns are categorical?
NOTE: none of these are evenly distributed
OriginatingFile: categories will grow each month, so the dates will differ from the ones in these three files:
DISP_AllStatesAndTerritories_mmddyyyy.xlsx[ALL]
DISP_Shipments_Territories_mmddyyyy_mmddyyyy.xlsx[SHIPMENTS]
DISP_Shipments_Territories_mmddyyyy_mmddyyyy.xlsx[CANCELLATIONS]
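Since the OriginatingFile values change with every monthly release, one option is to split each value into a stable part and its dates. A minimal sketch, assuming the values follow exactly the three patterns above and that the bracketed suffix is a label; `parse_originating_file` and the `stable_key` field are hypothetical names, and the dates in the usage comment are made up:

```python
import re
from datetime import datetime

# Assumed layout: DISP_<kind>_<mmddyyyy>[_<mmddyyyy>].xlsx[<LABEL>]
PATTERN = re.compile(
    r"^DISP_(?P<kind>AllStatesAndTerritories|Shipments_Territories)"
    r"_(?P<dates>\d{8}(?:_\d{8})?)\.xlsx\[(?P<label>[A-Z]+)\]$"
)


def parse_originating_file(value):
    """Split an OriginatingFile value into kind, label, and parsed dates."""
    match = PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized OriginatingFile value: {value!r}")
    dates = [
        datetime.strptime(part, "%m%d%Y").date()
        for part in match.group("dates").split("_")
    ]
    return {
        "kind": match.group("kind"),
        "label": match.group("label"),
        "dates": dates,
        # Does not change from month to month, so it could serve as the category.
        "stable_key": f"{match.group('kind')}[{match.group('label')}]",
    }


# Usage (made-up dates):
# parse_originating_file("DISP_Shipments_Territories_01012021_03312021.xlsx[SHIPMENTS]")
```

With this split, the category would stay at three values and the dates could live in their own columns.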
StateAbbreviation: categorized by state; there are 59 possibilities based on
US Postal Service Publication 28 https://pe.usps.com/text/pub28/28apb.htm
(see Check_DISP_AllStatesAndTerritories.ipynb and Check_DISP_Shipments_Cancellations.ipynb)
Item_FSG: categorized by federal supply group; there are 100 possibilities, not evenly distributed
see https://en.wikipedia.org/wiki/Federal_Stock_Number#External_links
and https://en.wikipedia.org/wiki/List_of_NATO_Supply_Classification_Groups to start tracking these down
AgencyType: rough guess of types of agencies based on station names, currently 13 categories
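To make the column list above concrete, here is a rough sketch of converting those four columns to the pandas `category` dtype. The column names come from the notes above; the helper name `to_categorical` and the sample rows (including the AgencyType values) are invented for illustration only:

```python
import pandas as pd

# Columns identified as categorical in the notes above.
CATEGORICAL_COLUMNS = ["OriginatingFile", "StateAbbreviation", "Item_FSG", "AgencyType"]


def to_categorical(df, columns=CATEGORICAL_COLUMNS):
    """Return a copy of df with the listed columns (where present) as category dtype."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out


# Made-up sample rows, not taken from the real dataset.
sample = pd.DataFrame({
    "StateAbbreviation": ["AL", "AK", "AL", "PR"],
    "Item_FSG": ["10", "23", "10", "84"],
    "AgencyType": ["police", "sheriff", "police", "police"],
})
converted = to_categorical(sample)
print(converted.dtypes)
```

Comparing `sample.memory_usage(deep=True)` with `converted.memory_usage(deep=True)` on the real files would show whether the conversion is worth it.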
TO EXPLORE: feather/parquet? Should we save with more categories?
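One way to explore the feather/parquet question is to write a frame out, read it back, and check which columns are still categorical. A minimal sketch, assuming `pyarrow` is installed (pandas needs it for `to_feather` and uses it by default for `to_parquet`); `roundtrip` and `categorical_columns` are hypothetical helper names:

```python
import pandas as pd


def roundtrip(df, path, fmt="parquet"):
    """Write df to path in the given format and read it straight back."""
    if fmt == "parquet":
        df.to_parquet(path)
        return pd.read_parquet(path)
    if fmt == "feather":
        # Feather expects a default RangeIndex; reset the index first if needed.
        df.to_feather(path)
        return pd.read_feather(path)
    raise ValueError(f"unsupported format: {fmt}")


def categorical_columns(df):
    """List the columns whose dtype is categorical."""
    return [
        col for col in df.columns
        if isinstance(df[col].dtype, pd.CategoricalDtype)
    ]


# Usage (path is a placeholder):
# back = roundtrip(df, "leso.parquet", fmt="parquet")
# assert categorical_columns(back) == categorical_columns(df)
```

Running the check for both formats on the real data, together with a file-size comparison, should show whether the category dtypes survive and which format is the better fit.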
Two possible sources for ORI data:
build a new repo called create_DLA_LESO_dataset for this work