barbh1307 / merge_DLA_LESO_data

Merge the data from the DLA LESO Public Information (https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/PublicInformation/) site to build a dataset so the information can be monitored and compared from quarter to quarter.
GNU General Public License v3.0

build a dataset that integrates better with other sources of data #3

Open barbh1307 opened 3 years ago

barbh1307 commented 3 years ago

Is Kaggle an option?

barbh1307 commented 3 years ago

See https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d

QUESTION: Which columns are categorical? NOTE: none of these are evenly distributed.

OriginatingFile: the categories will grow each month, because the dates in the names of these three files change:

DISP_AllStatesAndTerritories_mmddyyyy.xlsx[ALL]
DISP_Shipments_Territories_mmddyyyy_mmddyyyy.xlsx[SHIPMENTS]
DISP_Shipments_Territories_mmddyyyy_mmddyyyy.xlsx[CANCELLATIONS]
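Since the new OriginatingFile categories differ only by their mmddyyyy dates, here is a minimal sketch for splitting a label written in the FILE.xlsx[SHEET] notation above into its stem, sheet, and dates. The function name `parse_originating_file` and the example dates are hypothetical.

```python
import re
from datetime import date, datetime


def parse_originating_file(label: str) -> tuple[str, str, list[date]]:
    """Split 'NAME_mmddyyyy[_mmddyyyy].xlsx[SHEET]' into (stem, sheet, dates)."""
    match = re.fullmatch(r"(?P<name>.+)\.xlsx\[(?P<sheet>[A-Z]+)\]", label)
    if match is None:
        raise ValueError(f"unexpected originating-file label: {label!r}")
    parts = match["name"].split("_")
    dates: list[date] = []
    # The dates are the trailing 8-digit mmddyyyy tokens; peel them off the end.
    while parts and re.fullmatch(r"\d{8}", parts[-1]):
        dates.insert(0, datetime.strptime(parts.pop(), "%m%d%Y").date())
    return "_".join(parts), match["sheet"], dates
```

For example, `parse_originating_file("DISP_Shipments_Territories_01012021_03312021.xlsx[SHIPMENTS]")` gives the stem `DISP_Shipments_Territories`, the sheet `SHIPMENTS`, and two dates, which makes it easy to sort or filter the quarterly files.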

StateAbbreviation: categorized by state; there are 59 possibilities based on US Postal Service Publication 28 (https://pe.usps.com/text/pub28/28apb.htm). See Check_DISP_AllStatesAndTerritories.ipynb and Check_DISP_Shipments_Cancellations.ipynb.
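Because the set of valid abbreviations is fixed, one option is a pandas `CategoricalDtype` with explicit categories, so every quarterly file shares the same category list and any unexpected code surfaces immediately. This is a sketch only: `STATE_CODES` below is a hypothetical short subset, not the 59 codes from Publication 28, and `to_state_category` is an illustrative helper name.

```python
import pandas as pd

# Hypothetical subset of USPS Publication 28 codes; the real list would hold
# all 59 values checked in the notebooks named above.
STATE_CODES = ["AK", "AL", "GU", "PR", "VI", "WY"]
STATE_DTYPE = pd.CategoricalDtype(categories=STATE_CODES)


def to_state_category(values: pd.Series) -> tuple[pd.Series, list[str]]:
    """Cast to the fixed state dtype and report values outside the list."""
    cleaned = values.str.strip().str.upper()
    as_cat = cleaned.astype(STATE_DTYPE)
    # astype() turns values that are not in the categories into NaN,
    # so anything non-missing before the cast but missing after is unknown.
    unknown = sorted(cleaned[as_cat.isna() & cleaned.notna()].unique())
    return as_cat, unknown
```

Fixing the categories up front (rather than letting `astype("category")` infer them) keeps the codes identical across files, which helps when comparing quarter to quarter.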

Item_FSG: categorized by federal supply group; there are 100 possibilities, not evenly distributed.

See https://en.wikipedia.org/wiki/Federal_Stock_Number#External_links
and https://en.wikipedia.org/wiki/List_of_NATO_Supply_Classification_Groups to start tracking these down.
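If the FSG ever needs to be re-derived or cross-checked, it can come straight from the item's stock number: a 13-digit NSN starts with the 4-digit Federal Supply Class, and the first 2 digits of that class are the supply group. A minimal sketch, assuming the NSN values arrive as strings like `1005-01-123-4567` (the input format and the helper name `fsg_from_nsn` are assumptions, not taken from the source files):

```python
import pandas as pd


def fsg_from_nsn(nsn: pd.Series) -> pd.Series:
    """Return the 2-digit Federal Supply Group derived from NSN strings."""
    # Drop dashes and any other non-digit characters.
    digits = nsn.astype("string").str.replace(r"\D", "", regex=True)
    # A well-formed NSN has 13 digits: FSC (4) + NIIN (9). The FSG is the
    # first 2 digits of the FSC; anything malformed becomes missing.
    fsg = digits.str.slice(0, 2).where(digits.str.len() == 13)
    return fsg.astype("category")
```

The resulting codes could then be joined against a group-name table built from the NATO Supply Classification list linked above.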

AgencyType: rough guess of types of agencies based on station names, currently 13 categories
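A keyword-matching sketch of that kind of rough guess, with first-match-wins ordering so that more specific phrases are checked before generic ones. The rules and labels below are hypothetical placeholders; the 13 categories actually in use are not listed in these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical keyword rules (keyword, label). Order matters: np.select
# uses the first condition that is True, so specific phrases go first.
RULES = [
    ("SHERIFF", "Sheriff"),
    ("STATE POLICE", "State Police"),
    ("UNIVERSITY", "University Police"),
    ("POLICE", "Police Department"),
]


def guess_agency_type(station_names: pd.Series) -> pd.Series:
    """Assign a rough agency type to each station name by keyword."""
    upper = station_names.fillna("").str.upper()
    conditions = [upper.str.contains(kw, regex=False).to_numpy() for kw, _ in RULES]
    labels = np.select(conditions, [label for _, label in RULES], default="Other")
    return pd.Series(labels, index=station_names.index, dtype="category")
```

Keeping the rules in one ordered list makes it easy to add a category later and rerun the classification over older quarters.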

TO EXPLORE: Feather/Parquet? Should we save with more categories?
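One quick way to judge whether more columns should be saved as categories is to compare in-memory size for the `object` and `category` dtypes. This is a small sketch; `memory_savings` is a hypothetical helper name, and real savings depend on how repetitive each column is.

```python
import pandas as pd


def memory_savings(series: pd.Series) -> tuple[int, int]:
    """Return (bytes as object dtype, bytes as category dtype)."""
    # deep=True counts the Python string objects, not just the pointers.
    as_object = int(series.astype(object).memory_usage(deep=True))
    as_category = int(series.astype("category").memory_usage(deep=True))
    return as_object, as_category
```

Low-cardinality columns like StateAbbreviation or AgencyType should shrink a lot. On the file-format side, pandas' `DataFrame.to_parquet` and `DataFrame.to_feather` (both need pyarrow installed) store category columns as Arrow dictionary-encoded data, and reading them back with pyarrow generally restores the `category` dtype, whereas CSV loses it.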

barbh1307 commented 3 years ago

Two possible sources for ORI data:

barbh1307 commented 3 years ago

Build a new repo called create_DLA_LESO_dataset for this work.