DataUSA / datausa-tracker


missing industry in BLS tables #219

Closed davelandry closed 4 years ago

davelandry commented 5 years ago

I'm trying to find industry growth for Retail Trade (44-45), which is seen in the top right stat on the old Data USA here, but it's not available in the cubes: https://saguaro-api.datausa.io/ui/#eyJkcmlsbERvd25zIjpbWyJCTFMgSW5kdXN0cnkgRmxhdCIsIkJMUyBJbmR1c3RyeSBGbGF0IiwiSW5kdXN0cnkiXV0sImN1dHMiOltbWyI0NC00NSJdLFsiQkxTIEluZHVzdHJ5IEZsYXQiLCJCTFMgSW5kdXN0cnkgRmxhdCIsIkluZHVzdHJ5Il1dXSwiY3ViZSI6ImJsc19ncm93dGhfaW5kdXN0cnkiLCJtZWFzdXJlcyI6WyJJbmR1c3RyeSBKb2JzIFRob3VzYW5kcyAyMDA2IiwiSW5kdXN0cnkgSm9icyBUaG91c2FuZHMgMjAxNiIsIkluZHVzdHJ5IEpvYnMgVGhvdXNhbmRzIDIwMjYiXX0=

jspeis commented 5 years ago

In the meantime, I've manually hacked in a temporary fix on theodore by directly editing the data in the table, but of course, @hwchen, we would want to fix this in the ETL.

hwchen commented 5 years ago

I'll fix this when I get back; unfortunately the bls code is at the office. But I have some idea what the issue is. If I'm remembering correctly, some of the ids were split on commas, and some were dashed.

hwchen commented 5 years ago

(and to some of the Slack discussion: I don't think it's a crosswalk thing, I think it's just how I'm processing multiple ids as one string, per row)

davelandry commented 5 years ago

As another test case, I noticed the missing growth for Labor Unions as well: https://theodore.datausa.io/profile/naics/81393

hwchen commented 5 years ago

This should be fixed tomorrow; sorry for the delay.

On Mon, Feb 18, 2019, 2:45 PM Dave Landry <notifications@github.com> wrote:

As another test case, i noticed the missing growth for Labor Unions as well: https://theodore.datausa.io/profile/naics/81393


hwchen commented 5 years ago

Looks like the issue isn't in the ETL errors either (although I do have another bug to fix).

Retail trade is normally 44-45, but in the table it's represented as 44, 45. I'm not crosswalking, so I'll just include a special case. (The table is already using naics, so I wanted to minimize the amount of transformation like crosswalking.)

I'll leave this thread open for a bit, so you can keep adding special cases.
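A minimal sketch of the kind of special-casing described above, in Python. This is not the actual bls-core code; `SPECIAL_CASES` and `normalize_industry_id` are hypothetical names, and the only mapping taken from this thread is "44, 45" to "44-45".

```python
import re

# Hypothetical special cases: raw table id (whitespace removed) -> canonical id.
# Only the Retail Trade entry comes from this thread; add more as they turn up.
SPECIAL_CASES = {"44,45": "44-45"}


def normalize_industry_id(raw: str) -> str:
    """Return a canonical id string for one raw id cell from the table."""
    # Remove all whitespace, including leading whitespace and the space
    # that follows the comma in ids like "44, 45".
    compact = re.sub(r"\s+", "", raw)
    # Rewrite known special cases; leave every other id as it is.
    return SPECIAL_CASES.get(compact, compact)
```

Other combined ids, such as "8134, 8139", pass through with only the whitespace removed, so nothing is crosswalked here.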

hwchen commented 5 years ago

As for Labor Unions 81393, it doesn't exist in the table (industry growth). The closest is Civic, social, professional, and similar organizations "8134, 8139".

@jspeis do you know how you handled this in the past? Perhaps I'm using an entirely different table than you did?

jspeis commented 5 years ago

The old Data USA used a substitution of "Other Services, Except Public Administration."

The easiest way to check is to see the Job growth section: https://datausa.io/profile/naics/81393/#job_growth

@davelandry I think it should be considered a bug, though, that this information is presented in the splash without any indication that a substitution has occurred.

hwchen commented 5 years ago

Ok, Other Services, Except Public Administration exists as Other services in the growth table. I'll hardcode this into the etl (copying this row and substituting the labor unions id).

jspeis commented 5 years ago

@hwchen sorry for the confusion: we want to preserve that as "Other services" in the data, but we'll just want to make sure the crosswalk in the logic layer has the correct mapping (which it should already have).

jspeis commented 5 years ago

(the reason we want to preserve that is so that we can notify users that a crosswalk is occurring)

hwchen commented 5 years ago

ok, I think that makes sense. So I should still hard-code in Retail as 44-45, since it's the same thing; but let the crosswalk handle Labor Unions.

jspeis commented 5 years ago

Yup, retail can get hard-coded since it's the same thing, but we'll let the crosswalk handle labor unions.

hwchen commented 5 years ago

On ulysses

fixed for retail trade. https://github.com/Datawheel/bls-core/commit/37e8fb57e166bf881c37eb7fc58518edd77c0bbe

Fixed some other possible issues with leading whitespace: https://github.com/Datawheel/bls-core/commit/ff34b779c526f6f4d754a9aaaa350bb3a2d377d1

hwchen commented 5 years ago

@davelandry let me know if there's anything needed from my end on the naics substitution crosswalk for labor union.

Labor Unions, 81393 -> 81 (Other services)
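One way a lookup like this could work while still reporting that a substitution happened is sketched below in Python. It is an assumption, not the logic layer's actual behavior: the thread only states the single mapping 81393 -> 81, and walking up the NAICS hierarchy by dropping trailing digits is just one possible rule. `Resolved` and `resolve_naics` are hypothetical names.

```python
from typing import NamedTuple, Optional, Set


class Resolved(NamedTuple):
    requested: str     # the id that was asked for
    used: str          # the id that actually has data
    substituted: bool  # True when used != requested, so the UI can say so


def resolve_naics(code: str, available: Set[str]) -> Optional[Resolved]:
    """Find the closest ancestor of `code` (or `code` itself) that has data."""
    candidate = code
    while candidate:
        if candidate in available:
            return Resolved(code, candidate, candidate != code)
        # Assumed rule: drop the last digit to move one level up the hierarchy.
        candidate = candidate[:-1]
    return None  # no ancestor has data


# Toy set of ids in the growth table, using ids mentioned in this thread.
growth_ids = {"81", "44-45", "8134, 8139"}
result = resolve_naics("81393", growth_ids)
```

Keeping `requested` and `used` side by side is what lets the front end notify users that a crosswalk occurred, rather than copying the "Other services" row under the labor unions id.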

davelandry commented 5 years ago

@hwchen could you help me identify any PUMS Industry codes that currently don't have a crosswalk? Here's the list I'm working off of: https://github.com/DataUSA/datausa-site/blob/canon/static/data/pums_bls_industry_crosswalk.json

hwchen commented 5 years ago

This looks like pums -> bls crosswalk. Will we need a separate bls -> bls crosswalk to know that a substitution is made?

On Thu, Feb 21, 2019 at 5:56 PM Dave Landry notifications@github.com wrote:

@hwchen https://github.com/hwchen could you help me identify any PUMS Industry codes that currently don't have a crosswalk? Here's the list I'm working off of: https://github.com/DataUSA/datausa-site/blob/canon/static/data/pums_bls_industry_crosswalk.json


davelandry commented 5 years ago

Our logic layer needs one crosswalk that contains entries for every PUMS-to-BLS pairing that can be made...

davelandry commented 5 years ago

ahhh I see, my crosswalk has an entry for 81393, but it's mapping directly to 81393 in BLS (but I'm assuming there's just no data at that level?).

Should I change that entry to 81? I guess what I'm asking is: could you give me an updated crosswalk that maps PUMS to BLS codes containing data?

hwchen commented 5 years ago

1) The mappings are currently from one pums to an array of bls naics. Should I append to the end of the array, or replace it?

2) Looks like there's a two-step process:

I would probably ask victor to help out with this. Also pinging @jspeis to confirm that this is a reasonable procedure.

One worry is that there could be data for one bls table but not another (there's a ces and a growth), which could mess up having one crosswalk for both tables.

Some numbers for distinct naics:

crosswalk: 306 entries for pums -> bls
ces: 154 6-digit (in ulysses), 771 overall (not currently in ulysses; for some reason they were limited to 6-digit)
growth: 244

(I haven't looked at the overlap yet)
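A rough sketch of how the overlap could be checked, in Python. It assumes the crosswalk is a mapping from one pums code to an array of bls naics (as described in point 1) and that each table's distinct ids are available as a set. The function name and the toy data are hypothetical; the pums keys are placeholders, not real codes.

```python
from typing import Dict, List, Set


def missing_targets(crosswalk: Dict[str, List[str]], table_ids: Set[str]) -> Set[str]:
    """Return the bls naics the crosswalk points at that have no row in the table."""
    # Flatten every pums -> [bls, ...] entry into one set of target ids.
    targets = {bls for bls_list in crosswalk.values() for bls in bls_list}
    return targets - table_ids


# Toy data only: placeholder pums keys, bls ids taken from this thread.
crosswalk = {"pums_a": ["81393"], "pums_b": ["44-45", "4411"]}
growth_ids = {"44-45", "81"}
ces_ids = {"44-45", "81393"}

missing_in_growth = missing_targets(crosswalk, growth_ids)
missing_in_ces = missing_targets(crosswalk, ces_ids)
```

Running the check once per table shows whether a single crosswalk can serve both ces and growth: any id that is missing from one table but present in the other is a case where one shared mapping would be wrong for one of them.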

Also, the substitutions in some cases are probably large enough that they should be noted (for example, substituting "motor vehicle and parts dealers" (I'm guessing here) for "car washes" is a pretty big jump). I guess you can see this just from seeing how far the label is from the original pums naics?

Anyways, here's the full list of bls naics that are in the crosswalk but are not in the growth table:

Offices of optometrists  62132
Furniture and home furnishings stores  442
Printing and related support activities  3231
Veterinary services  54194
Grocery stores  4451
Lumber and other construction materials merchant wholesalers  4233
Professional and commercial equipment and supplies merchant wholesalers  4234
Household appliance stores  443141
Farm supplies merchant wholesalers  42491
Libraries and archives  51912
Clothing stores  4481
Carpet and rug mills  31411
Clay building material and refractories manufacturing  327120
Labor unions  81393
Agricultural implement manufacturing  33311
Recyclable material merchant wholesalers  42393
Used merchandise stores  4533
Newspaper publishers  51111
Grocery and related product merchant wholesalers  4244
Footwear manufacturing  3162
Tire manufacturing  32621
Office supplies and stationery stores  45321
Beauty salons  812112
Musical instrument and supplies stores  45114
Automotive parts, accessories, and tire stores  4413
Other direct selling establishments  45439
Hardware, and plumbing and heating equipment, and supplies merchant wholesalers  4237
Offices of chiropractors  62131
Retail florists  4531
Electronics stores  443142
Book stores and news dealers  45121
Petroleum refining  32411
Gift, novelty, and souvenir shops  45322
Fiber, yarn, and thread mills  3131
Furniture and home furnishing merchant wholesalers  4232
Other motor vehicle dealers  4412
Taxi and limousine service  4853
Farm product raw material merchant wholesalers  4245
Barber shops  812111
Fuel dealers  454310
Jewelry, luggage, and leather goods stores  4483
Electronic auctions  454112
Electronic shopping  454111
Landscaping services  56173
Miscellaneous general merchandise stores  4529
Textile and fabric finishing and coating mills  3133
Metals and minerals (except petroleum) merchant wholesalers  4235
Household appliances and electrical and electronic goods merchant wholesalers  4236
Hardware stores  44413
Gasoline stations  447
Apparel accessories and other apparel manufacturing  3159
Pharmacies and drug stores  44611
Paper and paper products merchant wholesalers  4241
Apparel, piece goods, and notions merchant wholesalers  4243
Miscellaneous retail stores  4539
Data processing, hosting, and related services  5182
Petroleum and petroleum products merchant wholesalers  4247
Sound recording industries  5122
Machinery, equipment, and supplies merchant wholesalers  4238
Internet publishing and broadcasting and web search portals  51913
Retail bakeries  311811
Specialty food stores  4452
Video tape and disk rental  53223
Motor vehicle and motor vehicle parts and supplies merchant wholesalers  4231
Beer, wine, and liquor stores  4453
Lawn and garden equipment and supplies stores  4442
Motion pictures and video industries  5121
Bowling centers  71395
Pottery, ceramics, and plumbing fixture manufacturing  32711
Department stores and discount stores  45211
Car washes  811192
Alcoholic beverages merchant wholesalers  4248
Sewing, needlework, and piece goods stores  45113
Cut and sew apparel manufacturing  3152
Nursing care facilities (skilled nursing facilities)  6231
Mail-order houses  454113
Vending machine operators  4542
Traveler accommodation  7211
Automobile dealers  4411
Drinking places, alcoholic beverages  7224
Paperboard container manufacturing  32221
Shoe stores  44821
Transportation and Warehousing  48,492,493
Public Administration  92
Fabric mills, except knitting mills  3132
Prefabricated wood buildings and mobile homes  32199
Miscellaneous petroleum and coal products  3241
Ordnance  33299
Drugs, sundries, and chemical and allied products merchant wholesalers  424
Public finance activities  92113
Health and personal care, except drug, stores  446
Civic, social, advocacy organizations, and grantmaking and giving services  813
Sporting goods, and hobby and toy stores  4511
Administration of human resource programs  923
Justice, public order, and safety activities  92
Transportation and Warehousing, and Utilities  48,492,493
Miscellaneous nondurable goods merchant wholesalers  4249
Miscellaneous durable goods merchant wholesalers  4239
Aircraft and parts manufacturing  33641
Aerospace products and parts manufacturing  33641
Business, technical, and trade schools and training  611
Colleges, universities, and professional schools, including junior colleges  611
Other schools and instruction, and educational support services  611
Administration of economic programs and space research  92
Administration of environmental quality and housing programs  92
Executive offices and legislative bodies  9211
Military Reserves or National Guard  928110
U. S Coast Guard  928110
U. S. Army  928110
U. S. Navy  928110
U. S. Marines  928110
National security and international affairs  928110
Active Duty Military  928110
U. S. Armed Forces, Branch Not Specified  928110
U. S. Air Force  928110
Savings institutions, including credit unions  5221
Building material and supplies dealers  4441
Other information services, except libraries and archives, and internet publishing and broadcasting and web search portals  5191
Other general government and support  92119
Wholesale electronic markets and agents and brokers  425

hwchen commented 5 years ago

I'll be guiding victor through this tomorrow, so we're moving forward on this.

hwchen commented 5 years ago

Quick update: I've briefed victor on this, but it looks like he has some other stuff in the pipeline from dave too, so I'll leave it to dave to set the priority on this.