Closed davelandry closed 4 years ago
In the meantime, I've manually hacked in a temporary fix on theodore by directly editing the data in the table, but of course, @hwchen, we'll want to fix this in the ETL.
I'll fix this when I get back; unfortunately the bls code is at the office. But I have some idea what the issue is. If I'm remembering correctly, some of the ids were split on commas, and some were dashed.
(and to some of the slack discussion: I don't think it's a crosswalk thing, I think it's just how I'm processing multiple ids as one string, per row)
As another test case, I noticed the missing growth for Labor Unions as well: https://theodore.datausa.io/profile/naics/81393
This should be fixed tomorrow, sorry for delay
Looks like the issue isn't in etl errors either (although I do have another bug to fix)
Retail trade is normally 44-45, but in the table it's represented as "44, 45". I'm not crosswalking, so I'll just include a special case. (The table is already using naics, so I wanted to minimize the amount of transformation like crosswalking.)
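A minimal sketch of what such a special case could look like, assuming the ids arrive as one raw string per row. The names `SPECIAL_CASES` and `normalize_industry_id` are hypothetical and not taken from bls-core; the whitespace cleanup is also an assumption about what the raw strings may contain.

```python
import re

# Hypothetical special-case table: the id string as it appears in the growth
# table -> the id used elsewhere. Only the retail case comes from this thread.
SPECIAL_CASES = {"44, 45": "44-45"}


def normalize_industry_id(raw: str) -> str:
    """Tidy one raw id string and apply any known special case."""
    # Trim the ends and make comma spacing uniform so "44,45" and " 44, 45"
    # both become "44, 45" before the lookup.
    cleaned = re.sub(r"\s*,\s*", ", ", raw.strip())
    # Ids without a special case pass through unchanged (no crosswalking).
    return SPECIAL_CASES.get(cleaned, cleaned)


print(normalize_industry_id(" 44, 45"))     # 44-45
print(normalize_industry_id("8134, 8139"))  # 8134, 8139
```

Keeping the special cases in one dict means further cases can be added without touching the parsing logic.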
I'll leave this thread open for a bit, so you can keep adding special cases.
As for Labor Unions 81393, it doesn't exist in the table (industry growth). The closest is Civic, social, professional, and similar organizations, "8134, 8139".
@jspeis do you know how you handled this in the past? Perhaps I'm using an entirely different table than you did?
The old Data USA used a substitution of "Other Services, Except Public Administration."
The easiest way to check is to see the Job growth section: https://datausa.io/profile/naics/81393/#job_growth
@davelandry I think it should be considered a bug though that this information is presented in the splash without indication that a substitution has occurred
Ok, "Other Services, Except Public Administration" exists as "Other services" in the growth table. I'll hardcode this into the etl (copying this row and substituting the labor unions id).
@hwchen sorry for the confusion --- we want to preserve that as "Other services" in the data but we'll just want to make sure the crosswalk in the logic layer has the correct mapping (which it should already have)
(the reason we want to preserve that is so that we can notify users that a crosswalk is occurring)
ok, I think that makes sense. So I should still hard-code in Retail as 44-45, since it's the same thing; but let the crosswalk handle Labor Unions.
yup, retail can get hard coded since it's the same thing, but we'll let the crosswalk handle labor unions
On ulysses, fixed for retail trade: https://github.com/Datawheel/bls-core/commit/37e8fb57e166bf881c37eb7fc58518edd77c0bbe
Fixed some other possible issues with leading whitespace: https://github.com/Datawheel/bls-core/commit/ff34b779c526f6f4d754a9aaaa350bb3a2d377d1
@davelandry let me know if there's anything needed from my end on the naics substitution crosswalk for labor union.
Labor Unions, 81393 -> 81 (Other services)
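For illustration, here's a rough sketch of how a lookup could report that a substitution happened, so the site can tell the user about it. This is not the logic layer's actual code; `SUBSTITUTIONS`, `GrowthLookup`, and `lookup_growth_id` are made-up names, and the set of growth ids is a toy example.

```python
from typing import NamedTuple, Optional, Set, Dict


class GrowthLookup(NamedTuple):
    requested: str           # id the profile asked for
    resolved: Optional[str]  # id actually found in the growth table, if any
    substituted: bool        # True when resolved differs from requested


# Hypothetical substitution crosswalk; the single entry is the one from this thread.
SUBSTITUTIONS: Dict[str, str] = {"81393": "81"}


def lookup_growth_id(
    naics_id: str,
    growth_ids: Set[str],
    substitutions: Dict[str, str] = SUBSTITUTIONS,
) -> GrowthLookup:
    """Resolve an id against the growth table, flagging any substitution."""
    if naics_id in growth_ids:
        return GrowthLookup(naics_id, naics_id, False)
    target = substitutions.get(naics_id)
    if target in growth_ids:
        return GrowthLookup(naics_id, target, True)
    return GrowthLookup(naics_id, None, False)


growth_ids = {"44-45", "81"}  # toy data, not the real table
print(lookup_growth_id("81393", growth_ids))
print(lookup_growth_id("44-45", growth_ids))
```

The data keeps "Other services" under its own id; the `substituted` flag is what would let the front end say that a crosswalk occurred.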
@hwchen could you help me identify any PUMS Industry codes that currently don't have a crosswalk? Here's the list I'm working off of: https://github.com/DataUSA/datausa-site/blob/canon/static/data/pums_bls_industry_crosswalk.json
This looks like pums -> bls crosswalk. Will we need a separate bls -> bls crosswalk to know that a substitution is made?
our logiclayer needs 1 crosswalk that contains entries for every PUMS to BLS pairing that can be made...
ahhh I see, my crosswalk has an entry for 81393, but it's mapping directly to 81393 in BLS (but I'm assuming there's just not data at that level?). Should I change that entry to 81? I guess what I'm asking is: could you give me an updated crosswalk that maps PUMS to BLS codes containing data?
1) The mappings are currently from one pums to an array of bls naics. Should I append to the end of the array, or replace it?
2) Looks like there's a two-step process: (for Labor Unions, it mapped to the bls "Other Services, Except Public Administration". I didn't see this in the growth table; the closest was "Other services").
I would probably ask victor to help out with this. Also pinging @jspeis to confirm that this is a reasonable procedure.
One worry is that there could be data for one bls table but not another (there's a ces and a growth), which could mess up having one crosswalk for both tables.
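One possible way around that worry, sketched under assumptions: if fallback ids are appended to each pums entry's array (most specific first), the same crosswalk can be resolved separately against each table's set of ids. `CROSSWALK`, `resolve_for_table`, and the two id sets below are made up for illustration and don't reflect the real ces or growth contents.

```python
from typing import Dict, List, Optional, Set

# Hypothetical crosswalk entry: pums id -> ordered bls candidates,
# most specific first, with a broader fallback appended at the end.
CROSSWALK: Dict[str, List[str]] = {"81393": ["81393", "81"]}


def resolve_for_table(
    pums_id: str,
    crosswalk: Dict[str, List[str]],
    ids_with_data: Set[str],
) -> Optional[str]:
    """Return the first candidate that has data in the given table, or None."""
    for candidate in crosswalk.get(pums_id, []):
        if candidate in ids_with_data:
            return candidate
    return None


# Toy id sets standing in for the two tables.
ces_ids = {"81393", "81"}
growth_ids = {"81"}

print(resolve_for_table("81393", CROSSWALK, ces_ids))     # 81393
print(resolve_for_table("81393", CROSSWALK, growth_ids))  # 81
```

With appending rather than replacing, a table that does have the specific id still gets it, and only the table that lacks it falls back.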
Some numbers for distinct naics:
crosswalk: 306 entries for pums -> bls
ces: 154 6-digit (in ulysses), 771 overall (not currently in ulysses; for some reason they were limited to 6-digit)
growth: 244
(I haven't looked at the overlap yet)
Also, the substitutions in some cases are probably large enough that they should be noted. (For example, substituting "motor vehicle and parts dealers" (I'm guessing here) for "car washes" is a pretty big jump.) I guess you can see this just from seeing how far the label is from the original pums naics?
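As a rough way to put a number on "how far", one could count how many trailing digits were dropped when the substitute is a prefix of the original id, and treat anything that isn't a prefix as a jump to a different branch. This is only a heuristic sketch; `levels_dropped` is a hypothetical helper, and the 441 code for motor vehicle and parts dealers is used here purely as an illustration of the guessed example.

```python
from typing import Optional


def levels_dropped(original: str, substitute: str) -> Optional[int]:
    """Digits removed going from original to substitute.

    Returns None when substitute is not a prefix of original, i.e. the
    substitution leaves the original branch of the hierarchy entirely.
    """
    if not original.startswith(substitute):
        return None
    return len(original) - len(substitute)


print(levels_dropped("81393", "81"))    # 3
print(levels_dropped("811192", "441"))  # None
```

A `None` or a large count would be a reasonable trigger for showing a more prominent substitution note.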
Anyways, here's the full list of bls naics that are in the crosswalk but are not in the growth table:
Offices of optometrists 62132
Furniture and home furnishings stores 442
Printing and related support activities 3231
Veterinary services 54194
Grocery stores 4451
Lumber and other construction materials merchant wholesalers 4233
Professional and commercial equipment and supplies merchant wholesalers 4234
Household appliance stores 443141
Farm supplies merchant wholesalers 42491
Libraries and archives 51912
Clothing stores 4481
Carpet and rug mills 31411
Clay building material and refractories manufacturing 327120
Labor unions 81393
Agricultural implement manufacturing 33311
Recyclable material merchant wholesalers 42393
Used merchandise stores 4533
Newspaper publishers 51111
Grocery and related product merchant wholesalers 4244
Footwear manufacturing 3162
Tire manufacturing 32621
Office supplies and stationery stores 45321
Beauty salons 812112
Musical instrument and supplies stores 45114
Automotive parts, accessories, and tire stores 4413
Other direct selling establishments 45439
Hardware, and plumbing and heating equipment, and supplies merchant wholesalers 4237
Offices of chiropractors 62131
Retail florists 4531
Electronics stores 443142
Book stores and news dealers 45121
Petroleum refining 32411
Gift, novelty, and souvenir shops 45322
Fiber, yarn, and thread mills 3131
Furniture and home furnishing merchant wholesalers 4232
Other motor vehicle dealers 4412
Taxi and limousine service 4853
Farm product raw material merchant wholesalers 4245
Barber shops 812111
Fuel dealers 454310
Jewelry, luggage, and leather goods stores 4483
Electronic auctions 454112
Electronic shopping 454111
Landscaping services 56173
Miscellaneous general merchandise stores 4529
Textile and fabric finishing and coating mills 3133
Metals and minerals (except petroleum) merchant wholesalers 4235
Household appliances and electrical and electronic goods merchant wholesalers 4236
Hardware stores 44413
Gasoline stations 447
Apparel accessories and other apparel manufacturing 3159
Pharmacies and drug stores 44611
Paper and paper products merchant wholesalers 4241
Apparel, piece goods, and notions merchant wholesalers 4243
Miscellaneous retail stores 4539
Data processing, hosting, and related services 5182
Petroleum and petroleum products merchant wholesalers 4247
Sound recording industries 5122
Machinery, equipment, and supplies merchant wholesalers 4238
Internet publishing and broadcasting and web search portals 51913
Retail bakeries 311811
Specialty food stores 4452
Video tape and disk rental 53223
Motor vehicle and motor vehicle parts and supplies merchant wholesalers 4231
Beer, wine, and liquor stores 4453
Lawn and garden equipment and supplies stores 4442
Motion pictures and video industries 5121
Bowling centers 71395
Pottery, ceramics, and plumbing fixture manufacturing 32711
Department stores and discount stores 45211
Car washes 811192
Alcoholic beverages merchant wholesalers 4248
Sewing, needlework, and piece goods stores 45113
Cut and sew apparel manufacturing 3152
Nursing care facilities (skilled nursing facilities) 6231
Mail-order houses 454113
Vending machine operators 4542
Traveler accommodation 7211
Automobile dealers 4411
Drinking places, alcoholic beverages 7224
Paperboard container manufacturing 32221
Shoe stores 44821
Transportation and Warehousing 48,492,493
Public Administration 92
Fabric mills, except knitting mills 3132
Prefabricated wood buildings and mobile homes 32199
Miscellaneous petroleum and coal products 3241
Ordnance 33299
Drugs, sundries, and chemical and allied products merchant wholesalers 424
Public finance activities 92113
Health and personal care, except drug, stores 446
Civic, social, advocacy organizations, and grantmaking and giving services 813
Sporting goods, and hobby and toy stores 4511
Administration of human resource programs 923
Justice, public order, and safety activities 92
Transportation and Warehousing, and Utilities 48,492,493
Miscellaneous nondurable goods merchant wholesalers 4249
Miscellaneous durable goods merchant wholesalers 4239
Aircraft and parts manufacturing 33641
Aerospace products and parts manufacturing 33641
Business, technical, and trade schools and training 611
Colleges, universities, and professional schools, including junior colleges 611
Other schools and instruction, and educational support services 611
Administration of economic programs and space research 92
Administration of environmental quality and housing programs 92
Executive offices and legislative bodies 9211
Military Reserves or National Guard 928110
U. S Coast Guard 928110
U. S. Army 928110
U. S. Navy 928110
U. S. Marines 928110
National security and international affairs 928110
Active Duty Military 928110
U. S. Armed Forces, Branch Not Specified 928110
U. S. Air Force 928110
Savings institutions, including credit unions 5221
Building material and supplies dealers 4441
Other information services, except libraries and archives, and internet publishing and broadcasting and web search portals 5191
Other general government and support 92119
Wholesale electronic markets and agents and brokers 425
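A list like the one above can be produced with a set difference. The sketch below assumes the crosswalk is a dict of pums id to a list of bls ids and that the growth table's ids are available as a set; the real file layout and the name `missing_from_growth` are assumptions.

```python
from typing import Dict, List, Set


def missing_from_growth(
    crosswalk: Dict[str, List[str]],
    growth_ids: Set[str],
) -> List[str]:
    """Return bls ids referenced by the crosswalk that the growth table lacks."""
    # Flatten every target array into one set of distinct bls ids.
    bls_ids = {bls for targets in crosswalk.values() for bls in targets}
    return sorted(bls_ids - growth_ids)


# Toy inputs, not the real crosswalk or table.
crosswalk = {"pums_a": ["81393"], "pums_b": ["44-45"], "pums_c": ["4451", "81393"]}
growth_ids = {"44-45", "81"}

print(missing_from_growth(crosswalk, growth_ids))  # ['4451', '81393']
```

Running the same function against the ces ids would show the overlap between the two tables.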
I'll be guiding victor through this tomorrow, so we're moving forward on this.
quick update: I've briefed victor on this, but it looks like he has some other stuff in the pipeline from dave too, so I'll leave it to dave to set the priority on this.
I'm trying to find industry growth for Retail Trade (44-45), which is seen in the top right stat on the old Data USA here, but it's not available in the cubes: https://saguaro-api.datausa.io/ui/#eyJkcmlsbERvd25zIjpbWyJCTFMgSW5kdXN0cnkgRmxhdCIsIkJMUyBJbmR1c3RyeSBGbGF0IiwiSW5kdXN0cnkiXV0sImN1dHMiOltbWyI0NC00NSJdLFsiQkxTIEluZHVzdHJ5IEZsYXQiLCJCTFMgSW5kdXN0cnkgRmxhdCIsIkluZHVzdHJ5Il1dXSwiY3ViZSI6ImJsc19ncm93dGhfaW5kdXN0cnkiLCJtZWFzdXJlcyI6WyJJbmR1c3RyeSBKb2JzIFRob3VzYW5kcyAyMDA2IiwiSW5kdXN0cnkgSm9icyBUaG91c2FuZHMgMjAxNiIsIkluZHVzdHJ5IEpvYnMgVGhvdXNhbmRzIDIwMjYiXX0=