Closed davelandry closed 5 years ago
@hwchen would you take a look into what's going on with this cube?
@davelandry you're saying that the query from mondrian-rest ui works but the mondrian-rest-client request does not?
I tried curling the query in the link and it worked ok. It seems to work in the ui. Is it still not working through mondrian-rest-client?
If it's still happening with mondrian-rest-client, it would be helpful to see the code, so that I can see if there's some issue in the client.
That error seems like something that happens on startup though, so not sure
ooooo it's your favorite @hwchen 😏
mondrian-rest-client config:
.measure("Occupation Employment Thousands 2016")
.drilldown("BLS Occupation Flat", "BLS Occupation Flat", "Occupation")
NilClass error:
Cube Error 500 Internal Server Error
Puma caught this error: undefined method `[]' for nil:NilClass (NoMethodError)
/home/deploy/mondrian-rest/lib/mondrian_rest/api_formatters.rb:104:in `block in pluck'
org/jruby/RubyArray.java:2486:in `map'
/home/deploy/mondrian-rest/lib/mondrian_rest/api_formatters.rb:104:in `pluck'
/home/deploy/mondrian-rest/lib/mondrian_rest/api_formatters.rb:63:in `block in tidy'
org/jruby/RubyArray.java:1734:in `each'
org/jruby/RubyEnumerable.java:1067:in `each_with_index'
/home/deploy/mondrian-rest/lib/mondrian_rest/api_formatters.rb:58:in `block in tidy'
org/jruby/RubyGenerator.java:99:in `each'
org/jruby/RubyEnumerator.java:323:in `each'
org/jruby/RubyEnumerator.java:329:in `each'
org/jruby/RubyEnumerator.java:323:in `each'
org/jruby/RubyEnumerator.java:501:in `with_index'
uri:classloader:/jruby/kernel/enumerator.rb:60:in `with_index'
org/jruby/RubyEnumerator.java:323:in `each'
org/jruby/RubyEnumerator.java:410:in `with_object'
/home/deploy/mondrian-rest/lib/mondrian_rest/formatters/jsonrecords.rb:27:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/formatter.rb:45:in `block in build_formatted_response'
org/jruby/RubyArray.java:2481:in `collect'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/formatter.rb:45:in `block in build_formatted_response'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/activesupport-5.2.1/lib/active_support/notifications.rb:170:in `instrument'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/formatter.rb:44:in `build_formatted_response'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/formatter.rb:28:in `after'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/base.rb:34:in `call!'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/base.rb:24:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/base.rb:31:in `call!'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/base.rb:24:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/error.rb:38:in `block in call!'
org/jruby/RubyKernel.java:1114:in `catch'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/error.rb:37:in `call!'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/middleware/base.rb:24:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/rack-2.0.5/lib/rack/head.rb:12:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/endpoint.rb:227:in `call!'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/endpoint.rb:221:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router/route.rb:72:in `exec'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:121:in `process_route'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:74:in `block in identity'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:93:in `transaction'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:72:in `identity'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:57:in `block in call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:137:in `with_optimization'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/router.rb:56:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/api.rb:119:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/api.rb:45:in `call!'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/grape-1.1.0/lib/grape/api.rb:40:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/rack-2.0.5/lib/rack/common_logger.rb:33:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/rack-cors-0.4.1/lib/rack/cors.rb:81:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/rack-2.0.5/lib/rack/config.rb:17:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/puma-3.8.2-java/lib/puma/configuration.rb:224:in `call'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/puma-3.8.2-java/lib/puma/server.rb:600:in `handle_request'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/puma-3.8.2-java/lib/puma/server.rb:435:in `process_client'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/puma-3.8.2-java/lib/puma/server.rb:299:in `block in run'
/home/deploy/.rvm/gems/jruby-9.1.13.0/gems/puma-3.8.2-java/lib/puma/thread_pool.rb:120:in `block in spawn_thread'
I see the issue; I'll fix, hopefully by the end of the week. It requires me to dive a little into the etl.
@jspeis @davelandry I want to double check on how the cube is intended to be used. I think that the Pums dimension is clean, and I had expected it to be used for the bulk of the queries to the cube. Can you remind me what the BLS drilldown is used for?
To give an example of the issue:
pums_soc,name,bls_soc
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553011
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553012
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553013
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553014
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553015
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553016
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553017
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553018
553010,Military enlisted tactical operations and air/weapons specialists and crew members,553019
for this label, one pums code maps to many bls codes.
Both pums_soc
and bls_soc
are used as foreign keys in the fact table.
I guess there's one more issue, which I'll have to look into more closely: why this kind of many-to-one exists:
alpaca:bls-core (master|✔) > ag 330000
pums_to_bls_final.csv
443:"330000","Protective Service Occupations:","330000"
462:"559830","Military, rank not specified","330000"
465:"553010","Military enlisted tactical operations and air/weapons specialists and crew members","330000"
473:"552010","Firstline enlisted military supervisors","330000"
This is the reverse of above, where it looks like one bls maps to many pums.
@jspeis Any feedback is welcome, while I try to figure out what to do. I think that case 1 is more problematic for the cube, because I think the crosswalking is correct but mondrian doesn't like the format. The 2nd case might be a mistake or something else, not sure yet.
@hwchen the primary use case for this data is just to retrieve the relevant data from BLS growth estimates for an occupation profile page. (or industry for the BLS industry growth, which would have similar issues I believe). So we would want to take a PUMS SOC code, cut the cube by the related PUMS SOC but while also retaining the original BLS code for reference.
Because of issues that Mondrian faces with many-to-many mappings, something we've done in other instances (eg iocodes for BEA) is that we simply keep the data sources in their native formats and then provide the logic layer with a simply dictionary the maps codes. In this case would just be PUMS SOC
-> BLS SOC
) would that alleviate the issues you're seeing @hwchen?
I think that would alleviate some. This means, though, that the logic layer will need to manually construct the cuts by PUMS SOC?
As for the second case from above, I think that I constructed the BLS dims from the crosswalk.
Wait... I have another idea: Do we ever need the bls code directly associated to a label? I can just make the "name" col for the bls the same as the id. Then as long as the high level query is done by PUMS, and then the info is retrieved by BLS, I think this would work.
I think this works because I already did the crosswalk in the fact table.
We would need to have the BLS name appear somewhere because that is the 'real' name of the data record and we would want to be able to present the original name of the row in the data to users (to show its not exact match to the PUMS name)
Then I guess that a pums -> bls map in the logic layer is the solution. I'll get together a json file, then?
sounds good. and I assume we'll need to do this for both industry and occupation tables?
I was wrong about the cause of the original issue. There's something wrong in the etl or the data which is causing duplicated rows. I'll solve that and then see if we need to do the other mappings.
Well, I was wrong about being wrong. It's still the crosswalk.
Anyways, @davelandry and @jspeis here's and example of the crosswalk from pums to bls. As you can see, it's an object keyed on the pums code. The values are an array of bls codes that it maps to.
Let me know if this basic structure works for you.
alpaca:bls-core (master|✚2…) > cat out/pums_bls_occupation_crosswalk.json | jq '.["537000"]'
[
{
"name": "Material Moving Occupations:",
"bls_code": 537000
}
]
alpaca:bls-core (master|✚2…) > cat out/pums_bls_occupation_crosswalk.json | jq '.["5370XX"]'
[
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537000
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537011
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537031
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537032
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537033
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537041
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537071
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537072
},
{
"name": "Conveyor operators and tenders, and hoist and winch operators",
"bls_code": 537073
}
]
@davelandry @jspeis even with the crosswalk in the logic layer (which will fix my incorrect merge of the crosswalk into the fact table), there's still the issue of repeated labels in the bls dimension, which I believe will still cause the pluck error.
I think that this is in part from trying to reconcile pums and bls; I think the labels were taken from the crosswalk. So, I have probably two options:
I would very much like tesseract to handle crosswalks gracefully.
alpaca:out (master|✚2…) > ag 5370 dim_flat_bls_occupation.csv
58:Refuse and recyclable material collectors,537081
79:Industrial truck and tractor operators,537051
149:"Laborers and freight, stock, and material movers, hand",537062
205:"Packers and packagers, hand",537064
240:"Dredge, excavating, and loading machine operators",537030
330:Cleaners of vehicles and equipment,537061
342:Machine feeders and offbearers,537063
411:Pumping station operators,537070
474:"Conveyor operators and tenders, and hoist and winch operators",537000
481:Material Moving Occupations:,537000
489:Crane and tower operators,537021
978:"Conveyor operators and tenders, and hoist and winch operators",537011
979:"Conveyor operators and tenders, and hoist and winch operators",537031
980:"Conveyor operators and tenders, and hoist and winch operators",537032
981:"Conveyor operators and tenders, and hoist and winch operators",537033
982:"Conveyor operators and tenders, and hoist and winch operators",537041
983:"Conveyor operators and tenders, and hoist and winch operators",537071
984:"Conveyor operators and tenders, and hoist and winch operators",537072
985:"Conveyor operators and tenders, and hoist and winch operators",537073
Just spoke with @hwchen. The solution that we agreed on (and what matches how we crosswalk other tables currently) is:
Fixed by https://github.com/Datawheel/bls-core/commit/e33e8e226e033e49253e254a00a09fbbe3f589ec
fact table no longer does crosswalk in etl. This is live in lassen
I've provided the json crosswalk to @davelandry by slack. I'm not committing those objects, they have to be produced by running the bls core script.
Not sure if this is @hwchen or @jspeis, but when trying to replicate this query in the logic layer using mondrian-rest-client we get this error: