Census layer: establish color schemes, classifications

Language-Mapping / language-map

Front-end codebase for Language Mapping web map

https://languagemap.nyc

MIT License


Census layer: establish color schemes, classifications #115

Closed abettermap closed 3 years ago

abettermap commented 4 years ago

Summary

Establish polygon color scheme/s for the census layer/s. Maya has some experience with this via an ArcGIS web map, so let's start from there.

Questions

Will the symbology be based on a verbatim value (e.g. number of speakers) or an on-the-fly calc like percentage (e.g. number of speakers out of total number of speakers)?
Which classification to use? The JS color lib will likely be chroma.js, which supports four class modes, so if at all possible it seems logical to shoot for one of those in the ArcGIS web map:

It supports the modes equidistant (e), quantile (q), logarithmic (l), and k-means (k).
Can we use the same classification across the board, as in all languages and all granularities (tract and PUMA), or will that not sufficiently communicate patterns in the data?
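To make the first two questions concrete, here is a minimal sketch of turning raw counts into percentages and then computing equidistant and quantile breaks by hand. It is not the project's code: the `TractRow` shape and its field names are assumptions for illustration. In the app itself, chroma.js's `chroma.limits(data, mode, n)` would presumably do the classification instead.

```typescript
// Hypothetical tract record: these field names are illustrative only.
type TractRow = { geoid: string; speakers: number; totalPop: number };

// On-the-fly calc: share of speakers in a tract as a percentage (0 for empty tracts).
function toPercent(row: TractRow): number {
  return row.totalPop > 0 ? (row.speakers / row.totalPop) * 100 : 0;
}

// Equidistant: split [min, max] into `classes` equal-width intervals.
function equidistantBreaks(values: number[], classes: number): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / classes;
  return Array.from({ length: classes + 1 }, (_, i) => min + i * step);
}

// Quantile: put roughly the same number of features into each class.
function quantileBreaks(values: number[], classes: number): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  return Array.from({ length: classes + 1 }, (_, i) => {
    const pos = (i / classes) * (sorted.length - 1);
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    // Linear interpolation between the two nearest ranked values.
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  });
}
```

For skewed values such as `[0, 10, 20, 30, 100]` with four classes, the equidistant breaks are `[0, 25, 50, 75, 100]` while the quantile breaks are `[0, 10, 20, 30, 100]`. That difference is a small-scale version of why one classification may not show patterns equally well for every language.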

Strategies

In the ArcGIS web map, use the light basemap similar to our MB one. This should increase the chances that the colors will look good in the final map.
If it helps, this is the full example that the final code approach will likely be based on. It includes Chroma, MB, and the Census SDK.

abettermap commented 4 years ago

@fiddleHeads let me know if any of this doesn't make sense and/or you don't want to do it. I think there will be a lot of questions that come out of it due to a variety of scenarios, so let's just take it one step at a time.

abettermap commented 4 years ago

Also no hurry as I probably won't touch the census stuff for a bit!

fiddleHeads commented 4 years ago

Sounds good. Will let you know what I come up with.

abettermap commented 3 years ago

@fiddleHeads @rperlin-ela

this issue kind of got sucked into #113 (probably my bad there) so we can refer to any scheme-related details in that one but continue the scheme convo here.

Demo?

i pushed up a functioning but very-WIP deploy using:

census api perma-joined to mb-lookup
MB Boundaries tracts layer

Shhhh

i'm aware:

lang comm dot points don't show "above" tracts
color scheme classes: not super-meaningful
no PUMA
no legend
"rate of change" not intuitive. mostly just to demo the ease of MB's interpolate options

that list could get huge so use your judgment on what you feel is "ready" to comment on (we know the drill by now!).
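For anyone curious what the "rate of change" option maps to: Mapbox GL style expressions are plain JSON arrays, and the `interpolate` expression takes either `["linear"]` or `["exponential", base]` as its interpolation type. A rough sketch of building such a fill-color expression follows. The helper name, field name, and colors are made up for illustration and are not taken from the deploy.

```typescript
// Interpolation types used here, as defined by the Mapbox style spec.
type Interpolation = ['linear'] | ['exponential', number];

// Build an "interpolate" expression for a polygon layer's fill-color.
// `stops` are [value, color] pairs and must be in ascending value order.
function fillColorExpression(
  field: string,
  stops: [number, string][],
  interpolation: Interpolation = ['linear'],
): unknown[] {
  const flatStops: (number | string)[] = [];
  stops.forEach(([value, color]) => {
    flatStops.push(value, color);
  });
  return ['interpolate', interpolation, ['get', field], ...flatStops];
}

// Hypothetical usage: "Chinese" stands in for whatever the real field name is.
const exampleExpression = fillColorExpression(
  'Chinese',
  [
    [0, '#edf8e9'],
    [500, '#006d2c'],
  ],
  ['exponential', 1.5],
);
```

The result could be passed as the `fill-color` paint property of a fill layer. Swapping the interpolation type is a one-argument change, which is the "ease" being demoed.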

Questions

Some overlap mentioned in #135 already, so i guess my question is just am i on the right track with this? the census goals have been reeeeeally broad so far and i'm struggling with what to communicate and offer to the user in the UI.

abettermap commented 3 years ago

also, and not to open another can of stats worms, but i installed the deceptively named simple statistics lib, so we'll need to answer the relevant things from above and #135 before i know what to do with that.

rperlin-ela commented 3 years ago

am i on the right track with this? the census goals have been reeeeeally broad so far and i'm struggling with what to communicate and offer to the user in the UI.

I apologize for the broadness but I think this is really on track. This is really the dream right here— showing people via ELA data that Chinatown, where the census records "Chinese", is much more diverse

[Screenshot: Screen Shot 2020-11-17 at 12.08.32 PM]

So now I just want to make sure people see this, and that it's intuitive to get to this place. I'm not totally sure how to do that, but hopefully the way we ultimately present the "Spatial" panel will help, and fwiw I'll include something in Help. my question is whether there might be one or two more places in the UI where we can nudge people, like (with this example) in "China" under Categories or under "Chinese" in details.

that list could get huge so use your judgment on what you feel is "ready" to comment on (we know the drill by now!).

maybe I should break this out as a separate issue, or it's something you're aware of, but when I zoom in or out and then try to move my way around the map, it now seems to go back out to that default zoom level right away

also curious if you tried "Age5p_Only_English_ACS_13_17"

abettermap commented 3 years ago

I apologize for the broadness but I think this is really on track.

no worries, it's a broad topic, but good to know i'm on the right track!

one or two more places in the UI where we can nudge people, like (with this example) in "China" under Categories

typo perhaps? Countries i think? if so, possibly, but the config for this would have to live somewhere, and same for anything that's not in the data already, e.g. allowing user to set the census language at /Explore/Language/Chinese (Mandarin) to Chinese. I could simply do a check (at that particular Explore level anyway) to see if "Chinese (Mandarin)" includes any of the languages available in the Spatial "Show by" dropdown. this sounds way too easy though, and I doubt they'd fall so neatly into the list!

if i'm wrong on that though, would you settle for these spots as the two UI places to set Census language?

/Explore/Language/LanguageName
/Details/ID (see below)
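The "simple check" mentioned above could look roughly like this. It is only a sketch under the assumption that the Spatial "Show by" options are available as a plain list of strings; the function name is hypothetical.

```typescript
// Return the first census dropdown option whose name appears inside the
// language name (case-insensitive), or undefined when nothing matches.
function matchCensusLanguage(
  languageName: string,
  censusOptions: string[],
): string | undefined {
  const haystack = languageName.toLowerCase();
  return censusOptions.find((option) => haystack.includes(option.toLowerCase()));
}
```

`matchCensusLanguage('Chinese (Mandarin)', ['Spanish', 'Chinese'])` finds "Chinese", but a language like Sicilian would never match "Italian" this way, which is the "won't fall so neatly into the list" problem in practice.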

or under "Chinese" in details.

so a new badge/btn/link then, correct? that could work, even sans addl config, if we add a new column like "Census Language" or something. this would actually be far easier on my end (and probably yours) than the Explore-level since that needs more config.

you'd just need to make sure the value for those cells matches the column name in whatever tracts/PUMA layer/s we end up using. oh and actually if we can't get them into one layer, then we'd probably need another column like "Census source" or "Census layer" which would have values of:

puma
tract
nothing/empty

maybe I should break this out as a separate issue, or it's something you're aware of, but when I zoom in or out and then try to move my way around the map, it now seems to go back out to that default zoom level right away

i think i know what you're talking about but i can't replicate it at the moment. presumably has something to do with how i'm setting the symbology for the tracts. are you seeing it consistently and in all scenarios or just after playing with the Spatial Show by dropdown?

maybe I should break this out as a separate issue

don't worry about it, i'm ok with including as part of the current WIP.

also curious if you tried "Age5p_Only_English_ACS_13_17"

i had it in but took it out as it seemed like a different meaning than the others:

is that not the case? if so then the API will have to be hit again to get those values and re-joined to the lookup table.

abettermap commented 3 years ago

Latest

also i pushed up some potential improvements to the color scheme classifications: https://deploy-preview-134--languagemapping.netlify.app/spatial . i'm definitely out of my element here, but the patterns look more evident i think.

oh the push may have also fixed the layer draw order issue where tracts were on top of points.

Census popups

i don't think we've answered the "popups for census?" question yet but locally i tried it in the same manner or very similar to how we do it for County/Neighb (via lookup table) and there's quite a lag. i realize popups aren't as useful for census, or at least the tract ID isn't, but it'd be nice to click and get the value at least, just not at the expense of waiting for it.

this would be another issue likely resolved by an uploaded non-Boundaries tracts layer, as the values would be baked into each tract and the code would not have to potentially loop over 2000+ records to find the match. by comparison the Neighb lookup has a mere 272 records, so the lag is not a huge surprise.
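If the linear search really is the cost (the thread only suspects this), one alternative to baking values into the tileset is to index the lookup table once and read from the index on click. This is a generic sketch, not the approach described above; the `LookupRow` shape and the `geoid` key are assumptions.

```typescript
// Hypothetical lookup row: one record per tract.
type LookupRow = { geoid: string; value: number };

// Current-style lookup: scans the array until it finds the match.
function findByScan(rows: LookupRow[], geoid: string): LookupRow | undefined {
  return rows.find((row) => row.geoid === geoid);
}

// One-time indexing: after this, each popup lookup is a single Map.get().
function buildIndex(rows: LookupRow[]): Map<string, LookupRow> {
  const index = new Map<string, LookupRow>();
  rows.forEach((row) => index.set(row.geoid, row));
  return index;
}
```

Both return the same record. The index just moves the work to load time, so it only helps if the per-click scan is what causes the lag rather than, say, fetching the table.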

it's also possible there's a lag on Neigh/County popups as well, but it's less evident since i don't show it until after the zoom/pan when the feature is clicked. we decided not to zoom to tracts though in order to save a considerable amount of data/space/perf for behavior that seems minimally useful and potentially annoying in the context of small features like tracts.

User luxuries

i know PUMA is not in the dropdowns but based on my latest push are we leaning towards answers on any of these yet for this dropdown?

should user even be allowed this luxury or will they trigger a "this is why we can't have nice things"?
if no luxury, is there a catch-all classification* of the three?
or is it super useful for some and not at all for others meaning we'd have to config it on a per-language basis?

IMO we should just pick one that's sufficient for all. they all show similar patterns, and config-less is always going to be simpler.

if we wanted to go super performant, i could have the natural breaks ready for each class before the layer ever sees the light of day rather than recalculating each time dropdown is changed. this would not be tied to the dataset of course, but if we're hurting for perf at some point, just throwing it out there.

for Item 2, it's not actually a classification- they're all using ckmeans which is evidently a step up from Natural Breaks (again out of my element and not even pretending to understand any of this, ha!). the dropdown actually refers to these in case that helps with clarification of the 1-3 above.
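The "have the breaks ready before the layer ever sees the light of day" idea could be sketched like this, with the classifier passed in as a function. Everything here is illustrative: the row shape, the function names, and the equal-interval stand-in are assumptions. In the app, the classifier slot would presumably wrap `ckmeans(values, nClusters)` from simple-statistics, which returns the clusters themselves rather than break values.

```typescript
// Any function that turns a list of values into class break values.
type Classifier = (values: number[], classes: number) => number[];

// Hypothetical row: one numeric column per census field.
type CensusRow = Record<string, number>;

// Compute breaks once per field so a dropdown change is just a cache read.
function precomputeBreaks(
  rows: CensusRow[],
  fields: string[],
  classes: number,
  classify: Classifier,
): Record<string, number[]> {
  const cache: Record<string, number[]> = {};
  fields.forEach((field) => {
    const values = rows.map((row) => {
      const value = row[field];
      return typeof value === 'number' ? value : 0;
    });
    cache[field] = classify(values, classes);
  });
  return cache;
}

// Stand-in classifier (equal intervals), used only to keep the sketch self-contained.
const equalInterval: Classifier = (values, classes) => {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / classes;
  return Array.from({ length: classes + 1 }, (_, i) => min + i * step);
};
```

The cache is keyed by field name, so it would need rebuilding whenever the underlying data changes.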

rperlin-ela commented 3 years ago

i don't think we've answered the "popups for census?" question yet but locally i tried it in the same manner or very similar to how we do it for County/Neighb (via lookup table) and there's quite a lag. i realize popups aren't as useful for census, or at least the tract ID isn't, but it'd be nice to click and get the value at least, just not at the expense of waiting for it.

In agreement, a simple popup could be nice for a value but only worth it if it's simple and painless

IMO we should just pick one that's sufficient for all. they all show similar patterns, and config-less is always going to be simpler.

I agree. I'm not a stats person, but I notice that Exponential here is giving us the most to look at. If we're just choosing one, do we even need a catch-all word and the whole "Rate of Change" dropdown and all that? If we need a disclaimer or a blurb somewhere that's fine, but I don't think people will expect these options. Btw, not sure if we ever shared an old inspiration. This presumably used PUMA data and does something a little different from what we're doing, but people really like it: https://www.jillhubley.com/project/nyclanguages/

typo perhaps? Countries i think?

Yes!

so a new badge/btn/link then, correct? that could work, even sans addl config, if we add a new column like "Census Language" or something. this would actually be far easier on my end (and probably yours) than the Explore-level since that needs more config.

Sounds great

you'd just need to make sure the value for those cells matches the column name in whatever tracts/PUMA layer/s we end up using. oh and actually if we can't get them into one layer, then we'd probably need another column like "Census source" or "Census layer" which would have values of:

puma

tract

nothing/empty

Ok, I'm ready to move on this whenever

i think i know what you're talking about but i can't replicate it at the moment. presumably has something to do with how i'm setting the symbology for the tracts. are you seeing it consistently and in all scenarios or just after playing with the Spatial Show by dropdown?

I think I'm seeing it pretty consistently in some form in all scenarios but it's a little hard to specify what zoom level it goes back out to when I start pinching— not always the same, maybe the thing it was just on before I started zooming in/out

also curious if you tried "Age5p_Only_English_ACS_13_17"

is that not the case? if so then the API will have to be hit again to get those values and re-joined to the lookup table.

I agree that it's a little different, but maybe we can nod to that by calling it "English (Only)". Maybe it'll feel wrong when we see it, but I do think it's likely to be important and interesting, and in any case these are not all strictly comparable... "French" is including Haitian Creole and "German" is including Yiddish and "Russian" is including "Polish" and "other Slavic", which is not at all intuitive for people and which we're going to have to account for, both wherever else in the UI and I think also in the dropdown list, where maybe it should be "French/Haitian Creole", "German/Yiddish", "Russian/Polish/Other Slavic".

abettermap commented 3 years ago

If we're just choosing one, do we even need a catch-all word and the whole "Rate of Change" dropdown and all that?

nope don't think so. the dropdown i guess was just 75% internal demo so the coloring type options could be played with, and 25% "maaaaybe user would want this?".

If we need a disclaimer or a blurb somewhere that's fine

won't need one if we remove the dropdown, but a blurb via an ℹ️ btn near the soon-to-be legend for green gradient would be a good spot to indicate the stats method, data sources, difference b/t puma and tracts, etc. this can definitely be hidden by default.

or better approach (still using "i" btn): in the About or Help guide, wherever makes sense, you should learn how to set id values on the blocks in WP. that way I can just link directly to that section (e.g. https://map.languagemapping.org/about#census-data) rather than creating and updating it in the code.

Maya and i set that up for some of the About links but i think some of it may have gotten jostled around in the shuffle as the links no longer work, e.g. https://map.languagemapping.org/about#legal does not take you to the Legal section.

creating those id's is definitely a candidate for the Big Manual, it's kind of a freebie for content mgmt as it keeps it current and requires minimal effort on my end. i don't have WP but i imagine stuff like that is in their docs, and Maya may recall how to do it as well.

Ok, I'm ready to move on this whenever

we need to dial in the data stuff first. i'll keep that going in email, i just haven't replied yet.

there's nothing stopping you from adding two new cols in the interim tho:

Census Field
Census Layer, which will either be puma or tract. keep them lowercase for simplicity.

we could probably get clever with just the first column and some kind of prefix before the language name, but that sounds like a recipe for trouble.

if you start on that, please update Data Schema sheet as well.
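As a sketch of how the two new columns might be read on the front end (the type names, the resolver, and the tileset IDs below are placeholders, not anything from the codebase):

```typescript
// Census Layer is either "puma" or "tract" (lowercase), or empty when not set.
type CensusLayer = 'puma' | 'tract' | '';

// Hypothetical shape of one row in the language config sheet.
type LanguageConfigRow = {
  language: string;
  censusField: string; // must match the field name in the chosen tileset
  censusLayer: CensusLayer;
};

// Placeholder tileset IDs: the real ones are still TBD.
const TILESET_BY_LAYER: Record<'puma' | 'tract', string> = {
  puma: 'example.puma-tileset',
  tract: 'example.tracts-tileset',
};

// Work out which tileset and field to symbolize for a language, if any.
function resolveCensusTarget(
  row: LanguageConfigRow,
): { tileset: string; field: string } | null {
  const { censusField, censusLayer } = row;
  if (censusLayer === '' || censusField === '') return null;
  return { tileset: TILESET_BY_LAYER[censusLayer], field: censusField };
}
```

Keeping the layer in its own column avoids parsing a prefix out of the field name, which is the "recipe for trouble" option.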

I think I'm seeing it pretty consistently in some form in all scenarios but it's a little hard to specify what zoom level it goes back out to when I start pinching— not always the same, maybe the thing it was just on before I started zooming in/out

so this is just on mobile, or desk too?

I agree that it's a little different, but maybe we can nod to that by calling it "English (Only)". Maybe it'll feel wrong when we see it, but I do think it's likely to be important and interesting

ok. it's a lot of steps to make all this happen so let's use what we have until the data stuff settles in the email thread.

and in any case these are not all strictly comparable... "French" is including Haitian Creole and "German" is including Yiddish and "Russian" is including "Polish" and "other Slavic", which is not at all intuitive for people and which we're going to have to account for, both wherever else in the UI and I think also in the dropdown list, where maybe it should be "French/Haitian Creole", "German/Yiddish", "Russian/Polish/Other Slavic".

ha, yeah i figured it wouldn't be as easy as just the one-word name, i think that seals the fate of the two new data cols then!

rperlin-ela commented 3 years ago

won't need one if we remove the dropdown, but a blurb via an ℹ️ btn near the soon-to-be legend for green gradient would be a good spot to indicate the stats method, data sources, difference b/t puma and tracts, etc. this can definitely be hidden by default.

Yes! With this I would say let's be maximally simple otherwise. fwiw, I see the logic of a single dropdown with tracts and PUMAs as long as there's some little thing differentiating them (like an asterisk or divider with name within the dropdown)

or better approach (still using "i" btn): in the About or Help guide, wherever makes sense, you should learn how to set id values on the blocks in WP. that way I can just link directly to that section (e.g. https://map.languagemapping.org/about#census-data) rather than creating and updating it in the code.

Maya and i set that up for some of the About links but i think some of it may have gotten jostled around in the shuffle as the links no longer work, e.g. https://map.languagemapping.org/about#legal does not take you to the Legal section.

Ok, will try again at some point, may need help

there's nothing stopping you from adding two new cols in the interim tho:

Census Field

Census Layer, which will either be puma or tract. keep them lowercase for simplicity.

Ok, maybe I should wait, but just want to be sure I understand, I may not yet. For example, for Chinese (tract-level example), under "Census Field" would it be "Chinese" and under "Census Layer" it would be "Age5p_Chinese_ACS_13_17"? And for "Italian" (PUMA-level example) "Census Field" would be "Italian" and the "Census Layer" would be "F1000_Italian"? Likewise exactly the same for Sicilian if we want it linked to the Italian PUMA layer?

What about for a case like "German" (which should be "German/Yiddish" at the tract level, but just "German" at the PUMA)?

Also, this is probably worth quickly putting to the group, but what would be the best way for me to indicate to you the PUMA layers I don't think we need? My own instinct is not to show a few dozen of the ones that have such low numbers that I'm not sure they indicate anything (Pennsylvania Dutch, Kiowa etc). I realize that might seem arbitrary, but I think a lot of those are actively misleading, and we're curating here anyway so probably should lay our cards on the table.

if you start on that, please update Data Schema sheet as well.

Started

I think I'm seeing it pretty consistently in some form in all scenarios but it's a little hard to specify what zoom level it goes back out to when I start pinching— not always the same, maybe the thing it was just on before I started zooming in/out

so this is just on mobile, or desk too?

I have the issue on desktop too, when I zoom in and then click once in preparation for dragging myself and looking around

abettermap commented 3 years ago

Yes! With this I would say let's be maximally simple otherwise. fwiw, I see the logic of a single dropdown with tracts and PUMAs as long as there's some little thing differentiating them (like an asterisk or divider with name within the dropdown)

so single dropdown would be your preference over two dropdowns then?

Ok, maybe I should wait

yeah let's hold off for a sec. this is either going one of two directions and i don't want to steer you in the wrong one.

For example, for Chinese (tract-level example), under "Census Field" would it be "Chinese" and under "Census Layer" it would be "Age5p_Chinese_ACS_13_17"?

no, sorry if poor explanation on my part:

Census Field: whatever the field name is for Chinese in the uploaded MB tracts tileset. (very TBD at this point)

Census Layer: "tract" (without quotes)

And for "Italian" (PUMA-level example) "Census Field" would be "Italian" and the "Census Layer" would be "F1000_Italian"? Likewise exactly the same for Sicilian if we want it linked to the Italian PUMA layer?

Census Field: whatever the field name is for Italian in the uploaded MB tracts tileset. (very TBD at this point)

Census Layer: "puma" (without quotes)

What about for a case like "German" (which should be "German/Yiddish" at the tract level, but just "German" at the PUMA)?

Uh oh! I thought you said we only use one or the other, not both? I was under the impression "tract-level if it's available, otherwise PUMA". is that not the case?

Also, this is probably worth quickly putting to the group, but what would be the best way for me to indicate to you the PUMA layers I don't think we need? My own instinct is not to show a few dozen of the ones that have such low numbers that I'm not sure they indicate anything (Pennsylvania Dutch, Kiowa etc). I realize that might seem arbitrary, but I think a lot of those are actively misleading, and we're curating here anyway so probably should lay our cards on the table.

Totally. I think this will just be a matter of coordinating with Maya and deciding what goes into the uploaded tileset. The way i have it set now is simply grabbing all the fields from it and populating the dropdown with them, so it would just be a matter of excluding the unwanted fields from the upload. Some of them I may have to exclude manually but best-case scenario, I don't have any involvement/config and i just take all the fields in their entirety, verbatim.
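For the "exclude manually" fallback, a rough sketch of what that filter could look like, assuming the field names arrive as a plain array (the function name and the field names in the usage note are examples only):

```typescript
// Keep every field from the tileset except the ones explicitly excluded,
// preserving the original column order.
function dropdownFields(allFields: string[], excluded: string[]): string[] {
  const skip = new Set(excluded);
  return allFields.filter((field) => !skip.has(field));
}
```

For example, `dropdownFields(['Italian', 'Kiowa', 'Yiddish'], ['Kiowa'])` keeps Italian and Yiddish in that order. The best-case scenario described above is the same call with an empty exclusion list.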

I have the issue on desktop too, when I zoom in and then click once in preparation for dragging myself and looking around

yeesh, ok. probably a simple fix but i'm going to hold off until the other stuff is in place so just sit tight until then.

rperlin-ela commented 3 years ago

Thanks, all very helpful. I can see it either way, but personally might lean towards a single dropdown that has the following features:

- somehow differentiates tract ones from PUMA ones (divider in the dropdown? possibly asterisk?)
- somehow but differently indicates problematic categories (possibly asterisk)
- foregrounds reliable ones and backgrounds unreliables (pending team discussion)

As for our original idea, "tract-level if it's available, otherwise PUMA”, is that just up to me and Maya in terms of coordinating with what she uploads? I think it’s basically fine if we rename a few of the tract ones as I mentioned (e.g. German > German/Yiddish) and then still keep puma ones like “German” and “Yiddish”, because each is showing something different. I don’t see this eliminating any of the tract ones currently there.


abettermap commented 3 years ago

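something like this?

https://user-images.githubusercontent.com/4974087/99832293-8d52ec80-2b1d-11eb-937a-953b4534d109.png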

assuming column order is retained, and i don't see why it wouldn't be, that seems doable.

As for our original idea, "tract-level if it's available, otherwise PUMA”, is that just up to me and Maya in terms of coordinating with what she uploads?

yep. that redundancy is fine across tract/puma, even if the names are the same in each tileset. accidental bonus of using two tilesets. 💫 the only thing you can't do in this setup is have both puma and tract in the language config sheet for a given language. at least for now you'll have to stick with one or the other, and continue with our "tract if available, otherwise puma" approach.

rperlin-ela commented 3 years ago

Exactly, and I take it that Maya and I can somehow determine the ordering within “by tract” and “by puma"? I’m thinking just to do it by total NYC speaker population, e.g. English, Spanish, Chinese etc. so whatever’s easiest in terms of doing it in Maya’s layer or on your end. The ones that are clear and by tract (e.g. Vietnamese) won’t need the PUMA layer.


abettermap commented 3 years ago

Exactly, and I take it that Maya and I can somehow determine the ordering within “by tract” and “by puma"?

yeah just change the column order in the sheets she gave us.

I’m thinking just to do it by total NYC speaker population, e.g. English, Spanish, Chinese etc.

you're breaking away from your alphabetized pattern? :) i'd prefer alphabetical as a user, but that's me.

so whatever’s easiest in terms of doing it in Maya’s layer or on your end.

...or your end. much much easier to just change the column order, so go for it.
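Either ordering is cheap to support in a single grouped dropdown. A minimal sketch, assuming each tileset's fields come through as an array in sheet column order (the type, function name, and group labels are illustrative):

```typescript
// One labeled section of the dropdown.
type DropdownGroup = { label: string; options: string[] };

// Group fields by source and keep column order unless alphabetical is requested.
function buildCensusDropdown(
  tractFields: string[],
  pumaFields: string[],
  alphabetical = false,
): DropdownGroup[] {
  // Copy before sorting so the caller's arrays are never mutated.
  const order = (fields: string[]): string[] =>
    alphabetical ? [...fields].sort((a, b) => a.localeCompare(b)) : [...fields];
  return [
    { label: 'By tract', options: order(tractFields) },
    { label: 'By PUMA', options: order(pumaFields) },
  ];
}
```

With the default, the sheet's column order is what the user sees, so reordering columns is the only step needed. Passing `true` gives the alphabetical version without touching the sheet.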