Language-Mapping / language-map

Front-end codebase for Language Mapping web map
https://languagemap.nyc
MIT License

Correctly show endonyms on all platforms #18

Closed abettermap closed 3 years ago

abettermap commented 4 years ago

Summary

Some endonyms are not rendering in certain environments. Boxes bad, characters good:

image

Tracking

Spreadsheet

Started a rough spreadsheet to keep track of which platforms struggled with which fonts. Note that rows with empty Endonyms are hidden, and the dataset is filtered down to unique Lang/Endo combos.

Results so far

Jason
Other team members

Please document as you find problems, either here or in the spreadsheet.

Long-term approach

Say we get 100% of the fonts to render on all browsers and all OS's for the 4 team members. Great start, but what about actual users? How do we make sure people see what we see? Reaching out to some beta/test users is a good start, and a long-term fallback could be to include a feedback form for reporting problems (for fonts or otherwise).

Resources

https://docs.mapbox.com/help/troubleshooting/manage-fontstacks/
https://en.m.wikipedia.org/wiki/Help:Multilingual_support

rperlin-ela commented 4 years ago

I sent an initial email on this whole topic to one contact who has worked with the development of Unicode, and this was her response: "For fonts, I would recommend Noto family, which includes most scripts in Unicode, and is cross-platform. (Github repository: https://github.com/googlefonts/noto-fonts, older website: https://www.google.com/get/noto/) I'm not a font expert, but another option is to use webfonts, in which the webpage will serve the font (and doesn't rely having the font on one's machine to display)." She also gave me the emails of one or two other people who I can reach out to.

abettermap commented 4 years ago

Good info. I wonder if the Noto Sans Google Font works too?

abettermap commented 4 years ago

I looked into this a bit and Noto isn’t a single font but rather a large family with hundreds of fonts. Obviously we only need a subset of that, and even then only when a particular endonym is shown, so my initial thought is to load the extra ones dynamically. I didn’t know this was possible but the browser support is actually great.
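
To make the dynamic-loading idea concrete, here is a minimal sketch. It assumes the browser's CSS Font Loading API (`FontFace` plus `document.fonts`) is the mechanism; the thread does not name one. The names `fontSource`, `loadEndonymFont`, and `requestedFonts` are hypothetical, and the font URL would come from wherever the font list ends up living.

```typescript
// Hypothetical helper names; nothing here is taken from the actual codebase.
type FontFaceLike = { load(): Promise<unknown> };
type FontFaceCtor = new (family: string, source: string) => FontFaceLike;
type BrowserGlobals = {
  FontFace?: FontFaceCtor;
  document?: { fonts?: { add(face: unknown): void } };
};

// Families already requested, so each font file is fetched at most once.
const requestedFonts = new Map<string, Promise<boolean>>();

// Build the source string that the FontFace constructor expects.
function fontSource(url: string): string {
  return `url("${url}")`;
}

// Resolves to true once the font is registered, or false outside a browser.
function loadEndonymFont(family: string, url: string): Promise<boolean> {
  const cached = requestedFonts.get(family);
  if (cached) return cached;

  const globals = globalThis as unknown as BrowserGlobals;
  const FontFaceImpl = globals.FontFace;
  const fontSet = globals.document ? globals.document.fonts : undefined;

  const pending: Promise<boolean> =
    FontFaceImpl && fontSet
      ? new FontFaceImpl(family, fontSource(url)).load().then((face) => {
          fontSet.add(face); // make the family usable from CSS
          return true;
        })
      : Promise.resolve(false);

  requestedFonts.set(family, pending);
  return pending;
}
```

The idea would be to call `loadEndonymFont` only when a record with a problematic endonym is actually displayed, then reference the family name in the element's `font-family`.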

Code aside, we still have some steps to complete first:

  1. Isolate the problem children.
  2. Determine which font (Noto variation, other font, or none available) should be used to display each missing endonym.
  3. Either indicate this in the data (e.g. a simple Endonym Font column), OR
  4. ...Maintain the info elsewhere. Believe it or not we could actually use WP for this if we get that CPT plugin up and running.

4 has a better smell than 3 to me since it would not require Maya to update the GIS. One drawback might be a greater chance of error that is inherent in maintaining two separate datasets (more chance for typos). If it’s only a couple dozen records, however, I think you could keep it tight, especially if you copy and paste values from the sheet to WP rather than typing them out.

Either way I think it’s better to store this info externally to the code for the same reasons we are keeping content separate from code:

Whatever the approach, I think ultimately I would just need to know these things:

  1. Language name in English
  2. Name of font to use
  3. Hopefully a URL of the font to load. I think these are typically available but if not then we could store the fonts somewhere

What’s crazy is that we need these fonts for ONE WORD. If there was a way to get only the characters needed, it would reduce the amount of loading big time. I don’t know a ton about this but might be worth looking into.
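
On the only-the-characters-needed idea, a rough sketch of what that could look like. It leans on the `text=` parameter of the Google Fonts CSS API, so it only helps if the font in question is served from Google Fonts (an assumption; self-hosted files would need to be subsetted offline instead). The function names are hypothetical.

```typescript
// Unique, non-whitespace characters of an endonym, in order of first appearance.
function uniqueChars(text: string): string {
  const chars = Array.from(text).filter((ch) => ch.trim() !== "");
  return Array.from(new Set(chars)).join("");
}

// The same characters as a CSS `unicode-range` value, e.g. "U+61, U+62".
function toUnicodeRange(text: string): string {
  return Array.from(uniqueChars(text))
    .map((ch) => "U+" + (ch.codePointAt(0) as number).toString(16).toUpperCase())
    .join(", ");
}

// Stylesheet URL asking Google Fonts for only the glyphs this endonym needs.
// Assumes `family` is a family name that Google Fonts actually hosts.
function subsetStylesheetUrl(family: string, endonym: string): string {
  return (
    "https://fonts.googleapis.com/css2?family=" +
    encodeURIComponent(family) +
    "&text=" +
    encodeURIComponent(uniqueChars(endonym))
  );
}
```

Note that `unicode-range` on its own only tells the browser when a font is needed; it does not shrink the file being downloaded.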

If none of this pans out for whatever reason or it’s deemed too much work in this timeframe, I think the easiest fallback is an image of the endonym (preferably SVG). This might get weird in small sizes like a results list/table, and would definitely not be accessible, searchable, or anything else text-related, but if it’s purely for display purposes in the popup/info view then it seems like the simplest approach.

That’s a lot and I’m not sure I explained clearly so holler if you need clarification, otherwise let me know what your thoughts are thus far.

rperlin-ela commented 4 years ago

This sounds like a good plan to me, with SVG's as plan B. I don't think I totally grasp where the info would live yet, or only vaguely in theory, but I can start gathering a spreadsheet or some such as follows, if I'm understanding right?

  1. Sylheti
  2. Noto Sans Syloti Nagri
  3. https://www.google.com/get/noto/#sans-sylo

abettermap commented 4 years ago

Believe it or not I was able to load some of the problematic fonts onto Dropbox and confirm that they do indeed render:

image

but I can start gathering a spreadsheet

Any objection to using the one I created? It was intended for tracking the problematic fonts but I'm fine putting the URLs in there as well:

image

Long-term solutions

Storing fonts

Dropbox was ridiculously straightforward for storing fonts, so let me know what you think about that approach.

Tracking the problematic fonts

I'm thinking WP might be super overkill for this, and storing this info in the full dataset would be overkill in a different way: with 1000+ records and only a handful with font issues, it would be hard to justify adding a new column, since that column is returned in the dataset regardless of whether it is populated. Sorry, bad explanation, but not super important.

However, I'll still need a way to know which fonts need to be loaded behind the scenes and we still want that info to be in your hands rather than in the code (I mean you're welcome to start making Git commits, but I don't think that's the right approach here 😃 ).

So, if not in WP and not in the main codebase, maybe we could get it into a GitHub Gist. This service is free and although it was designed for showing little snippets and tutorials, there's no reason it can't be used as free file storage (specifically a .json file, such as this realistic example I created).

If you are open to learning more about this I would be happy to create a short doc on the process. It should just be a few short steps plus a couple in Dropbox, and overall comparable or less than the effort the WP plugin would've required.

Thoughts?

abettermap commented 4 years ago

Fun fact: those non-rendering "rectangles" have a name:

image

rperlin-ela commented 4 years ago

Tofu! That's useful.

Spreadsheet looks great. Do you want us to fill in on this end, or are you already on it?

In terms of storing fonts on Dropbox and tracking the problematic fonts with GitHub Gist (based on the spreadsheet?), I'm agnostic and happy to follow your lead. Only thing I'd advocate, here as elsewhere, is to keep things as simple and self-contained as possible to avoid things breaking down the line. (We'd still be using Wordpress for the About Us page, right?)

abettermap commented 4 years ago

Replies

Do you want us to fill in on this end, or are you already on it?

Please do. I could see this task being either party's responsibility so let me know if you feel differently, but so far I've done a good bit of research, set up the sheet, and created a tiny working demo/test, so if you don't mind then yes, please take the lead on it. It's also good for you to do it since it's good practice for Post-August Ross.

Only thing I'd advocate, here as elsewhere, is to keep things as simple and self-contained as possible to avoid things breaking down the line.

Totally agree. Fewer moving parts = better. Gist was just one option but if we get the plugin set up in #11 and it does what we want, then that would be my vote. I could also see it being used for additional non-About, non-endonym concepts like the language points labeling potential I mentioned in #38.

We'd still be using Wordpress for the About Us page, right?

Yep. If it's one big page then it can be a WP Post or Page, but if it's multiple smaller chunks then the CPT plugin might make more sense.

WP API + CPT + MB Style thought process

Not to get overly technical and ahead of myself here, but this workflow logic is fresh in my brain and will be useful if we implement this later:

  1. Create several text-only Styles in Mapbox (one for each field we want to allow the user to choose from in a dropdown).
  2. In a Custom Post Type set up in WP (via that plugin), store the following info for each label Style:
    1. url or whatever provides a unique reference to that Style (e.g. mine is mapbox://styles/abettermap/ckc3p2nzk06io1inut7766xsa)
    2. friendly_name or pretty_name or something to denote something the user will see in the "Label by:" dropdown (e.g. "Neighborhoods").
  3. In my code, I would:
    1. Query your WP API for that particular Post Type
    2. Sort it alphabetically by pretty_name
    3. Populate the "Label by:" dropdown with that
    4. When user changes dropdown, set the labels to point to the corresponding url for that Style
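
The code side of step 3 could be sketched roughly like this. It assumes the WP API response has already been fetched and reduced to plain `url` / `pretty_name` pairs; the function names are hypothetical, and the second style URL in the usage note is a placeholder.

```typescript
// One label Style as stored in the Custom Post Type (field names from the list above).
interface LabelStyle {
  url: string;
  pretty_name: string;
}

// Step 3.2: sort alphabetically by pretty_name without mutating the input.
function toDropdownOptions(styles: LabelStyle[]): LabelStyle[] {
  return styles.slice().sort((a, b) => a.pretty_name.localeCompare(b.pretty_name));
}

// Step 3.4: look up the Style URL for whatever the user picked in "Label by:".
function styleUrlFor(styles: LabelStyle[], prettyName: string): string | undefined {
  for (const style of styles) {
    if (style.pretty_name === prettyName) return style.url;
  }
  return undefined;
}
```

For example, `toDropdownOptions([{ url: "mapbox://styles/example/b", pretty_name: "Neighborhoods" }, { url: "mapbox://styles/abettermap/ckc3p2nzk06io1inut7766xsa", pretty_name: "Endonym" }])` would list "Endonym" before "Neighborhoods".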

I could see this one needing clarification and I think much of it would be up to @fiddleHeads once I get it set up, but curious to hear everyone's opinions.

fiddleHeads commented 4 years ago

This sounds reasonable. But I am not very excited about the idea of having to update a subset of data on WP in addition to updating the vector tile after running the language data through my whole GIS process when there are updates to the language data. Or is that not what is being proposed here? Thanks.

abettermap commented 4 years ago

WP

Yeah your workflow already has a lot of steps, definitely don't want to add more! If I understand your question correctly then no, there is no subset. WP would just be for maintaining a simple list of url:pretty_name pairs:

| url | pretty_name |
| --- | --- |
| mapbox://styles/abettermap/ckc3p2nzk06io1inut7766xsa | Endonym |
| mapbox://styles/abettermap/42basdnzk06io1inut7766xsa | Glottocode |
| mapbox://styles/abettermap/42basaddj483lj1inut7766xsa | Language |
| mapbox://styles/abettermap/24355dfdj483lj1inut7766xsa | Primary Country |
| mapbox://styles/abettermap/42basddaj483lj1inut7766xsa | Top-Level Family |

Other options

JSON, stored elsewhere

...e.g. as a GitHub gist file. Discussed this w/Ross above in regards to endonym fonts, but it could be done for the MB Styles as well. This short little list is simple enough to do as JSON, just thought WP might be easier so you wouldn't need another platform.

JSON, stored in code

After reviewing the schema, however, it looks like there's only a handful of fields that will be used as labels. If that's the case and you just want me to hard-code the URLs and names in the code, that's fine too, you just won't have the simpler workflow that you would outside the code. Git + GitHub are more complex, and I'd prefer to keep the settings stuff as it relates to your end of the workflow stored outside the code.

Spreadsheet

Most convenient for you and Ross, but it would require some overkill overhead on the code side to parse that. Definitely simplest from code perspective to use JSON as that's the standard format on the web these days (RIP XML!).

~Uh oh,~ endonyms as labels?

Not sure if this got lost or just overlooked by me, but I see that the intent is to use endonyms as map labels? ~I'm not sure this is possible with MB since they only allow certain fonts~ not true, they allow uploads too:

image

I think this will take some work regardless of who is doing it, but good to know it's doable-ish. I was under the impression that the endonym would only be shown in the popups, not the map, so my mistake there. I guess in that case we would need an MB Style for each label field that the user can choose, but for endonyms there would be either:

  1. one style per "special" font, filtered down to just the point/s that need it
  2. one style total for Endonym, with a filter for each condition that needs a special font
  3. some kind of grouping, not sure if that's how it works though:

image

Just brainstorming again but it seems like it will come up later so thought I'd get it out there.

fiddleHeads commented 4 years ago

Hi Jason. Thanks. Regarding maintaining a list of "URL pretty pairs" on WP, that sounds totally fine. I think I misunderstood how we'd be using WP.

As for the endonyms as map labels, I think the idea is to emulate how the static print map displayed languages, so that the point symbology is actually replaced by the endonym itself. Here's a screenshot of how I was playing around with that possibility in Esri's WebApp Builder, although it looks like I actually left the points in, too, in this case.

image

It also gets at the possibility of being able to display the language data in different ways, perhaps displayed by its endonym and then displayed as a separate layer as points. This is what I was imagining in WebApp Builder, anyway, but perhaps displaying points as endonyms is sufficient for your purposes, @rperlin-ela? I seem to recall this convo elsewhere, perhaps in the data schema, but it's worth clarifying here, I guess.

abettermap commented 4 years ago

Points + text seems best to me. I was able to get the endonyms to show pretty easily in the MB GUI:

image

The more I play around with the MB styles, the more powerful it seems (tbf I'm used to old-school OpenLayers and Leaflet so MB is pretty next-level 🚀 ). We can go down that rabbit hole in a separate thread once I learn a bit more, but based on what I've seen so far I'm starting to think that all of the styles can be done in MB. The dynamic filtering is where it will get code-heavy, but getting the styles into a long-term, maintainable platform rather than code is handy on so many levels!

I think that you'll find that there will always be room for improvement on labels, especially long and complex ones, so putting those options in your hands makes a lot of sense for the long haul.

fiddleHeads commented 4 years ago

Sounds great regarding the potential of MB styles...

rperlin-ela commented 4 years ago

Update on the endonym front... we looked into it and apparently found no tofu on Chrome, Firefox, even Internet Explorer on Mac or PC besides those already noted in the spreadsheet.

abettermap commented 4 years ago

Ok that’s not too bad then.

So can we consider that sheet to be the complete source of truth for problematic fonts then? Obviously can add more later since lots of devices and platforms out there, but let me know if the sheet as it stands provides a sufficient baseline of what we’re up against.

abettermap commented 4 years ago

Also I see some activity in the fonts sheet including font URLs. Those are pointing to the google source but it still needs to be downloaded, unzipped, and Dropboxed (or wherever it’s being stored for the long run). You guys are handling that process, correct? I can document the Dropbox steps if needed but it’s basically just the usual upload, make sure it’s set to public, then grab the share URL, then tweak the URL to match the format I have in my two sheet entries (this makes it downloadable).

I don’t think Google Drive has a “raw” equivalent so that’s why I suggested Dropbox, but any cloud storage platform that serves raw files should work.

rperlin-ela commented 4 years ago

Got it — they’re in zip files now, but should it actually just point to the .ttf? Otherwise looks good?

abettermap commented 4 years ago

but should it actually just point to the .ttf

Yep! And specifically just the "Regular" version of it (aka not Bold, Italic, etc.).

Otherwise looks good?

Close, just a few things:

  1. Make sure your URLs are in this format: https://dl.dropboxusercontent.com/s/kabn5a3iaoupcgb/NotoSansSylotiNagri-Regular.ttf?dl=0, specifically the https://dl.dropboxusercontent.com/s part. The one you get when you copy from their "Share" thing points to the same file, but I think it opens in their viewer or something.
  2. Make sure the files are set to public.
  3. Test them in an Incognito Window ("Private browsing tab" on Firefox) so that you can confirm that it's not relying on your login. You should be prompted to download the .ttf file.
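
The URL tweak in step 1 is mechanical enough to script. A small sketch, with hypothetical function names, that rewrites a Dropbox "Share" link into the `dl.dropboxusercontent.com` form and checks that it points at a "Regular" `.ttf`:

```typescript
// Rewrite a Dropbox share link so it points at the raw file host.
// Links that are already in the dl.dropboxusercontent.com form pass through unchanged.
function toDirectDropboxUrl(shareUrl: string): string {
  return shareUrl
    .trim()
    .replace(/^https?:\/\/(www\.)?dropbox\.com\//, "https://dl.dropboxusercontent.com/");
}

// True when the link targets a "-Regular.ttf" file (not Bold, Italic, or a .zip).
function isRegularTtf(url: string): boolean {
  return /-Regular\.ttf(\?|$)/.test(url);
}
```

This does not replace steps 2 and 3: whether the file is public still has to be checked by hand in a private window.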

rperlin-ela commented 4 years ago

Thanks for this. How’s it looking now?

abettermap commented 4 years ago

The URLs still had www so I did a Find & Replace. No biggie, only took a minute, and I verified the first one is working:

https://www.dropbox.com/s/d1yh2f8seympe6e/NotoSansVai-Regular.ttf?dl=0

abettermap commented 4 years ago

Next step

...is to get them into JSON. Now that we know there are only 16ish, I really think my WP idea is overkill! JSON is kinda the lingua franca (ha!) of data formats so if we can get that somewhere then we might be able to skip the WP plugin altogether ("About" page content remains as-is as a page though).

Mind giving the GH Gist approach a try? If it seems too cumbersome we can try something else:

GH Gist steps

One-time step: start with Jason's JSON

...just to get you started. Going forward, you'd use your own.

  1. Go to my gist
  2. Edit > Select All
  3. Copy

One-time step: create your own gist

  1. Go to https://gist.github.com/ (log into GH if needed)
  2. (optional) enter a Gist description... at the top
  3. For Filename including extension..., enter bad-lang-fonts.json or whatever you want to call it
  4. Click in the text area below it (the big block of emptiness)
  5. Edit > Paste the text you copied from my gist
  6. Click Create public gist
  7. Give me the URL of the resulting page

Updating it

  1. Visit the URL from Step 7 above
  2. Click the Edit button with the pencil
  3. Make your edit/s. I suggest using a text editor like VS Code for this, but if you're careful and/or follow the next step, should be fine.
  4. Highly recommend validating it before next step. Select All the JSON, Copy, then Paste into https://jsonlint.com/
  5. Click Update public gist
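
For reference, this is roughly the check the app itself could run on that file, as a sketch only. The key names `language`, `font`, and `url` are assumptions (the actual keys in the gist are not shown in this thread), as is the function name.

```typescript
// Assumed shape of one entry in bad-lang-fonts.json (key names are hypothetical).
interface FontEntry {
  language: string; // language name in English
  font: string; // name of the font to use
  url: string; // where to load the font from
}

// Parse the JSON text and make sure every entry has the three string fields.
function parseFontConfig(json: string): FontEntry[] {
  const data: unknown = JSON.parse(json); // throws SyntaxError on invalid JSON
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of font entries");
  }
  return data.map((item: unknown, index: number) => {
    const { language, font, url } = (item || {}) as Record<string, unknown>;
    if (typeof language !== "string" || typeof font !== "string" || typeof url !== "string") {
      throw new Error("Entry " + index + " needs string language, font and url fields");
    }
    return { language, font, url };
  });
}
```

A stray comma or a missing quote fails at `JSON.parse`, which is the same class of mistake jsonlint catches in step 4.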

abettermap commented 4 years ago

Alternatively to editing JSON (been there, done that, not fun!), you could:

  1. maintain a CSV file, making updates as needed
  2. Select All, Copy
  3. convert to JSON: https://csvjson.com/csv2json
  4. update Gist with the result

Doesn't matter to me how you do it, I'd just like JSON as the end result. CSV-then-JSON kinda seems like the most maintainable though. It would let you sort and auto-complete a lot more easily, and it's a familiar spreadsheet format.
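
If the online converter ever goes away, the CSV-to-JSON step is small enough to do in a few lines. A naive sketch, assuming a header row and no commas or quotes inside cells (a real CSV parser would be needed otherwise); the column names in the usage note are only examples.

```typescript
// Convert simple CSV text (header row first) into an array of records.
// Assumption: no quoted cells and no commas inside values.
function csvToRecords(csv: string): Record<string, string>[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headerLine = lines.shift();
  if (headerLine === undefined) return [];

  const headers = headerLine.split(",").map((header) => header.trim());
  return lines.map((line) => {
    const cells = line.split(",");
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      record[header] = (cells[i] || "").trim(); // missing cells become ""
    });
    return record;
  });
}
```

`JSON.stringify(csvToRecords(text), null, 2)` then gives the text to paste into the Gist.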

rperlin-ela commented 4 years ago

This should be the URL: https://gist.github.com/rperlin-ela/b88405b8bbef3e653b2023f26c726639.

Let me know if not. Thanks for making this dead easy.

abettermap commented 4 years ago

Perfect, glad it was straightforward. I probably won't document the steps in another doc elsewhere, I think this issue thread has enough detail to serve that purpose, so please bookmark or chuck it in your own notes so you'll have a reference to the steps in case you need them.

I think we're good on your end but let me know if you have questions. Will keep this issue open though as I'll have plenty of steps left to complete it.

abettermap commented 4 years ago

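@rperlin-ela I have a slightly different approach to this that I'd like to propose: store the JSON in GH repo instead of Gist. I think it's the same number of steps for you, but there are significant benefits:

  1. We can both edit it (as can others on team if I add them)
  2. The revisions (commits) are easy to see in case something breaks and we need to go back and undo or troubleshoot
  3. I can think of several other "config" things that would make for nice JSON, keep things decoupled and in post-Jason team hands without getting tangled in the main repo.
  4. I can clone it and edit in a text editor on my computer (Gists are hard to do that with)
  5. Easier collaboration on specific lines of code.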

Steps to edit bad-lang-fonts.json

If you're still with me and want to give it a try:

  1. Go here: https://github.com/Language-Mapping/config/edit/master/bad-lang-fonts.json
  2. Make some edits. There's nothing "real" to do this time but if you want to add some dummy lines go ahead and I'll revert it just to demonstrate how undoable Git can be.
  3. Scroll down to Commit changes
  4. Replace "Update bad-lang-fonts.json" placeholder with a super-concise message (less than 80 chars) about what you did, with an optional extended description below. The two fields compose your commit message which will show up in the Git log. Not likely you'll ever need to visit that, but write a good message that Future Ross will understand and appreciate.
  5. Leave the other defaults as-is.
  6. Click the Commit changes button.

That's pretty painless, yeah? Similar enough process to the Gist approach?

Let me know what you think.

rperlin-ela commented 4 years ago

Super easy — just did a Commit Changes to try it, but by all means revert — all good!

abettermap commented 4 years ago

Ha, Lampelish! A rare northern Michigan dialect only spoken after 3-6 beers...

I removed it, thanks for testing it out. I'll plan on using it as the source of truth once I get to the font-loading stage.

rperlin-ela commented 4 years ago

Hope you polish your Lampelish w/the fam!

Just wanted to note here that there are a few tricky Endonyms for which there is no Unicode font yet, as far as I know. Both Avestan and ASL call for IMAGE in the spreadsheet, and I'm attaching the images we can hopefully use here. Note also that "Khalkha Mongolian" and "Southern Mongolian" use a vertical font/script. In the spreadsheet I've now made sure the endonym for those two is correctly oriented with "text rotation", but I have no idea how this will hold up by the time it reaches you. I guess let's just do our best...

ASL Avestan

rperlin-ela commented 4 years ago

One more related issue: right-to-left scripts — these are coming through correctly in the Details panel but not in the on-map labels. Something we can fix on this end? The spreadsheet renders everything correctly, but perhaps we should be using "Horizontal Align (Right)" on the cell, or does that not make a difference?

There are a few other discrepancies where the panel is doing things right and the map label not so much, e.g. Loke.

abettermap commented 4 years ago

Images

For those, let's just put the URL directly in there instead of IMAGE. Store them on Dropbox, make them public, and use the URL format we are using for fonts. Then I will include a check in the code to look for anything beginning with http or https, and use the image instead of the text.

Semi-fragile but I think it's doable. Sound good?
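
A minimal sketch of that check, with hypothetical names. It is slightly stricter than "begins with http" in that it requires the full `http://` or `https://` prefix, so an endonym that merely starts with those letters would not be mistaken for an image.

```typescript
// What the UI should render for a given Endonym value.
type EndonymDisplay =
  | { kind: "image"; src: string }
  | { kind: "text"; text: string };

// Treat values that look like a URL as an image source, everything else as text.
function resolveEndonym(endonym: string): EndonymDisplay {
  const value = endonym.trim();
  return /^https?:\/\//i.test(value)
    ? { kind: "image", src: value }
    : { kind: "text", text: value };
}
```

The popup/info view would then render an image element for `kind: "image"` and plain text otherwise.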

Vertical

(fun fact: Mongolian is one of my fave written scripts. Beautiful!)

Formatting of any kind in a spreadsheet will not survive the data gauntlet it must run through from Sheets to CSV to JSON, so your text orientation and alignment will never see the light of production.

To avoid another set of config, how do you feel about using images for these as well? If it's only a handful then it should be minimal work on your end, and overall less work for both of us since it will come directly from the data rather than an external config. Can you score a nice PNG of Mongolian? Or better yet, make an SVG? Could try this tool: https://convertio.co/txt-svg/

If possible a better Avestan would be nice as well (SVG always preferable, otherwise PNG), and if you can find an ASL SVG, additional bonus points.

RTL

I found this plugin example but I'm not sure it applies to our situation: https://docs.mapbox.com/mapbox-gl-js/example/mapbox-gl-rtl-text/

Questions

  1. Is RTL simply a matter of right-alignment, or is it by word (as in first word of line is all the way right), or ALL the characters (as in last character is first)?
  2. How many are there?

Asking about Item 2 because I think this may need to be done manually in MB Studio (same with the custom fonts), just wondering what we're up against. Like would the logic "Language name in English contains 'Hebrew' or 'Arabic'" cover it or are there a lot more?

abettermap commented 4 years ago

Ah that's right, you said Loke and others, so are those having RTL problems or something else?

Let me know about that and the other things in my prev comment.

rperlin-ela commented 4 years ago

Got it in terms of the images, will see what I can do!

For RTL, there should be a good number of these in total (dozens), but they might all be variants of Hebrew, Arabic, Yiddish, and a few others, I can check. It's not right alignment in the sense I understand alignment (Left, Center, or Right), which I don't think would be evident in the labels anyway. It's the actual order of the characters, last character first, for example: instead of the correct ליטװיש יידיש, we're getting שידיי שיװטיל.

With Loke and some others (which are LTR), it's actually not a RTL-type issue but a compositional issue that applies to a range of (I believe mostly) complex LTR Indic scripts where characters are combined to produce new complex characters as detailed to some extent here: https://en.wikipedia.org/wiki/Help:Multilingual_support_(Indic).

abettermap commented 4 years ago

Got it in terms of the images, will see what I can do!

Sounds good. I don't know how we'll handle those ones in MB though. I think it might have to be a conditional thing similar to what I'll use in the code, where it checks for the http prefix. But obviously it can't use an image as a label, so it will have to default to English name.

For vertical, if it's only Mongolian variants then I can probably hardcode a conditional check to look for that prior to the http check. That way we can ~still use an image in the UI~ scratch that. If it's only a handful of instances, it's probably ok if I hardcode it. CSS can handle that just fine:

image

...so if that's the case then don't worry about finding images for vertical scripts. Just let me know the English names of the records with vertical scripts and I'll deal with it on my end.
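
Something like this is what I have in mind for the hardcoded check, as a sketch. It assumes the CSS involved is the `writing-mode` property with the value `vertical-lr` (vertical text, columns running left to right), and the two language names are the ones Ross mentioned above; the constant and function names are hypothetical.

```typescript
// English names of records whose endonym uses a vertical script (hardcoded on purpose).
const VERTICAL_SCRIPT_LANGUAGES = ["Khalkha Mongolian", "Southern Mongolian"];

// Inline style for the endonym element: vertical for the languages above, default otherwise.
function endonymStyle(language: string): { writingMode?: "vertical-lr" } {
  return VERTICAL_SCRIPT_LANGUAGES.indexOf(language) !== -1
    ? { writingMode: "vertical-lr" }
    : {};
}
```

This check would run before the http check, so those records can keep real text in the Endonym column.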

It's the actual order of the characters, last character first, for example: instead of the correct ליטװיש יידיש, we're getting שידיי שיװטיל.

I think I fixed it by simply uploading Noto Sans Regular (the font we're using in the app) to Mapbox and using that for the endonyms labels!

image

Look correct to you?

complex LTR Indic scripts where characters are combined to produce new complex characters

So cool! No idea how to handle this though. Questions:

  1. When you say "and some others", how many are we talking?
  2. Have you seen this happen in other non-MB scenarios?
  3. ...and, if so, are you aware of any fonts (Noto or otherwise) that would render it properly? Noto Sans Regular does not seem to fix it in MB:

image

Question on same-as-English endo's

Would it mess up your data/workflow if you populated the Endonym? I realize that's a bit redundant so no worries if you don't want to, but technically it does still seem like an endonym and populating that column would promote filter consistency and also ensure that all points are automatically labeled in MB when Endonym is the same as Language. The check is trivial in JS (to show it in the UI), but in MB it would require more logic to say "if Endonym blank, use Language instead", so let me know what you think. I would need to change my logic to populate the Details (would use "if Language equals Endonym, hide endonym" rather than "if endonym is not populated, hide it").

I am assuming here though that all records have an endonym. If that is incorrect then please let me know.
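
For comparison, here is a sketch of both sides of that trade-off. The first part is the extra logic MB would need if Endonym stays blank, written as a Mapbox GL style expression; the second is the UI check I would switch to if Endonym is always populated. The property names `Endonym` and `Language` are assumed to match the dataset columns, and the identifiers are hypothetical.

```typescript
// "If Endonym blank, use Language instead" as a Mapbox GL style expression
// (usable as a symbol layer's text-field value).
const endonymLabelExpression: unknown[] = [
  "case",
  ["==", ["coalesce", ["get", "Endonym"], ""], ""], // missing or empty Endonym
  ["get", "Language"],
  ["get", "Endonym"],
];

// Details panel: hide the endonym when it is blank or identical to Language.
function shouldShowEndonym(language: string, endonym: string): boolean {
  const value = endonym.trim();
  return value !== "" && value !== language.trim();
}
```

If the column is always populated, the label can simply be `["get", "Endonym"]` and only the second function is needed.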

What about http in Results panel?

I guess no one would be searching for an ASL endonym so that's fine, but is it safe to assume no one will be searching by Endonym for the other image-only endo's like Avestan? They would still find it via Language of course, but just wondering how our image-only approach will impact filtering and results.

abettermap commented 4 years ago

Also found this super-related-but-unresolved MB issue. The fact that it's unresolved makes me wonder if we'll need to resort to English for some of those. 😞

We've done great so far finding solutions to MB font issues, but if we exhaust our resources would it be acceptable to render those remaining tricky ones in English? This will get hairy with config because MB will need to know about it while the UI is working fine, but it's better than showing an incorrectly rendered endonym.

rperlin-ela commented 4 years ago

Sounds good. I don't know how we'll handle those ones in MB though. I think it might have to be a conditional thing similar to what I'll use in the code, where it checks for the http prefix. But obviously it can't use an image as a label, so it will have to default to English name.

The links to the images are already in the spreadsheet as of yesterday, including the two vertical Mongolian ones, let me know if you can't hit them or you need anything else. If those four (Avestan, ASL, the two Mongolian) can't appear as labels on the map and it has to default to English names, it's not the end of the world.

I think I fixed it by simply uploading Noto Sans Regular (the font we're using in the app) to Mapbox and using that for the endonyms labels!

Excellent, that did the trick.

So cool! No idea how to handle this though. Questions:

  1. When you say "and some others", how many are we talking?
  2. Have you seen this happen in other non-MB scenarios?
  3. ...and, if so, are you aware of any fonts (Noto or otherwise) that would render it properly? Noto Sans Regular does not seem to fix it in MB:

I can do a survey and let you know. At the moment not seeing the endonym labels and I wonder if any kind of even super-beta filtering function will be ready to push soon, so that I can easily check all the Tibetan ones, for instance.

Would it mess up your data/workflow if you populated the Endonym?

No problem at all to fill in the blanks in the Endonym column for cases where it's the same as the English name. Assuming I understand this correctly.

What about http in Results panel?

Totally safe to say that no one or just about no one will be searching for the image-only ones. Good to flag, but not something to worry about

fiddleHeads commented 4 years ago

Just to chime in here to say that I've been populating blank/NULL values in the Endonym column with values from the Language column in GIS as part of the process. Let me know if this will be done in the spreadsheet beforehand so I don't build it into my model/script, or I'm happy to do that in the GIS.

I downloaded the spreadsheet yesterday to start processing it.

Thanks,

Maya



rperlin-ela commented 4 years ago

Ok, we'll do this in the spreadsheet if that will simplify things!


fiddleHeads commented 4 years ago

Sounds good. Thanks!



abettermap commented 4 years ago

copied from my email just now:

One thing to reiterate about the Dropbox links is that you need to use the correct URL format like you did with the fonts: https://dl.dropboxusercontent.com/s/foh9sz3ire7n9v8/NotoSansSyriacEstrangela-Regular.ttf?dl=0

...as opposed to the dropbox.com. The difference is that the latter will open the file in the Dropbox UI preview thing, while the former will download the file.
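If it helps to normalize links automatically, a small sketch of the conversion follows. It assumes that swapping the host is all that is needed, which matches the working font URL above; `toDirectDropboxUrl` is a hypothetical helper and the path in the usage example is made up.

```typescript
// Rewrites a Dropbox share link so it downloads the file directly
// instead of opening the Dropbox preview page. Other URLs pass through.
function toDirectDropboxUrl(link: string): string {
  const url = new URL(link);
  if (url.hostname === 'www.dropbox.com' || url.hostname === 'dropbox.com') {
    url.hostname = 'dl.dropboxusercontent.com';
  }
  return url.toString();
}

// Usage with a placeholder path:
// toDirectDropboxUrl('https://www.dropbox.com/s/abc123/Example-Regular.ttf?dl=0')
```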

I wonder if any kind of even super-beta filtering function will be ready to push soon

I pushed just now and it should build to here but I am struggling with a font issue that I haven't been able to resolve yet. Fonts load fine in the MB Style Draft preview but nowhere else (on a positive note I got the Regions color-coded like the print map, which you'll see in that preview).

Also I don't have the icons wired up in the code yet, so if you switch to Type in the deploy then it'll crap out.

For the Data/Results table:

rperlin-ela commented 4 years ago

One thing to reiterate about the Dropbox links is that you need to use the correct URL format like you did with the fonts: https://dl.dropboxusercontent.com/s/foh9sz3ire7n9v8/NotoSansSyriacEstrangela-Regular.ttf?dl=0

...as opposed to the dropbox.com. The difference is that the latter will open the file in the Dropbox UI preview thing, while the former will download the file.

Got it, done.

I wonder if any kind of even super-beta filtering function will be ready to push soon

I pushed just now and it should build to here but I am struggling with a font issue that I haven't been able to resolve yet. Fonts load fine in the MB Style Draft preview but nowhere else (on a positive note I got the Regions color-coded like the print map, which you'll see in that preview).

Great. Without being able to zoom and see all the labels (assuming I'm doing this right) it's still hard to do a thorough review of the labels, but from what I'm seeing all the Tibetan-script ones are messed up in a similar way. But the good news is that most others, from initial spot checking, seem to be fine. Not sure if you put in Noto yet, but RTL's like ליטװיש יידיש are still reading in the wrong direction.

abettermap commented 4 years ago

I put in Noto but then had to remove it. Might be an MB bug but it was causing errors. Will need to be resolved of course, not sure what's going on yet. Tried re-uploading it but it still breaks.

abettermap commented 4 years ago

Just hold off on the Tibetan review until I get it fixed. Will let you know.

abettermap commented 4 years ago

UPDATES:

  1. Going with SVG images instead of fonts for the remaining UI Tofu. The MB fonts are entirely separate from UI fonts, so only one-bird-one-stone there.
  2. The ones in MB that were not rendering before are rendering now thanks to a combination of Ross's MB font uploads and Jason's MB styles JSON config.
  3. One remaining issue in the MB/map rendering is RTL languages like Arabic/Hebrew are still LTR (Jason should be able to fix, not a font issue).
  4. ...and the other issue is Tibetan and other South Asian endo's not displaying properly, although they all may be resolved by this one.
  5. If there are tons and tons that need special attention and MB config, Jason proposes a "Font" field in the dataset (this would enable Ross to control/maintain all Endo fonts w/o code updates).
    1. The code would check for that value first
    2. and if found, label in MB using Endonym with the specified font.
    3. If not found, check for http prefix in Endonym.
    4. If found, style in MB using Language but use the image in the UI (e.g. ASL). The "Font" field would allow us to continue using images in the UI, special fonts (if there is one) in MB, and Language as a fallback for all.
  6. Even though MB style from Studio is no longer being hit directly for the symbol styling, Ross's font uploads were still necessary as Jason has not figured out how to load MB fonts locally (as in non-Studio).
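The lookup order proposed in item 5 can be sketched as below. This is only an illustration under assumptions: the `EndoRecord` and `LabelPlan` types and the `planLabel` name are hypothetical, and `Font` stands for the proposed new column.

```typescript
type EndoRecord = { Language: string; Endonym: string; Font?: string };

type LabelPlan = {
  mapText: string;     // text used for the MB label
  mapFont?: string;    // special font to request in MB, if any
  uiImageUrl?: string; // image shown in the UI instead of text
};

function planLabel(rec: EndoRecord): LabelPlan {
  // 1-2. A Font value wins: label in MB with the Endonym in that font.
  if (rec.Font) {
    return { mapText: rec.Endonym, mapFont: rec.Font };
  }
  // 3-4. No Font, but the Endonym is an image link: MB falls back to
  // Language while the UI shows the image (e.g. ASL).
  if (rec.Endonym.startsWith('http')) {
    return { mapText: rec.Language, uiImageUrl: rec.Endonym };
  }
  // Otherwise use the Endonym as-is, with Language as the last resort.
  return { mapText: rec.Endonym || rec.Language };
}
```

Keeping the decision in one function means the dataset alone drives which records get a special font, an image, or the English fallback.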

Will close this when...

Don't need to wait for SVG for the remaining Tofu's as the infrastructure in the code is already in place, it's just data-dependent now.

abettermap commented 4 years ago

@rperlin-ela some updates (not pushed yet at time of writing):

RTL in MB

Was an easy fix, should be working for Arabic and Hebrew automatically now:

image
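For anyone hitting the same problem: the exact fix used here is not spelled out, but one common way to get correct RTL shaping in Mapbox GL JS is to load the RTL text plugin via `setRTLTextPlugin`. The sketch below assumes that approach. The `RtlCapable` type and `enableRtlText` wrapper are hypothetical; the type only describes the one `mapboxgl` function being called, so the example stays self-contained.

```typescript
// Structural stand-in for the part of `mapboxgl` used below.
type RtlCapable = {
  setRTLTextPlugin: (url: string, callback: (error?: Error) => void) => void;
};

// Plugin URL as published in the Mapbox GL JS docs; the version may differ.
const RTL_PLUGIN_URL =
  'https://api.mapbox.com/mapbox-gl-js/plugins/mapbox-gl-rtl-text/v0.2.3/mapbox-gl-rtl-text.js';

// Call once, before creating the map, so Arabic and Hebrew labels are
// shaped and ordered right-to-left.
function enableRtlText(mapbox: RtlCapable): void {
  mapbox.setRTLTextPlugin(RTL_PLUGIN_URL, (error) => {
    if (error) console.error('RTL text plugin failed to load', error);
  });
}
```

In an app this would be called as `enableRtlText(mapboxgl)` at startup.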

Tibetan (all forms)

I uploaded this Noto to MB and they are showing properly now as MB labels:

image

I also used the Tibetan Noto for:

image

Trouble

These guys I couldn't find a Noto font for:

Ongoing

This is not very maintainable or efficient for either of us; I think it really needs to come from the data as a new column. Let's discuss an approach/workflow other than the current one (although you won't be back in time, unfortunately), which is upload + edit this:

image

abettermap commented 4 years ago

@rperlin-ela

If you're seeing issues with these, I mistyped a few in #69 but they're correct now on my computer, will push in another branch:

image

Other than that, can we close this monster issue?

rperlin-ela commented 4 years ago

Yes!


rperlin-ela commented 4 years ago

Sorry — I noticed these few endonyms that were missing, my bad, hopefully the last:

Bumthang, Kurtöp, and Tshangla should also use Noto Tibetan.

There are also a small number where the labels for the endonym look good on the map, but not in the UI, e.g. Molisan. Is the endonym using Noto there?

On Sep 6, 2020, at 11:16 PM, Jason Lampel notifications@github.com wrote:

Closed #18 https://github.com/Language-Mapping/language-map/issues/18 via #73 https://github.com/Language-Mapping/language-map/pull/73.


abettermap commented 4 years ago

Bumthang, Kurtöp, and Tshangla should also use Noto Tibetan.

No worries, I added those and also ventured to guess that Sharchop was in that party as well. Here's that whole group rendering fine now (locally on my laptop, no push yet):

image

There are also a small number where the labels for the endonym look good on the map, but not in the UI, e.g. Molisan. Is the endonym using Noto there?

I haven't done anything special for that one, no. It's not showing tofu for me, i see this:

image

Is that not correct? If it isn't, is this (and the others) documented anywhere along with their respective Noto font? I'm not seeing Molisan in our "endos not rendering" sheet nor the "bad-fonts.json" github config (don't worry about updating the GH config anymore, we should actually just use the sheet).

abettermap commented 4 years ago

it occurred to me that you may be asking about the heading font in general. if so then no, the headings are all using Gentium Basic. if i switch to Noto Sans it does look more legit though:

image

I'd rather not change all the headings' font just for this handful because i think the serif looks great, so looks like we're not going to squeak by so easy on the fonts after all! definitely need a list of the ones that don't look correct though.

rperlin-ela commented 3 years ago

Understood about this — I’ve got a handle on it, and there are 14 we’d like to put in Noto Sans, which I’m guessing will do the trick. These are all Latin-based scripts but they use special IPA characters or diacritics that Gentium is not handling well. Sorry not to have noticed this earlier, but hopefully it’s not too bad after what we’ve been through with the non-Latin scripts.

In the Endonyms spreadsheet, this new batch all have “Noto Sans” under Font. Is that cool? Let me know if there’s anything else I can do. If it’s easier than going to the spreadsheet, here’s what should be the complete list:

Adjoukrou, Beneventano, Cilentano, Dinka, Ewe, Frafra, Guarani, Igbo, Kpelle, Molisan, Neo-Mandaic, Marchigiano, Temne, Vietnamese

Now that the labels are looking great, fortunately or unfortunately I was also able to check them more comprehensively on the map and there were some that are not showing up. Anyway, sorry I didn’t catch these earlier — they’re all documented in the spreadsheet now but here’s the list if it’s handy.

-Ethiopian labels not appearing (Amharic, Ge’ez, Gurage, Tigre, Tigrinya)

Solution: uploaded Noto Sans Ethiopic to Mapbox, which each of these should point to (see spreadsheet)

-Telugu label a little messed up

Solution: uploaded Noto Sans Telugu, which Telugu should point to

-Balti label not showing up

Solution: uploaded Noto Sans Arabic, which Balti should point to

-Neo-Aramaic (Chaldean) and Neo-Aramaic (Assyrian) labels not appearing

Solution: point to Noto Sans Syriac Estrangela

-Karen, Kachin, Pa’O labels not appearing

Solution: point to Noto Sans Myanmar

Is this ok?
