Project-OSRM / osrm-text-instructions

Text instructions from OSRM route responses
BSD 2-Clause "Simplified" License

Abbreviations should have explicit case-sensitive or insensitive search option #232

Open yuryleb opened 6 years ago

yuryleb commented 6 years ago

The documentation for the new abbreviations feature has the following note:

The keys are in lowercase, to aid in case-insensitive matching

But Russian street names should be processed case-sensitively, because Russian status names are always written in lower case to distinguish them from given names. Also, some status names can themselves be given names, as in Набережная улица, where the first word could be the status name "Embankment" or, as here, simply part of the name (Seafront Street).

Please also look at the collection of other "multi-status" Russian street names prepared by the streetmangler project.

Maybe it's better to explicitly add a meta block to the abbreviation JSONs, similar to the other JSON files, with a 'case sensitive': true/false option?
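A minimal sketch of how a client might honor such an option. The `meta.caseSensitive` key, the `abbreviations` wrapper, and the `lookupAbbreviation` helper are all hypothetical names chosen for illustration, not part of the existing JSON files:

```javascript
// Hypothetical file contents: `meta.caseSensitive` is an assumed key, not an existing one.
const russianAbbreviations = {
  meta: { caseSensitive: true },
  abbreviations: { 'улица': 'ул.', 'набережная': 'наб.' }
};

// Hypothetical helper: normalizes the word only when the file allows case-insensitive matching.
function lookupAbbreviation(data, word) {
  const caseSensitive = Boolean(data.meta && data.meta.caseSensitive);
  const key = caseSensitive ? word : word.toLowerCase();
  // Return the word unchanged when no abbreviation is listed for it.
  return Object.prototype.hasOwnProperty.call(data.abbreviations, key)
    ? data.abbreviations[key]
    : word;
}
```

With the flag set to true, a capitalized Набережная does not match the lower-case key and is left alone, while набережная is abbreviated.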

1ec5 commented 6 years ago

Now that we’re using JSON instead of the CSV format from mapbox-navigation-ios, we can introduce more data that allows clients to match words more contextually. Besides case sensitivity, some languages like English would benefit from a property that says whether the word is a prefix or suffix. That way the client doesn’t have to make assumptions about the role of classifications versus directions, for example.
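As a rough sketch of the prefix/suffix idea, assuming each entry carried a `position` property (a hypothetical name, as are the entry shape and the `abbreviateByPosition` helper), a client could abbreviate a word only where it plays the stated role:

```javascript
// Hypothetical entries: the `position` property is an assumption for illustration.
const englishAbbreviations = {
  north: { abbreviation: 'N', position: 'prefix' },
  avenue: { abbreviation: 'Ave', position: 'suffix' }
};

// Abbreviate a word only when it sits where its entry says it belongs.
function abbreviateByPosition(name, table) {
  const words = name.split(/\s+/);
  return words.map(function (word, index) {
    const key = word.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(table, key)) {
      return word;
    }
    const entry = table[key];
    const isFirst = index === 0;
    const isLast = index === words.length - 1;
    const applies =
      (entry.position === 'prefix' && isFirst && !isLast) ||
      (entry.position === 'suffix' && isLast && !isFirst);
    return applies ? entry.abbreviation : word;
  }).join(' ');
}
```

Here "Fifth Avenue" would become "Fifth Ave", while "Avenue Road" would stay as is because "Avenue" is not in the suffix position.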

Does the abbreviation customarily retain the same case as the spelled-out word? If so, we could document that the client is expected to preserve the case as it abbreviates a word. The logic would look something like this:

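token = token.replace(wordRegExp, function (word) {
  // Keys are lower-case, so normalize the word before looking it up.
  let abbreviatedWord = abbreviations[word.toLowerCase()];
  if (abbreviatedWord === undefined) {
    return word; // no abbreviation known; leave the word unchanged
  }
  // Preserve the capitalization of the spelled-out word.
  if (word !== word.toLowerCase()) {
    abbreviatedWord = capitalizeFirstLetter(abbreviatedWord);
  }
  return abbreviatedWord;
});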

/cc @danpaz

yuryleb commented 6 years ago

No, I meant Набережная улица should be abbreviated as Набережная ул. not Наб. ул. :wink: But набережная Кутузова should be наб. Кутузова.

So it's enough to process abbreviations as is, that is, case-sensitively.
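A minimal sketch of that case-sensitive behavior, assuming a table whose keys are the lower-case status words only. The `statusAbbreviations` table and the `abbreviateCaseSensitive` helper are hypothetical names, not part of the library:

```javascript
// Hypothetical case-sensitive table: only lower-case status words are keys.
const statusAbbreviations = { 'улица': 'ул.', 'набережная': 'наб.' };

// Replace a word only when it matches a key exactly, including case.
function abbreviateCaseSensitive(name, table) {
  // Split on whitespace: \b in JavaScript regular expressions is defined in
  // terms of ASCII word characters, so it does not find Cyrillic word boundaries.
  return name
    .split(/\s+/)
    .map(function (word) {
      return Object.prototype.hasOwnProperty.call(table, word) ? table[word] : word;
    })
    .join(' ');
}

console.log(abbreviateCaseSensitive('Набережная улица', statusAbbreviations)); // Набережная ул.
console.log(abbreviateCaseSensitive('набережная Кутузова', statusAbbreviations)); // наб. Кутузова
```

Because the capitalized Набережная is not a key, it is treated as a given name and kept in full.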