OpenCageData / address-formatting

templates to format geographic addresses
MIT License

Add oneline templates #103

Closed Zverik closed 3 months ago

Zverik commented 10 months ago

So this is a bit hard to describe, but the goal for making this new file was simple: to have a good one-line address built from parts.

I tried using the original template library, but it doesn't help in multiple ways:

So I have made this new file, by hand. It doesn't need a Mustache parser, and it's more structured. Basically you get the most likely value for 5 parts: local house name, street address, city part, city name, and region. From that, you can, for example, take the first non-empty value for a short address, or make up an algorithm to build a longer line.
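
To illustrate the last sentence, here is a minimal Python sketch of the two uses mentioned: taking the first non-empty value for a short address, and joining several parts into a longer line. It assumes the five parts have already been resolved into an ordered list (house name, street address, city part, city name, region). The function names `short_address` and `longer_line` are hypothetical and are not part of this pull request.

```python
from typing import Optional, Sequence


def short_address(parts: Sequence[Optional[str]]) -> str:
    """Return the first non-empty part, or '' when every part is empty."""
    for value in parts:
        if value and value.strip():
            return value.strip()
    return ""


def longer_line(parts: Sequence[Optional[str]], limit: int = 3, sep: str = ", ") -> str:
    """Join up to `limit` leading non-empty parts into one line."""
    kept = [v.strip() for v in parts if v and v.strip()]
    return sep.join(kept[:limit])


# Assumed order: house name, street address, city part, city name, region.
example = [None, "Bahnhofstrasse 1", "", "Zürich", "Zürich"]
print(short_address(example))
print(longer_line(example, limit=2))
```

The `limit` parameter is only one possible way to control the length of the line; a real algorithm might instead drop parts by name.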

Also, having country as the last of the region keys makes it easier to deal with disputed territories: the country never comes up there, because it is shadowed by a region name or something else.

Topics to discuss:

I am fine with dropping this pull request if it doesn't match the repo's purpose. Just wanted to share the thing I spent some days on for my job :)

freyfogle commented 10 months ago

hmm, let me have a longer think and look but my very initial reaction is

will have a longer look early next week

Zverik commented 10 months ago

Okay, I've become a bit rusty in Perl, but I rewrote the validation in it. That covers your last item.

Regarding the first one: this is address formatting. Just a different one. For starters, I needed to print "road, house number" or "house number, road" according to what's common to the country. That's why I got to this repository in the first place.

Turns out it works, but only when you need a full address printed. If you need a part (e.g. a street address only), you have to either do complex string processing, or... I don't know, do the same thing I did.

Each country has a different set of fields and their priorities for each value, and a different order for addresses. Some reference city_district, some don't have city parts at all, some mention state in regions, and some use island instead. It was all in the original templates, just unstructured.

With the original templates, you get one option: print a full address. That works if you're targeting the entire world. If you are printing an address line for a local, who knows which country and state they are in, you might want to trim the line. You cannot do that with the original templates, but you can with the structured lists in the new file.
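
The trimming idea can be sketched roughly as follows. This assumes the parts have already been resolved into a dictionary keyed by hypothetical names (`house`, `street`, `city_part`, `city`, `region`); neither these key names nor the `one_line` function come from the new file.

```python
from typing import Dict, Iterable

# Assumed output order for a one-line address; the key names are illustrative.
ORDER = ("house", "street", "city_part", "city", "region")


def one_line(parts: Dict[str, str], skip: Iterable[str] = ()) -> str:
    """Join the non-empty parts in ORDER, leaving out any names listed in `skip`."""
    skipped = set(skip)
    return ", ".join(
        parts[name] for name in ORDER if name not in skipped and parts.get(name)
    )


resolved = {"street": "Bahnhofstrasse 1", "city": "Bern", "region": "Switzerland"}
print(one_line(resolved))                   # full line for a worldwide audience
print(one_line(resolved, skip=["region"]))  # trimmed line for a local reader
```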

freyfogle commented 10 months ago

yes, the use case makes sense and is clear, and you are right, the current set-up is not good for that.

Still, I wonder whether there is a way to avoid duplicating much of the logic of worldwide.yaml in oneline.yaml; otherwise the two will get out of sync (your change to the entry for DO being a great example).

Thanks for switching to perl!

Zverik commented 9 months ago

What do you think of merging the new file into the old one? E.g.

# Switzerland
CH:
    address_template: |
        {{{attention}}}
        {{{house}}}
        {{{road}}} {{{house_number}}}
        {{{postcode}}} {{#first}} {{{postal_city}}} || {{{town}}} || {{{city}}} || {{{municipality}}} || {{{village}}} || {{{hamlet}}} || {{{county}}} || {{{state}}} {{/first}}
        {{{country}}}
    parts:
        - [house]
        - [road, house_number]
        - []
        - [postal_city, town, city, municipality, village, hamlet, county, state]
        - [country]
    replace:
        - ["Verwaltungskreis",""]
        - ["Verwaltungsregion",""]
        - [" administrative district",""]
        - [" administrative region",""]

I think virtually all replace parts were preserved, and the postformat_replace rules are not used for parts.
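
For illustration only, a consumer could resolve the `parts` lists from the CH entry above along these lines. The sketch rests on several assumptions that the thread does not spell out: that the second list (street address) is joined in the order given, that every other list is a fallback chain where the first non-empty key wins, and that each `replace` pair is applied to every component value as a regular-expression substitution. The names `clean` and `resolve` are hypothetical.

```python
import re
from typing import Dict, List

CH_PARTS = [
    ["house"],
    ["road", "house_number"],
    [],
    ["postal_city", "town", "city", "municipality", "village", "hamlet", "county", "state"],
    ["country"],
]
CH_REPLACE = [
    ["Verwaltungskreis", ""],
    ["Verwaltungsregion", ""],
    [" administrative district", ""],
    [" administrative region", ""],
]
STREET_INDEX = 1  # assumption: only this list is joined rather than used as fallbacks


def clean(value: str, replace: List[List[str]]) -> str:
    """Apply each [pattern, replacement] pair, then trim surrounding whitespace."""
    for pattern, repl in replace:
        value = re.sub(pattern, repl, value)
    return value.strip()


def resolve(components: Dict[str, str], parts: List[List[str]],
            replace: List[List[str]]) -> List[str]:
    """Return one string per part; an empty string means nothing matched."""
    cleaned = {k: clean(v, replace) for k, v in components.items() if v}
    result = []
    for index, keys in enumerate(parts):
        values = [cleaned[k] for k in keys if cleaned.get(k)]
        if index == STREET_INDEX:
            result.append(" ".join(values))  # keep the country's road/number order
        else:
            result.append(values[0] if values else "")  # first non-empty key wins
    return result


print(resolve(
    {"road": "Bahnhofstrasse", "house_number": "1",
     "county": "Verwaltungskreis Bern-Mittelland", "country": "Switzerland"},
    CH_PARTS, CH_REPLACE,
))
```

Because the order inside `[road, house_number]` comes straight from the country entry, the same code would print the house number first for a country whose list is `[house_number, road]`.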

(Also I've been wondering if I should keep [attention, house] instead of just [house], albeit the meaning is somewhat different.)

freyfogle commented 9 months ago

Hi Ilya,

sorry for the delay, very busy week.

That would be a good solution to keep things all in one place which is definitely an improvement over having multiple files. I guess my only slight hesitation is having two different styles of templating. What I like about mustache (besides it being a well-known commonly used format with parsers in most/all programming languages) is the logic is clear in the template. In your case the logic is in your processing code, which makes reuse across languages more difficult.

re: [attention, house] I guess it depends on your use case. In general my thinking was to keep the templates comprehensive and if you don't want a specific value then make sure it is not set when calling the formatter.

Zverik commented 3 months ago

So it's been half a year, and nobody chimed in, and also I'm now not sure this should even go into the core. So I'm closing this.