faker-ruby / faker

A library for generating fake data such as names, addresses, and phone numbers.
MIT License
11.27k stars, 3.18k forks

Consistent addresses #2881

Open jaredbeck opened 10 months ago

jaredbeck commented 10 months ago

Is your feature request related to a problem? Please describe it.

It is difficult to generate an address meeting the basic standards of consistency. For example, we may want the state and state_abbr to match.

If you're adding new objects, please describe how you would use them

Provide examples of how the proposed feature could be useful and relevant. For example, if proposing a new generator, explain why it's useful and relevant to Faker, and give examples of how to use it in a real project.

Many systems perform basic validation of addresses. It would be convenient if Faker::Address could pass basic validation. I'd suggest that it is outside the scope of Faker to pass advanced validation, e.g. geocoding or postal-service validation.

Describe alternatives you've considered

I tried using full_address_as_hash:

Faker::Address.full_address_as_hash(:street_address, :city, :country, :country_code, :state, :state_abbr, :zip_code, :longitude, :latitude, :country_name_to_code, country_name_to_code: {name: 'united_states'})
#=> {:street_address=>"9158 Gerhold Track", :city=>"North Kelly", :country=>"Albania", :country_code=>"BQ", :state=>"Montana", :state_abbr=>"NC", :zip_code=>"38817-0927", :longitude=>138.93577082675893, :latitude=>83.71634642320669, :country_name_to_code=>"US"}

We can see multiple inconsistencies in the above. For example, "US" does not match "Albania".
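As a rough illustration of the kind of "basic validation" meant here, the sketch below checks a hash shaped like the output above against two small lookup tables. The tables and the method name `address_inconsistencies` are hypothetical and exist only for this example; they are not part of Faker, and a real check would need complete data.

```ruby
# Hypothetical lookup tables, deliberately tiny. Not Faker data.
STATE_ABBRS = {
  'Montana' => 'MT',
  'North Carolina' => 'NC'
}.freeze

COUNTRY_CODES = {
  'Albania' => 'AL',
  'United States' => 'US'
}.freeze

# Returns a list of human-readable problems. An empty list means the
# address passed these two basic checks. Unknown states or countries
# are skipped rather than reported.
def address_inconsistencies(address)
  problems = []

  expected_abbr = STATE_ABBRS[address[:state]]
  if expected_abbr && expected_abbr != address[:state_abbr]
    problems << "state #{address[:state]} expects #{expected_abbr}, got #{address[:state_abbr]}"
  end

  expected_code = COUNTRY_CODES[address[:country]]
  if expected_code && expected_code != address[:country_code]
    problems << "country #{address[:country]} expects #{expected_code}, got #{address[:country_code]}"
  end

  problems
end

# Same mismatches as in the generated hash above.
sample = { state: 'Montana', state_abbr: 'NC', country: 'Albania', country_code: 'BQ' }
puts address_inconsistencies(sample)
```

Both the state pair and the country pair in the generated hash would be flagged by a check like this.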

I am not proposing to change the behavior of .full_address_as_hash, it is just an example of an alternative I considered.

It is possible for users to build their own "consistency layer" on top of Faker, but it is not convenient.

Additional context

I will be happy to attempt a PR, if this feature is approved.

For backwards compatibility, this new feature would be an addition, with no breaking changes.

I've deliberately avoided suggesting a new API, as I don't have any opinions about the names of e.g. new methods. I am just seeking feature approval at this point.

jaredbeck commented 10 months ago

There is a precedent of existing features which help to generate consistent addresses:

We could approach this problem by adding features like these, one at a time. For example, state_by_abbr. Or, we could attempt something more grandiose, like a Faker::Address.consistent method. The former could be built first, to support the latter. I'm open to any combination of these approaches.

jaredbeck commented 10 months ago

I am still happy to attempt a PR, if this feature is approved. Thank you in advance for your feedback.

thdaraujo commented 10 months ago

Hi, @jaredbeck! Thanks for the in-depth description of the problem.

I agree that it would be nice to have basic consistency: the state should be located in the right country, and their names and abbreviations should match.

The problems that I see are:

  1. Address locales are not organized in a hierarchy, so the abbreviation and the state name are not currently linked.
  2. I think some locales don't have enough information to support this feature (no state abbreviations in the FR locale, for example).

Problem (1) can be solved by adding more information to locales, or reorganizing the data. Problem (2) is a bit harder to solve.
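To make the "reorganizing the data" idea for problem (1) concrete, here is a minimal sketch. It assumes a hypothetical locale layout in which each state is a single record holding both its name and its abbreviation. The YAML keys and the `consistent_state` helper are invented for illustration and do not reflect Faker's actual locale files or API.

```ruby
require 'yaml'

# Hypothetical locale layout: one record per state, so the name and the
# abbreviation are linked. This is NOT Faker's current locale format.
HIERARCHICAL_LOCALE = <<~LOCALE
  address:
    states:
      - name: "Montana"
        abbr: "MT"
      - name: "North Carolina"
        abbr: "NC"
      - name: "Colorado"
        abbr: "CO"
LOCALE

STATES = YAML.safe_load(HIERARCHICAL_LOCALE).dig('address', 'states').freeze

# Sample one record, then read both fields from that same record.
def consistent_state(states = STATES, random: Random.new)
  record = states.sample(random: random)
  { state: record['name'], state_abbr: record['abbr'] }
end

p consistent_state
```

Because both values come from one sampled record, a mismatch such as "Montana" with "NC" cannot occur. A locale with no abbreviation data, as in problem (2), would still need a separate decision, for example leaving the `abbr` key out of its records.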

For this feature to work, would you need to pull information (state and abbreviation) from a different locale, depending on the country being generated?

And how would you approach problems (1) and (2)?

Another option is to start small and just make sure that (country x country code) match. Then figure out if (state x state_abbr) is also possible. What do you think?
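The "start small" option could look roughly like the sketch below: pick one entry from a name-to-code table and return both halves together. The table here is a hypothetical three-entry subset and `consistent_country` is an invented name, so treat this as an illustration of the idea rather than a proposed Faker API.

```ruby
# Hypothetical subset of a country-name-to-code table. Not Faker data.
COUNTRY_NAME_TO_CODE = {
  'Albania' => 'AL',
  'Brazil' => 'BR',
  'United States' => 'US'
}.freeze

# Sample a single [name, code] pair so the two values always agree.
def consistent_country(table = COUNTRY_NAME_TO_CODE, random: Random.new)
  name, code = table.to_a.sample(random: random)
  { country: name, country_code: code }
end

p consistent_country
```

Passing a seeded `Random` keeps the result reproducible in tests, and the same pattern would extend to (state x state_abbr) once that data is linked.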

BTW: we are not accepting new generators and new locales at the moment, so this work should be limited to changing the existing generators and updating the existing locales.