arthurdejong / python-stdnum

A Python library to provide functions to handle, parse and validate standard numbers.
https://arthurdejong.org/python-stdnum/
GNU Lesser General Public License v2.1

Refactor for a standard way of accessing submodules internationally in an automated way #53

Closed blaggacao closed 6 years ago

blaggacao commented 7 years ago

I'm working on an implementation to automatically detect the right check, let's say a tax number check, in all current (and future) supported countries, given that I know the country code.

I would therefore do:

try:
    import stdnum.COUNTRY_CODE
except ImportError:
    pass

validated = stdnum.COUNTRY_CODE.vat.validate(foo)

While I can rely on the existence of the vat attribute (or a corresponding alias) in some modules, I cannot in others like ca (not imported in __init__.py). This limits the usefulness of stdnum in automated scripts. And I really don't want to go to the filesystem level in order to detect the presence of some implementation. :wink:

It would be great to be able to ensure, by convention, at least the presence of the following aliases:

What do you think?

blaggacao commented 7 years ago

What I'm doing:

>>> import stdnum.co
>>> import stdnum.ca
>>> stdnum.co.vat.validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: validate() takes exactly 1 argument (0 given)
>>> stdnum.ca.vat.validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'vat'
>>> stdnum.ca.bn.validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'bn'

As you can see, some are imported and available as attributes and some are not. It's not entirely consistent.
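
The AttributeError above is consistent with how Python packages behave: importing a package does not import its submodules, so a submodule only shows up as an attribute once something has imported it. A minimal sketch of a lookup that imports the submodule explicitly, using only the standard library, is shown here. The helper name find_submodule is hypothetical and not part of python-stdnum.

```python
import importlib


def find_submodule(package, *parts):
    """Import package.part1.part2... and return it, or None if it is missing.

    Hypothetical helper for illustration; not part of python-stdnum.
    """
    name = '.'.join((package,) + parts)
    try:
        # import_module loads the submodule even if the parent package's
        # __init__.py never imported it.
        return importlib.import_module(name)
    except ImportError:
        # Also swallows import errors raised inside an existing module.
        return None
```

Assuming python-stdnum is installed, find_submodule('stdnum', 'ca', 'bn') would load the module that the plain attribute access above could not reach, without inspecting the filesystem.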

blaggacao commented 7 years ago

@gustavovalverde and I probably come from the same community :)

blaggacao commented 7 years ago

@qdp-odoo might be interested in this as well...

arthurdejong commented 7 years ago

There is a get_cc_module() utility function that could be helpful in this case. The util module is mostly for internal use at this point, but we could make it public and documented. See https://github.com/arthurdejong/python-stdnum/blob/master/stdnum/util.py#L27

blaggacao commented 7 years ago

Nice, thanks for the hint, I'll use this helper then. Notwithstanding, being able to trust a certain convention, like:

Would make this feature perfect... I would not need to encode whether the VAT number is called this or that in this or that country...

arthurdejong commented 7 years ago

Some formats are already classified in that way. Conventions that are already in use:

The problem with this scheme is that some countries may have more than one business identifier for different purposes. In some countries the tax (VAT) number may also be a general identifier; in others there are different numbers for different taxes and yet another, for example, for the chamber of commerce. Personal numbers usually have similar problems (though generally to a lesser degree).

Whether a specific number is usable in an application that uses python-stdnum becomes application-specific quickly. The VAT number is the easy case usually (especially in the EU and countries with a similar tax structure) and should already work like that.

Regarding passport numbers: I had a quick look at passport numbers for one or two countries at a certain point, but this is really hard because every few years a new passport is issued with a new number format. Also, the actual formats (apart from the length and alphabet) are generally not very well published. There are probably check digits in some passport numbers, but no algorithm was published (or even hinted at) for the numbers that I had a look at.

However, patches are always welcome (both for adding classifications and for passport numbers)!

I will think about exposing get_cc_module() in the public API and making it easier to use.

blaggacao commented 7 years ago

Thanks for your overview, it seems like

in others there are different numbers for different taxes

is the actual deal breaker... There is no way to solve this one cleanly and completely.

I'm now trying to keep a reference to the stdnum alias in the per-country metadata... So it is application-specific, but at least it is not code...

Let me see if I find some energy to construct some artful regexes to do a quick PR on the classification topic.

Yes, please include get_cc_module(), this is definitely a very clean interface for consumers...
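
One way to read the per-country metadata idea is a plain lookup table from country code to the name of the submodule that holds the tax number. The sketch below assumes that reading; the table TAX_NUMBER_MODULES, the function tax_module_name and the specific entries are illustrative choices made for this example, not an authoritative list.

```python
# Illustrative per-country metadata: which submodule name holds the tax number.
# The entries are examples chosen for this sketch, not an authoritative list.
TAX_NUMBER_MODULES = {
    'ca': 'bn',
    'co': 'nit',
    'do': 'rnc',
}


def tax_module_name(country_code, default='vat'):
    """Return the submodule name to use for a country's tax number.

    Falls back to the conventional 'vat' alias for countries not listed.
    """
    return TAX_NUMBER_MODULES.get(country_code.lower(), default)
```

Keeping the table as data means the application decides which number counts as "the" tax number in each country, which matches the point above that the choice is application-specific.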

gustavovalverde commented 7 years ago

@blaggacao you sometimes read my mind, now I'm almost certain.

@arthurdejong, for the implementation we talked about for the Dominican Republic, for example, we have vat and personalid. And now that you mention it, we also have a businessid, which gives me a great idea, but this one could be added later.

arthurdejong commented 6 years ago

Release 1.8 has just been made available, which adds get_cc_module() to the top-level namespace. It is also documented at https://arthurdejong.org/python-stdnum/doc/1.8/index#stdnum.get_cc_module
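
With release 1.8 or later, the lookup requested in this issue can be sketched roughly as follows. stdnum.get_cc_module(cc, name) is the function announced above; the wrapper validate_vat and its lookup parameter are hypothetical names introduced here, and the sketch assumes get_cc_module returns None when no matching module exists.

```python
def validate_vat(country_code, number, lookup=None):
    """Validate a VAT number for a country, or return None if unsupported.

    Hypothetical wrapper for illustration. `lookup` defaults to
    stdnum.get_cc_module and can be replaced, e.g. in tests.
    """
    if lookup is None:
        # Imported lazily so python-stdnum is only needed for the default path.
        from stdnum import get_cc_module
        lookup = get_cc_module
    module = lookup(country_code, 'vat')
    if module is None:
        # No vat module (or alias) is available for this country.
        return None
    # validate() returns the cleaned number and raises on invalid input.
    return module.validate(number)
```

A caller would use validate_vat('co', number) and treat a None result as "no VAT check available for this country", while still catching the validation exception for numbers that fail the check.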