arthurdejong / python-stdnum

A Python library to provide functions to handle, parse and validate standard numbers.
https://arthurdejong.org/python-stdnum/
GNU Lesser General Public License v2.1
484 stars 203 forks source link

Adding support for Slovenian Corporate Registration number (matična številka) #414

Closed bblaz closed 11 months ago

arthurdejong commented 1 year ago

I'm assuming the copyright year is meant to be 2023. :wink:

Reading http://www.pisrs.si/Pis.web/pregledPredpisa?id=URED7599 (through Google Translate), part of article 10 reads:

if there are more than 999 parts of the business entity, the three-digit serial number is determined by the letter in the eighth place and the serial number in the last two places;

Which suggests that only the 8th digit can be a letter and the last two digits need to be a number. The current implementation doesn't check the last three digits at all, we should at least check that they the last three digits are somewhat reasonable (e.g. I wouldn't expect 9331310$$$ to validate).

Then there is the issue of the check digit calcuation when the remaineder is (again via Google Translate) part of article 11:

If the remainder of the division is 1 (one), the check digit is 0 (zero). If the remainder of the division is 0 (zero), that number is eliminated from the set of sequence numbers to determine the parent number.

This suggests (but could be a translation misunderstanding) that if total % 11 is 0 the checksum is invalid.

Do you have a way to extend the list of test numbers for the tests? It would be really useful to have more test numbers, particularly for validating that the corner cases for the check digit algorithm are correct (see https://github.com/arthurdejong/python-stdnum/blob/master/CONTRIBUTING.md#finding-test-numbers).

bblaz commented 1 year ago

I'm assuming the copyright year is meant to be 2023. wink

Your are corect! :)

Reading http://www.pisrs.si/Pis.web/pregledPredpisa?id=URED7599 (through Google Translate), part of article 10 reads:

if there are more than 999 parts of the business entity, the three-digit serial number is determined by the letter in the eighth place and the serial number in the last two places;

Which suggests that only the 8th digit can be a letter and the last two digits need to be a number. The current implementation doesn't check the last three digits at all, we should at least check that they the last three digits are somewhat reasonable (e.g. I wouldn't expect 9331310$$$ to validate).

I've added a regex to check for that. I actually searched for any such case with the big companies in the business registry (AJPES - link below), however found no such case.

Then there is the issue of the check digit calcuation when the remaineder is (again via Google Translate) part of article 11:

If the remainder of the division is 1 (one), the check digit is 0 (zero). If the remainder of the division is 0 (zero), that number is eliminated from the set of sequence numbers to determine the parent number.

This suggests (but could be a translation misunderstanding) that if total % 11 is 0 the checksum is invalid.

I believe the translation should be correct. I will calculate some and add to the test in the next commit, as no such numbers were published I don't think I included any in the test and might have missed. I'll do it in the next couple of days if time will allow. But I agree, check should include this.

Do you have a way to extend the list of test numbers for the tests? It would be really useful to have more test numbers, particularly for validating that the corner cases for the check digit algorithm are correct (see https://github.com/arthurdejong/python-stdnum/blob/master/CONTRIBUTING.md#finding-test-numbers).

I added more numbers to the test cases. All ever used numbers are published on: https://www.ajpes.si/prs/Default.asp?language=english

bblaz commented 1 year ago

Checksum function corrected

arthurdejong commented 11 months ago

Hi @bblaz,

I've merged your changes as d0f4c1a with only minimal changes (mostly moving a few small things around, simplifying the check digit calculation and removing the format function).

Thanks for your contribution.