allo-media / text2num

Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation.
https://text2num.readthedocs.io
MIT License

Polish brand #92

Open Jeremy1980 opened 1 year ago

Jeremy1980 commented 1 year ago

Please add support for Polish number words. The code is in the attached polish.zip.

rtxm commented 1 year ago

What's in the zip?

Jeremy1980 commented 1 year ago

I cannot make pull requests, so I am sending you ready-to-use Python code as an archive.

rtxm commented 1 year ago

Anybody can make a PR to this repo, there is no restriction. Can I help you?

For security reasons, I don't download and extract zip files from the wild.

Jeremy1980 commented 1 year ago

I have marked this repository as useful, so please add support for Polish. Here is the code:


from typing import Dict, Set, Tuple
from .base import Language

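# Polish uses the long scale: "miliard" is 10**9 and "bilion" is 10**12.
# Each multiplier is listed in its singular and plural forms.
MULTIPLIERS = {
    "tysiąc": 1_000,
    "tysiące": 1_000,
    "tysięcy": 1_000,
    "milion": 1_000_000,
    "miliony": 1_000_000,
    "milionów": 1_000_000,
    "miliard": 1_000_000_000,
    "miliardy": 1_000_000_000,
    "miliardów": 1_000_000_000,
    "bilion": 1_000_000_000_000,
    "biliony": 1_000_000_000_000,
    "bilionów": 1_000_000_000_000,
}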

UNITS: Dict[str, int] = {
    word: value
    for value, word in enumerate(
        "jeden dwa trzy cztery pięć sześć siedem osiem dziewięć".split(), 1
    )
}

STENS: Dict[str, int] = {
    word: value
    for value, word in enumerate(
        "dziesięć jedenaście dwanaście trzynaście czternaście piętnaście szesnaście siedemnaście osiemnaście dziewiętnaście".split(),
        10,
    )
}

MTENS: Dict[str, int] = {
    word: value * 10
    for value, word in enumerate(
        "dwadzieścia trzydzieści czterdzieści pięćdziesiąt sześćdziesiąt siedemdziesiąt osiemdziesiąt dziewięćdziesiąt".split(), 2
    )
}

MTENS_WSTENS: Set[str] = set()
HUNDRED = {"sto": 100, "setki": 100}

COMPOSITES: Dict[str, int] = {}

NUMBERS = MULTIPLIERS.copy()
NUMBERS.update(UNITS)
NUMBERS.update(STENS)
NUMBERS.update(MTENS)
NUMBERS.update(HUNDRED)
NUMBERS.update(COMPOSITES)

class Polish(Language):

    MULTIPLIERS = MULTIPLIERS
    UNITS = UNITS
    STENS = STENS
    MTENS = MTENS
    MTENS_WSTENS = MTENS_WSTENS
    HUNDRED = HUNDRED
    NUMBERS = NUMBERS

    SIGN = {"plus": "+", "minus": "-"}
    ZERO = {"zero", "o"}
    DECIMAL_SEP = "przecinek"
    DECIMAL_SYM = ","

    AND_NUMS: Set[str] = set()
    AND = "oraz"
    NEVER_IF_ALONE = {"jeden"}

    RELAXED: Dict[str, Tuple[str, str]] = {}

    def normalize(self, word: str) -> str:
        return word

This is a basic version, so you can adapt it to the community's needs.
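
To illustrate how lookup tables like the ones above can drive number parsing, here is a minimal standalone sketch. It does not use text2num's actual parser or its Language class. The function name words_to_int and the HUNDREDS table are assumptions introduced only for this example, and the sketch does no validation of word order.

```python
from typing import Dict

# Illustrative tables shaped like the ones in the snippet above.
UNITS: Dict[str, int] = {
    word: value
    for value, word in enumerate(
        "jeden dwa trzy cztery pięć sześć siedem osiem dziewięć".split(), 1
    )
}
STENS: Dict[str, int] = {
    word: value
    for value, word in enumerate(
        "dziesięć jedenaście dwanaście trzynaście czternaście piętnaście "
        "szesnaście siedemnaście osiemnaście dziewiętnaście".split(),
        10,
    )
}
MTENS: Dict[str, int] = {
    word: value * 10
    for value, word in enumerate(
        "dwadzieścia trzydzieści czterdzieści pięćdziesiąt sześćdziesiąt "
        "siedemdziesiąt osiemdziesiąt dziewięćdziesiąt".split(),
        2,
    )
}
# Assumption for this sketch: Polish hundreds are single words.
HUNDREDS: Dict[str, int] = {
    word: value * 100
    for value, word in enumerate(
        "sto dwieście trzysta czterysta pięćset sześćset siedemset "
        "osiemset dziewięćset".split(),
        1,
    )
}
MULTIPLIERS: Dict[str, int] = {
    "tysiąc": 1_000, "tysiące": 1_000, "tysięcy": 1_000,
    "milion": 1_000_000, "miliony": 1_000_000, "milionów": 1_000_000,
}

# Every word that adds to the current group of three digits.
SMALL: Dict[str, int] = {**UNITS, **STENS, **MTENS, **HUNDREDS}


def words_to_int(text: str) -> int:
    """Convert a space-separated Polish cardinal to an int (hypothetical helper)."""
    total = 0  # sum of groups already closed by a multiplier
    group = 0  # value accumulated since the last multiplier
    for word in text.lower().split():
        if word in MULTIPLIERS:
            # A bare multiplier such as "tysiąc" counts as one of it.
            total += max(group, 1) * MULTIPLIERS[word]
            group = 0
        elif word in SMALL:
            group += SMALL[word]
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + group


print(words_to_int("dwa tysiące trzysta czterdzieści pięć"))
```

A real implementation would also need to reject invalid sequences and handle declined forms such as "jedna" or "dwie", which this sketch ignores.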

Jeremy1980 commented 1 year ago

> Anybody can make a PR to this repo, there is no restriction. Can I help you?
>
> For security reasons, I don't download and extract zip files from the wild.

When I run git push origin polish-brand, I get: Permission to allo-media/text2num.git denied to Jeremy1980. The requested URL returned error: 403

Jeremy1980 commented 1 year ago

git request-pull origin/master https://github.com/allo-media/text2num.git

warn: No match for commit 96e0f6df5c130d76a447f9caf69a7bc875b79d86 found at https://github.com/allo-media/text2num.git
warn: Are you sure you pushed 'HEAD' there?
The following changes since commit 03165958242c33b2770cde1701a39ba3436b8103:

Merge pull request #90 from Kakadus/ordinal-detection (2023-05-29 09:34:16 +0200)

are available in the Git repository at:

https://github.com/allo-media/text2num.git

for you to fetch changes up to 96e0f6df5c130d76a447f9caf69a7bc875b79d86:

Add support for Polish letters (2023-06-21 12:57:14 +0200)

rtxm commented 1 year ago

You should fork the project first, then send your pull request from there. See https://docs.github.com/en/get-started/quickstart/contributing-to-projects
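
The 403 reported earlier in this thread is what Git returns when a contributor pushes to allo-media/text2num.git without write access. As a rough sketch of the fork-based alternative, the helper below builds the two git commands that push the branch to a personal fork instead. The function name, the remote name "fork" and the fork URL are assumptions for illustration, and the fork must already exist on GitHub.

```python
from typing import List


def fork_workflow_commands(fork_url: str, branch: str, remote: str = "fork") -> List[str]:
    """Return the git commands that publish a local branch to a personal fork."""
    return [
        # Register the fork as an extra remote next to the read-only upstream.
        f"git remote add {remote} {fork_url}",
        # Push the feature branch to the fork, where the contributor has write access.
        f"git push {remote} {branch}",
    ]


# Hypothetical fork URL; replace it with the address of your own fork.
for command in fork_workflow_commands(
    "https://github.com/Jeremy1980/text2num.git", "polish-brand"
):
    print(command)
```

Once the branch is on the fork, the pull request itself is opened from the fork's page on GitHub.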

Jeremy1980 commented 1 year ago

Hey, the pull request is ready to merge.