DiTo97 / alphacodings

base26 and base52 encodings
MIT License
# alphacodings

base26 ([A-Z]) and base52 ([A-Za-z]) encodings

🌟 overview

transform any string into an alphabetic-only string with the lossless base26 ([A-Z]) and base52 ([A-Za-z]) encodings; useful for transmitting textual data over restrictive channels or for training AI models and tokenizers on simpler vocabularies.

alphacodings is a fast and lightweight C++ library; Python bindings are available via pybind11.
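
to make the idea of a lossless alphabetic-only encoding concrete, here is a minimal pure-Python sketch. it is an illustration under stated assumptions, not the library's C++ implementation: the names `sketch_encode` and `sketch_decode`, the sentinel byte, and the ordering of the base52 alphabet (uppercase first) are all hypothetical, and its output is not expected to match that of `base26_encode` or `base52_encode`.

```python
import string

ALPHABET_26 = string.ascii_uppercase  # [A-Z]
# assumption: uppercase letters first; the library may order its alphabet differently
ALPHABET_52 = string.ascii_uppercase + string.ascii_lowercase  # [A-Za-z]


def sketch_encode(text: str, alphabet: str) -> str:
    """Rewrite the UTF-8 bytes of `text` as digits in base len(alphabet)."""
    # a leading 0x01 sentinel byte keeps leading zero bytes
    # (and the empty string) recoverable after the round trip
    number = int.from_bytes(b"\x01" + text.encode("utf-8"), "big")
    base = len(alphabet)
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def sketch_decode(encoded: str, alphabet: str) -> str:
    """Invert sketch_encode: rebuild the integer, then the original bytes."""
    base = len(alphabet)
    number = 0
    for letter in encoded:
        number = number * base + alphabet.index(letter)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # drop the sentinel byte
```

the sketch treats the whole input as one big integer, which is simple but slow for long inputs; a production implementation would more likely work on fixed-size blocks.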

⚙️ installation

python -m pip install alphacodings

🚀 usage

from alphacodings import base26_encode, base26_decode, base52_encode, base52_decode

string = """\
<!DOCTYPE html>
<html>
<head>
    <title>sample page</title>
</head>
<body>
    <h1>welcome!</h1>
    <p>you are reading a sample HTML string.</p>
</body>
</html>
"""

if __name__ == "__main__":
    encoding_base26 = base26_encode(string)
    print(encoding_base26)
    # >>> YBPNLKVNQWZQCMDHMLNDTVQCCRKQLNCFGMQPNGQCIXHUUPHFUNKUFEPDLKIGARFOKTDEZKQHXGCPYHDZKKVIUDNFOAYYAUOQFBJFFGSTKAXNWGDPVUJNBARPNXBASHZBXIBSSEFTAIQRPEADSOVVNXUMQXVDWTAIVCIVWQZAHAGYAVZYKGMETJOOUQNOEXMSOOGSKVMFBYZIBZDAITICYVXMJTTCCHPMSCABLYUMFDUNLVSLNKHSBPKCGASXJSFYDHZFAOEQTUACEBIFKQGYC

    encoding_base52 = base52_encode(string)
    print(encoding_base52)
    # >>> EgcgYRPxckylMQWRLDADNZxPJiJcHaVwYHLnicahBgaotGGANZuvsvcpSSOJFLXvKPjRlNQCJqqdviiIdtnwJyDOnWojsrpkWSTZFHbMIREvREjpsODtSxoLlLjQZOoehsGFzawGQecyuomgpZQNyFnZQLWPiDhzClwxBFCCwdqduGJoshrwFdwHWMtJpSTmjxzaYmNvzOIOwLkJvyQHCaFtrODPhbhBpPBmC

    assert base26_decode(encoding_base26) == string
    assert base52_decode(encoding_base52) == string
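
the encodings are longer than the input because each letter carries fewer than 8 bits of information. as a rough guide, the sketch below computes the fewest letters any alphabetic encoding would need for a given number of bytes; `min_encoded_length` is a hypothetical helper, not part of alphacodings, and the library's actual output length may be somewhat larger.

```python
import math


def min_encoded_length(num_bytes: int, alphabet_size: int) -> int:
    """Fewest letters able to represent `num_bytes` arbitrary bytes."""
    # each letter carries log2(alphabet_size) bits; each byte carries 8
    return math.ceil(num_bytes * 8 / math.log2(alphabet_size))


if __name__ == "__main__":
    payload = "you are reading a sample HTML string.".encode("utf-8")
    print(min_encoded_length(len(payload), 26))  # base26 lower bound
    print(min_encoded_length(len(payload), 52))  # base52 lower bound
```

in the limit this amounts to about 1.70 letters per byte for base26 and about 1.40 letters per byte for base52.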

🧠 motivation

the library is inspired by R. Heaton's base26 implementation in the pyskyWiFi repository and by his story on how to transmit data over restrictive network channels via alphabetic-only encodings and tokenization.

have a look at the original repository and the accompanying blog post, and show him some love!

📊 benchmarking

TBC

🤝 contributing

contributions to alphacodings are welcome!

feel free to submit pull requests or open issues on our repository.

📄 license

see the LICENSE file for more details.