Schiphol-Hub / ssim

IATA Standard Schedules Information Manual file format parser
GNU General Public License v3.0

Unable to get Cyrillic text SSIM file to work #15

Open northdesk opened 3 years ago

northdesk commented 3 years ago

АД.txt

rok commented 3 years ago

It appears we have a bug in how the regexes are applied. A temporary workaround would be:

import logging
import ssim
from ssim.ssim import (regexes, _parse_sir, _parse_sim, _flatten,
                       _uniformize_sim_as_sir, _uniformize_sir, _uniformize_sim)

def read(file, iata_airport=None):
    """Workaround for ssim.read: detect the file type, then parse it."""
    with open(file, "r") as f:
        text = f.read()

    slots = []

    # SIR file: detected by a match of the SIR header pattern.
    if regexes["sir"]["header"].search(text):
        logging.info("Reading and parsing SIR file: %s.", file)
        slots = _parse_sir(text)
        slots = _flatten([_uniformize_sir(x) for x in slots])

    # SIM file: detected by a match of the record_1 pattern.
    elif regexes["sim"]["record_1"].search(text):
        logging.info("Reading and parsing SIM file: %s.", file)
        slots = _parse_sim(text)

        if iata_airport:
            # Uniformize the SIM records as SIR for the given airport.
            slots = _flatten([_uniformize_sim_as_sir(x, iata_airport) for x in slots])
        else:
            slots = _flatten([_uniformize_sim(x) for x in slots])

    return slots

ssim.read = read  # Override the buggy ssim.read with the workaround above

slots = ssim.read("АД.txt")