EL-BID / UrbanTrips

An open-source library to process smart card payment data, infer destination and get meaningful KPI
https://el-bid.github.io/UrbanTrips/

Structure GPS data into supply schema #51

Closed: alephcero closed this issue 1 year ago

alephcero commented 1 year ago

A supply schema should take GPS data from vehicles and the route geometries from #50, and classify GPS tracking points into:

Eventually, transaction (trx) data could also be classified into these services.

alephcero commented 1 year ago

Service information should be inferred from the GPS data. In the config, the user should provide the following column mapping for the GPS table. The service_type_gps entry is a dict whose key is the GPS column holding the service-change information and whose value gives the codes in that column that indicate a service started or finished, like so:

nombres_variables_gps = {
    'id_gps': 'DTSN',
    'id_linea_gps': 'IDLINEA',
    'id_ramal_gps': 'C_LD_ID',
    'interno_gps': 'INTERNO',
    'fecha_gps': 'DATE_TIME',
    'latitud_gps': 'LATITUDE',
    'longitud_gps': 'LONGITUDE',
    'direction_gps': 'DIRECTION',
    'service_type_gps': {
        'TYPE': {'start_service': 7, 'finish_service': 8}
    },
    'velocity_gps': 'VELOCITY',
    'cum_distance': 'DISTANCE'
}
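A minimal sketch of how the dict-valued service_type_gps entry could be read and checked before any renaming happens. The helper parse_service_type_gps is hypothetical (it is not part of UrbanTrips); it assumes exactly one GPS column and exactly the two labels start_service and finish_service, as in the config above, and returns the column name plus an inverted {code: label} map ready to be used for replacing raw values.

```python
def parse_service_type_gps(nombres_variables):
    """Return (gps_column, {code: label}), or None if not given as a dict.

    Hypothetical helper for illustration only, not part of UrbanTrips.
    """
    entry = nombres_variables.get('service_type_gps')
    if not isinstance(entry, dict):
        # not configured, or given as a plain column name
        return None
    if len(entry) != 1:
        raise ValueError("service_type_gps must map exactly one GPS column")
    column, codes = next(iter(entry.items()))
    expected = {'start_service', 'finish_service'}
    if set(codes) != expected:
        raise ValueError(f"codes must define exactly {sorted(expected)}")
    # invert {label: code} into {code: label} so raw GPS values can be replaced
    return column, {code: label for label, code in codes.items()}


nombres_variables_gps = {
    'id_gps': 'DTSN',
    'fecha_gps': 'DATE_TIME',
    'service_type_gps': {'TYPE': {'start_service': 7, 'finish_service': 8}},
}
print(parse_service_type_gps(nombres_variables_gps))
# → ('TYPE', {7: 'start_service', 8: 'finish_service'})
```

Validating the shape up front means a malformed config fails with a clear message instead of silently producing an all-NULL service column.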

Also, a new parameter should indicate whether that attribute can be trusted, or whether a service flagged in the GPS data may actually contain several services (for example, because the bus goes back and forth over the route without closing or opening a new service):

confiar_service_type_gps: False
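One possible way to turn the start markers into services, assuming the GPS table has already been renamed with the _gps postfix removed (columns interno, fecha and service_type). The function assign_service_ids is hypothetical and only covers the trusted case (confiar_service_type_gps: True): each start_service flag opens a new service for that vehicle, so the service id is the running count of start flags per vehicle, and points before a vehicle's first flag get id 0. When the attribute is not trusted, these ids would only be a starting point for further splitting, which this sketch does not attempt.

```python
import pandas as pd


def assign_service_ids(gps):
    """Hypothetical sketch, not UrbanTrips code.

    Assumes columns 'interno' (vehicle), 'fecha' (timestamp) and
    'service_type' ('start_service', 'finish_service' or None).
    """
    # order each vehicle's points in time before counting
    gps = gps.sort_values(['interno', 'fecha']).copy()
    is_start = gps['service_type'].eq('start_service')
    # running count of start flags per vehicle = service number
    gps['service_id'] = is_start.astype(int).groupby(gps['interno']).cumsum()
    return gps


gps = pd.DataFrame({
    'interno': [1, 1, 1, 1, 2, 2],
    'fecha': [1, 2, 3, 4, 1, 2],
    'service_type': ['start_service', None, 'finish_service',
                     'start_service', 'start_service', None],
})
out = assign_service_ids(gps)
print(out['service_id'].tolist())  # [1, 1, 1, 2, 1, 1]
```

Counting only start flags keeps the logic simple; finish flags are not needed to separate consecutive services, though they could be used to drop points recorded between a finish and the next start.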

If a GPS column is provided as a dict, renombrar_columnas_tablas() should handle it as follows:

def renombrar_columnas_tablas(df, nombres_variables, postfijo):
    """
    Take a df, a dict mapping the app's variable names to the column
    names to be replaced, and a postfix that identifies the variables
    of the app's data model. Rename the columns and reindex the df
    with the app's attributes of interest. Attributes with no
    equivalent in nombres_variables will appear as NULL.
    """
    # work on copies so the caller's df and config dict are not modified
    df = df.copy()
    nombres_variables = dict(nombres_variables)

    service_id_dict_rename_col = {}
    # if the service id column is provided as a dict:
    # {original_col_name: {'start_service': code, 'finish_service': code}}
    if isinstance(nombres_variables.get('service_type_gps'), dict):
        # get service id data and remove it from the regular rename dict
        service_id_dict = nombres_variables.pop('service_type_gps')
        # get the column name in the original df
        service_id_col_name = list(service_id_dict.keys())[0]
        # create a rename dict
        service_id_dict_rename_col = {
            service_id_col_name: 'service_type_gps'}

        # invert {label: code} into {code: label} to replace the raw codes
        service_id_values = {
            v: k for k, v in service_id_dict[service_id_col_name].items()}
        df[service_id_col_name] = df[service_id_col_name].replace(
            service_id_values)

        # remove all values besides start and end of service
        not_service_id_values = ~df[service_id_col_name].isin(
            ['start_service', 'finish_service'])
        df.loc[not_service_id_values, service_id_col_name] = None

    renombrar_columnas = {v: k for k, v in nombres_variables.items()}
    renombrar_columnas.update(service_id_dict_rename_col)

    print("Renaming columns:", renombrar_columnas)

    df = df.rename(columns=renombrar_columnas)
    df = df.reindex(columns=list(renombrar_columnas.values()))
    # drop the postfix (e.g. '_gps') from the app's column names
    df.columns = df.columns.map(lambda s: s.replace(postfijo, ""))

    return df

alephcero commented 1 year ago

Closed by https://github.com/EL-BID/UrbanTrips/commit/44601ad5afd2287d6e2e3c532778c5cbe2068b1c