Meaningful-Data / vtlengine

A Validation and Transformation Language engine, written in Python
https://docs.vtlengine.meaningfuldata.eu/
GNU Affero General Public License v3.0
python sdmx vtl vtlengine

VTL Engine

Introduction

The VTL Engine is a Python library for validating and running scripts written in VTL, the Validation and Transformation Language.

Installation

Requirements

The VTL Engine requires Python 3.10 or higher.
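
One way to confirm that an interpreter meets this requirement before installing is sketched below. The helper name meets_requirement is hypothetical and not part of vtlengine; it only compares sys.version_info with the 3.10 minimum stated above.

```python
import sys

# Minimum version stated in the Requirements section.
MINIMUM_PYTHON = (3, 10)


def meets_requirement(version_info=None):
    """Return True if the given (or current) interpreter is new enough.

    Hypothetical helper for illustration; not part of vtlengine.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    print("Python version OK:", meets_requirement())
```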

Install with pip

To install the VTL Engine on any operating system, use pip:

pip install vtlengine

Note: it is recommended to install the VTL Engine in a virtual environment.
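
As a minimal sketch of that recommendation, a virtual environment can be created from Python itself with the standard-library venv module. The function name create_environment and the directory name are assumptions chosen for illustration; the usual command-line equivalent is python -m venv.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path.

    Hypothetical helper for illustration; it only wraps the
    standard-library venv.create function.
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip inside the new environment so that
    # packages can be installed into it afterwards.
    venv.create(env_path, with_pip=with_pip)
    return env_path


if __name__ == "__main__":
    # ".venv" is an arbitrary directory name used for this example.
    print("Created:", create_environment(".venv"))
```

Once the environment is activated, the pip command shown above installs the VTL Engine into it rather than into the system interpreter.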

Usage

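The VTL Engine API implements two basic methods: semantic_analysis, which validates a VTL script, and run, which executes it.

Any action with VTL requires, at a minimum, the VTL script and the data structures of its input datasets, as the examples below show.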

Semantic Analysis

The semantic_analysis method validates the correctness of a VTL script and calculates the data structures of the datasets that the script generates (that calculation is a prerequisite for the semantic analysis).

Example 1: Correct VTL

from vtlengine import semantic_analysis

script = """
    DS_A := DS_1 * 10;
"""

data_structures = {
    'datasets': [
        {'name': 'DS_1',
         'DataStructure': [
             {'name': 'Id_1',
              'type': 'Integer',
              'role': 'Identifier',
              'nullable': False},
             {'name': 'Me_1',
              'type': 'Number',
              'role': 'Measure',
              'nullable': True}
         ]
         }
    ]
}

sa_result = semantic_analysis(script=script, data_structures=data_structures)

print(sa_result)

Returns:

{'DS_A': Dataset(name='DS_A', components={'Id_1': Component(name='Id_1', data_type=<class 'vtlengine.DataTypes.Integer'>, role=<Role.IDENTIFIER: 'Identifier'>, nullable=False), 'Me_1': Component(name='Me_1', data_type=<class 'vtlengine.DataTypes.Number'>, role=<Role.MEASURE: 'Measure'>, nullable=True)}, data=None)}
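
The data_structures dictionary in Example 1 has a regular shape, so it can be assembled with small helpers instead of being written by hand. The following is a minimal sketch; make_component and make_data_structures are hypothetical names, not part of vtlengine, and the keys are copied from the example above.

```python
def make_component(name, data_type, role, nullable):
    """Describe one component using the keys shown in Example 1."""
    return {"name": name, "type": data_type, "role": role, "nullable": nullable}


def make_data_structures(dataset_name, components):
    """Wrap a list of components into the dictionary layout of Example 1.

    Hypothetical helper for illustration; it handles a single dataset.
    """
    return {
        "datasets": [
            {"name": dataset_name, "DataStructure": list(components)}
        ]
    }


# Rebuild the structure of DS_1 from Example 1.
data_structures = make_data_structures(
    "DS_1",
    [
        make_component("Id_1", "Integer", "Identifier", False),
        make_component("Me_1", "Number", "Measure", True),
    ],
)
```

The resulting dictionary is equal to the literal used in Example 1, so it can be passed to semantic_analysis in the same way.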

Example 2: Incorrect VTL

Note that, compared with Example 1, the only change is that Me_1 is of the String data type instead of Number, so multiplying it by 10 is no longer valid.

from vtlengine import semantic_analysis

script = """
    DS_A := DS_1 * 10;
"""

data_structures = {
    'datasets': [
        {'name': 'DS_1',
         'DataStructure': [
             {'name': 'Id_1',
              'type': 'Integer',
              'role': 'Identifier',
              'nullable': False},
             {'name': 'Me_1',
              'type': 'String',
              'role': 'Measure',
              'nullable': True}
         ]
         }
    ]
}

sa_result = semantic_analysis(script=script, data_structures=data_structures)

print(sa_result)

Raises the following error:

raise SemanticError(code="1-1-1-2",
vtlengine.Exceptions.SemanticError: ('Invalid implicit cast from String and Integer to Number.', '1-1-1-2')
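
When validation is part of a larger workflow, it can be convenient to capture the error instead of letting it propagate. The sketch below is one possible wrapper, not part of vtlengine: try_analysis is a hypothetical name, and the analysis function is passed in as an argument so that the snippet stays importable without the library. It assumes, based on the message printed above, that the exception carries its message and error code in exc.args.

```python
def try_analysis(analyze, script, data_structures):
    """Run an analysis function and report the outcome without raising.

    analyze is expected to be a callable such as
    vtlengine.semantic_analysis (passed in by the caller).
    Returns (True, result) on success or (False, exc.args) on failure.
    """
    try:
        result = analyze(script=script, data_structures=data_structures)
    except Exception as exc:
        # Assumption: a broad except keeps this sketch library-agnostic.
        # Real code would more likely catch the SemanticError class
        # named in the traceback above.
        return False, exc.args
    return True, result


# Intended usage (requires vtlengine to be installed):
#   from vtlengine import semantic_analysis
#   ok, detail = try_analysis(semantic_analysis, script, data_structures)
#   if not ok:
#       print("Validation failed:", detail)
```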

Run VTL Scripts

The run method executes a VTL script with input datapoints.

It returns a dictionary with all the generated Datasets. When the output parameter is set, the engine writes the results of the computation to the output folder; otherwise, the data is included in the returned dictionary of computed datasets.

Two validations are performed before running, and either of them can raise an error.

Example 3: Simple run

from vtlengine import run
import pandas as pd

script = """
    DS_A := DS_1 * 10;
"""

data_structures = {
    'datasets': [
        {'name': 'DS_1',
         'DataStructure': [
             {'name': 'Id_1',
              'type': 'Integer',
              'role': 'Identifier',
              'nullable': False},
             {'name': 'Me_1',
              'type': 'Number',
              'role': 'Measure',
              'nullable': True}
         ]
         }
    ]
}

data_df = pd.DataFrame(
    {"Id_1": [1, 2, 3],
     "Me_1": [10, 20, 30]})

datapoints = {"DS_1": data_df}

run_result = run(script=script, data_structures=data_structures,
                 datapoints=datapoints)

print(run_result)

Returns:

{'DS_A': Dataset(name='DS_A', components={'Id_1': Component(name='Id_1', data_type=<class 'vtlengine.DataTypes.Integer'>, role=<Role.IDENTIFIER: 'Identifier'>, nullable=False), 'Me_1': Component(name='Me_1', data_type=<class 'vtlengine.DataTypes.Number'>, role=<Role.MEASURE: 'Measure'>, nullable=True)}, data=  Id_1   Me_1
0    1  100.0
1    2  200.0
2    3  300.0)}
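
The printed result is dense, so a small helper can make it easier to inspect. The sketch below is hypothetical (summarize_datasets is not part of vtlengine) and assumes that the attribute names match the representation printed above: each Dataset exposes components and data, and each Component exposes data_type, role and nullable.

```python
def summarize_datasets(result):
    """Map each dataset name to its component descriptions and its data.

    Hypothetical helper; attribute names are inferred from the printed
    output above and are an assumption about the returned objects.
    """
    summary = {}
    for name, dataset in result.items():
        summary[name] = {
            "components": {
                # data_type is printed as a class and role as an enum
                # member, so use the class name and the enum value.
                comp_name: (comp.data_type.__name__, comp.role.value, comp.nullable)
                for comp_name, comp in dataset.components.items()
            },
            # None after semantic analysis, the computed data after run.
            "data": dataset.data,
        }
    return summary


# Intended usage with the result of Example 3:
#   for name, info in summarize_datasets(run_result).items():
#       print(name, info["components"])
#       print(info["data"])
```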

For more information on usage, please refer to the API documentation.