AlexanderSenf / ucla_data_processing

An example for data processing
Apache License 2.0

Data Processing test program

This is a test program written to extract data from a given text file.

This program was written using:

Setup

It is recommended to set up a virtual environment:

python3 -m venv env
source env/bin/activate

Check out the project and change into the directory:

git clone https://github.com/AlexanderSenf/ucla_data_processing.git
cd ucla_data_processing

Once in the environment, all necessary Python prerequisites can be installed:

pip install -r requirements.txt

This list contains one optional requirement, python-Levenshtein, which speeds up the fuzzy string matching used when an unknown product code is encountered. Building it on Linux requires gcc and python3-dev.

On Ubuntu, these can be installed with:

sudo apt-get install gcc python3-dev

Testing the program

Unit tests: python -m unittest

Running the program

The simplest form:

python processor/process.py process

This runs the program with the provided test data file by default.

To display help: python processor/process.py --help

There are two commands available in the script:

process

There are three parameters:

The script can automatically correct errors in the product code if at most one character is incorrect.
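To illustrate the idea of single-character correction, here is a minimal sketch using only the standard library. It is not the project's actual implementation, which may rely on Levenshtein-based fuzzy matching instead. The function names and the product codes in KNOWN_CODES are hypothetical, and the sketch assumes that "one incorrect character" means one substituted character in a code of the correct length.

```python
from typing import Iterable, Optional


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_product_code(code: str, known_codes: Iterable[str]) -> Optional[str]:
    """Return the known code that matches `code` with at most one wrong character.

    Returns None when no known code is that close, or when more than one is,
    because an ambiguous correction is safer to reject than to guess.
    """
    known = list(known_codes)
    if code in known:
        return code  # exact match, nothing to correct
    candidates = [
        k for k in known
        if len(k) == len(code) and hamming_distance(k, code) <= 1
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical product codes, for illustration only.
KNOWN_CODES = ["AB123", "CD456", "EF789"]
```

With these hypothetical codes, correct_product_code("AB124", KNOWN_CODES) resolves to "AB123", while "AB134" differs in two positions and is rejected.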

add

There are two parameters, both of which are required: