This is a test program written to extract data from a given text file.
This program was written using:
It is recommended to set up a virtual environment:
python3 -m venv env
source env/bin/activate
Check out the project and change into the directory:
git clone https://github.com/AlexanderSenf/ucla_data_processing.git
cd ucla_data_processing
Once in the environment, all necessary Python prerequisites can be installed:
pip install -r requirements.txt
This list contains one optional requirement, python-Levenshtein, which is
used to speed up fuzzy string matching (used in case an unknown product code
is encountered). The Linux prerequisites for this include gcc and python3-dev.
On Ubuntu, these can be installed with:
sudo apt-get install gcc python3-dev
Unit tests: python -m unittest
The simplest form: python processor/process.py process
This runs the program with the provided test data file by default.
To display help: python processor/process.py --help
There are two commands available in the script:
process is used to process an input file.
add is used to add a product code to the list of recognized codes.
The process command takes three parameters:
--filename can be used to specify an alternate input file.
--productcode can be used to specify a product code. Specifying a code displays all subtypes for that code in the purchase.
--uniqueids is a flag, which is False by default. If it is set, all unique IDs are displayed for each product code in the purchase.
The script can automatically correct errors in the product code if at most one of the characters is incorrect.
The add command takes two parameters, both of which are required:
--productcode specifies the new 4-letter product code.
--description specifies the product description.
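To summarize the command-line interface described above, the following sketch rebuilds it with `argparse` from the standard library. It only mirrors the commands and parameters listed in this README. Whether the actual script uses `argparse`, and what its internal names and defaults are, is not stated here, so `build_parser`, the help strings, and the `None` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented commands (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="process.py")
    commands = parser.add_subparsers(dest="command", required=True)

    # "process": all three parameters are optional.
    process = commands.add_parser("process", help="Process an input file.")
    process.add_argument("--filename", default=None,
                         help="Alternate input file.")
    process.add_argument("--productcode", default=None,
                         help="Show all subtypes for this product code.")
    process.add_argument("--uniqueids", action="store_true",
                         help="Show all unique IDs per product code.")

    # "add": both parameters are required.
    add = commands.add_parser("add", help="Add a recognized product code.")
    add.add_argument("--productcode", required=True,
                     help="New 4-letter product code.")
    add.add_argument("--description", required=True,
                     help="Product description.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `add --productcode ABCD --description "Example product"` parses successfully (the values are made up), while `add` without both parameters exits with a usage error.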