FUSE-TEST-Data-Gen is an internal system for generating and exporting dummy financial data that reflects a realistic client data set. Its intended use case is the generation of bulk data in various formats (XML, CSV, JSON, etc.) for testing purposes, or for use within POCs.
To run the generator locally, first clone the repo:
Then install the dependencies:
pip install -r requirements.txt
A domain object is an entity that represents a real-world business object, such as a position or a trade.
Each domain object is created by a specific factory class in the `domainobjectfactories` package. Each factory class extends `Creatable`. If you wish to add a new domain object, create a new Python module containing a single class which extends `Creatable`.
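As a rough sketch of what such a module might look like: the interface of `Creatable` is not described here, so the `create` method and its `record_count` parameter are assumptions, and `CounterpartyFactory` and its fields are hypothetical. In the real project the base class would be imported rather than redefined.

```python
from abc import ABC, abstractmethod
import random
import string


class Creatable(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def create(self, record_count):
        """Return a list of dictionaries, one per generated domain object."""


class CounterpartyFactory(Creatable):
    """Hypothetical factory for a new 'counterparty' domain object."""

    def create(self, record_count):
        # Build one dictionary per record with an incrementing ID
        # and a random 8-letter upper-case name.
        return [
            {
                "counterparty_id": index + 1,
                "counterparty_name": "".join(
                    random.choices(string.ascii_uppercase, k=8)
                ),
            }
            for index in range(record_count)
        ]


records = CounterpartyFactory().create(3)
```

Returning plain dictionaries keeps the output compatible with a consumer that expects one dictionary per domain object.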
A document listing all the current domain objects and their component fields can be found in the Appendices of the requirements document.
A file builder is a self-contained piece of functionality which, given a dataset, builds a file according to a specified data format and outputs that file to a specified location.
Each file builder is represented by a single Python module containing a single Python class; these modules reside in the `filebuilders` package. Each class extends the abstract class `FileBuilder`, which defines an abstract method `build`. The initial file builders are JSON, CSV and XML. To add a new file builder, create a new Python module inside the `filebuilders` package containing a single class which extends the `FileBuilder` abstract class and implements the abstract method `build`. The `build` method should accept a list of dictionaries (one dictionary per domain object) and use that dataset to generate a file.
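A new file builder could look roughly like the sketch below. Only the `FileBuilder` name and its abstract `build` method come from the description above; the `TSVBuilder` class, its constructor arguments and the way the output path is assembled are assumptions made for illustration. In the real project `FileBuilder` would be imported rather than redefined.

```python
from abc import ABC, abstractmethod
import csv
import os


class FileBuilder(ABC):
    """Stand-in for the project's abstract class; only `build` is documented."""

    @abstractmethod
    def build(self, data):
        """Write `data` (a list of dictionaries) to a file."""


class TSVBuilder(FileBuilder):
    """Hypothetical builder that writes tab-separated values."""

    def __init__(self, output_dir, file_name):
        # Both constructor arguments are assumptions made for this sketch.
        self.output_path = os.path.join(output_dir, file_name)

    def build(self, data):
        if not data:
            return  # Nothing to write for an empty dataset.
        with open(self.output_path, "w", newline="") as output_file:
            writer = csv.DictWriter(
                output_file,
                fieldnames=list(data[0].keys()),
                delimiter="\t",
            )
            writer.writeheader()
            writer.writerows(data)
```

The header row is taken from the keys of the first dictionary, so this sketch assumes every record in the dataset has the same fields.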
Running data generation locally requires a configuration file. Provide one either by passing its file path as a command-line argument or by creating it in the default location, `src/config.json`.
Define a configuration file in the default location described above, or in another known location. The default configuration is located in `src/config.json`. Run the following command from the top-level directory of the repository:
python src/app.py (optional: --config <config_path.json>)
Where no configuration argument is given, the program defaults to the path `src/config.json`.
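The defaulting behaviour can be sketched with `argparse` from the standard library. How `src/app.py` actually parses its arguments is not shown here, so the `parse_args` function and `DEFAULT_CONFIG_PATH` constant below are illustrative assumptions; only the `--config` flag and the default path come from the text above.

```python
import argparse

# Default path taken from the description above.
DEFAULT_CONFIG_PATH = "src/config.json"


def parse_args(argv=None):
    """Parse command-line arguments, falling back to the default config path."""
    parser = argparse.ArgumentParser(description="Generate dummy financial data.")
    parser.add_argument(
        "--config",
        default=DEFAULT_CONFIG_PATH,
        help="Path to the JSON configuration file.",
    )
    return parser.parse_args(argv)
```

Calling `parse_args([])` yields the default path, while `parse_args(["--config", "other.json"])` yields the supplied one.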
Define a configuration file in the default location, or configure the project's run-time arguments to point to a configuration file located elsewhere.
Execution relies on two JSON config files. Default versions are provided, although you can replace them with new files with your own specific config if required.
This file contains configuration that would usually be set only by a developer; end users are not expected to need to update it. It contains two sections:
config.json contains the configuration stating which objects should be created, how many of them there will be and what data should be included in those objects.
Each object created needs its own key. Each key should map to the following items:
One of the requirements was for users to be able to provide parameters to describe “the shape and volume of data you want to generate”. In order to do this we decided to allow users to include dummy fields in the objects generated. These dummy fields allow users to increase the number of fields generated for each record and specify the type of those fields.
For example, the following entry in the config means that each generated instrument will contain all the fields described in the data model, along with 3 dummy fields containing 10-character alphanumeric strings and 5 dummy fields each containing a 12-digit number.
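To illustrate the resulting fields rather than the config syntax, the sketch below generates 3 alphanumeric strings of 10 characters and 5 numbers of 12 digits. The structure of `DUMMY_FIELD_SPECS`, its key names, the `generate_dummy_fields` function and the `dummy_field_N` naming are all hypothetical; the project's real config keys and field names may differ.

```python
import random
import string

# Hypothetical description of the dummy fields; not the project's config format.
DUMMY_FIELD_SPECS = [
    {"field_type": "string", "field_count": 3, "field_length": 10},
    {"field_type": "numeric", "field_count": 5, "field_length": 12},
]


def generate_dummy_fields(specs):
    """Return a dictionary of dummy fields matching the given specs."""
    fields = {}
    index = 1
    for spec in specs:
        length = spec["field_length"]
        for _ in range(spec["field_count"]):
            if spec["field_type"] == "string":
                value = "".join(
                    random.choices(string.ascii_letters + string.digits, k=length)
                )
            else:
                # Lower bound 10^(length-1) guarantees exactly `length` digits.
                value = random.randint(10 ** (length - 1), 10 ** length - 1)
            fields[f"dummy_field_{index}"] = value
            index += 1
    return fields


dummy_fields = generate_dummy_fields(DUMMY_FIELD_SPECS)
```

The returned dictionary could then be merged into each generated record alongside the fields from the data model.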
The default Google Drive folder ID in the config is “1xTc_fiiIoNxrmHFgviJR1FxlUtdgXSSv”, which points to a folder accessible to anyone within Galatea. The folder is called “FUSE-Test-Data-Gen-Uploads” and is accessible here.
When files are uploaded to Google Drive, they are placed in a folder named with the current UTC time (HHMMSS), inside a parent folder named with today's date (YYYY-MM-DD). If the folder for today doesn't exist, it will be created.
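The folder names described above can be derived with the standard `datetime` module. This is a minimal sketch of the naming only; the `upload_folder_names` function is a hypothetical helper, and the actual Google Drive upload code is not shown.

```python
from datetime import datetime, timezone


def upload_folder_names(now=None):
    """Return (date_folder, time_folder) for an upload, based on UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Parent folder: YYYY-MM-DD; child folder: HHMMSS.
    return now.strftime("%Y-%m-%d"), now.strftime("%H%M%S")


date_folder, time_folder = upload_folder_names(
    datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
```

For the fixed timestamp above, the parent folder is `2020-01-02` and the child folder is `030405`.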
When uploading to Google Drive for the first time, you will be required to log in using your Galatea Google account; a browser window should load automatically to allow you to do this. Once you have done this, an authentication token file "token.pickle" will be downloaded onto your machine. When running the service remotely, it is important to ensure that a valid token.pickle file exists in the same directory as the application.
The Galatea Jenkins server can be found at: https://jenkins.fuse.galatea-associates.com/. The FUSE-Test-Data-Gen job is the job for this project.
The Jenkinsfile defines a pipeline of the stages Jenkins will perform, the order in which to perform them, the commands required to execute them, as well as any options and environment details. The current stages in this project are:
Runs `pip install -r requirements.txt` to install the necessary packages into the virtual environment.
Executes the tests supplied in the `tests/unit/` directory. Names of test files must be prefixed with `test_`.
The discovery method is set within the configuration of the job itself rather than any external file. It is currently set to scan the repo & run once daily if not otherwise executed. Branches are automatically detected if they contain a Jenkinsfile.
Dependencies arise where objects use information from previously generated objects. Usually this takes the form of one object referring to an identifier from another object (much like a primary key <-> foreign key relationship); for example, Trade objects refer to Account IDs from Account objects and ISINs from Instrument objects. All the dependencies can be seen in the Data Model tables in the Requirements document.
Cross-object consistency means care must be taken when generating some objects to ensure their dependencies are generated as well; for example, if you wish to generate Trade objects you must also generate Account and Instrument objects.
Domain Objects | Dependencies |
---|---|
Account | |
Instrument | |
Price | Instrument |
Cash Balance | Account |
Back Office Position | Account, Instrument |
Depot Position | Account, Instrument |
Front Office Position | Account, Instrument |
Settlement Instruction | Account, Instrument |
Trade | Account, Instrument |
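
The table above can be expressed as a lookup that checks whether a requested set of objects is complete. This is a sketch only: the snake_case key names and the `missing_dependencies` function are assumptions and may not match the names used in the project's config.

```python
# Dependencies copied from the table above; key names are assumed.
DEPENDENCIES = {
    "account": [],
    "instrument": [],
    "price": ["instrument"],
    "cash_balance": ["account"],
    "back_office_position": ["account", "instrument"],
    "depot_position": ["account", "instrument"],
    "front_office_position": ["account", "instrument"],
    "settlement_instruction": ["account", "instrument"],
    "trade": ["account", "instrument"],
}


def missing_dependencies(requested):
    """Map each requested object to the dependencies absent from the request."""
    requested = set(requested)
    missing = {}
    for name in sorted(requested):
        absent = [dep for dep in DEPENDENCIES[name] if dep not in requested]
        if absent:
            missing[name] = absent
    return missing


result = missing_dependencies(["trade", "account"])
```

Here `result` reports that `trade` is missing `instrument`; an empty dictionary means every dependency is covered.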
Generation output is done on a per-object basis. As per the configuration, each object has an amount to generate, a maximum file size to adhere to, and a format. Where the number to generate exceeds the maximum file size, multiple files are generated. The file naming convention is sequential, for instance: instrument_000.json, instrument_001.json, and so on.
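The splitting and naming convention can be sketched as follows, assuming the maximum file size is expressed as a number of records per file (the text does not state the unit). The `split_into_files` function and its parameter names are hypothetical.

```python
def split_into_files(object_name, records, max_records_per_file, extension):
    """Group records into sequentially named files of bounded size."""
    files = {}
    starts = range(0, len(records), max_records_per_file)
    for file_index, start in enumerate(starts):
        # Zero-padded, three-digit sequence number, e.g. instrument_000.json.
        file_name = f"{object_name}_{file_index:03d}.{extension}"
        files[file_name] = records[start:start + max_records_per_file]
    return files


instrument_records = [{"instrument_id": i} for i in range(5)]
files = split_into_files("instrument", instrument_records, 2, "json")
```

With 5 records and at most 2 per file, this yields three files, the last of which holds the single remaining record.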