IBM / Multi-GNN

Multi-GNN architectures for Anti-Money Laundering.
Apache License 2.0
38 stars 16 forks source link

Multi-GNN

This repository contains all models and adaptations needed to run Multi-GNN for Anti-Money Laundering. The repository consists of four Graph Neural Network model classes (GIN, GAT, PNA, RGCN) and the below-described model adaptations utilized for financial crime detection in Egressy et al.. Note that this repository solely focuses on the Anti-Money Laundering use case. This repository has been created for experiments in Provably Powerful Graph Neural Networks for Directed Multigraphs [AAAI 2024] and Realistic Synthetic Financial Transactions for Anti-Money Laundering Models [NeurIPS 2023].

Setup

To use the repository, you first need to install the conda environment via

conda env create -f env.yml

Then, the data needed for the experiments can be found on Kaggle. To use this data with the provided training scripts, you first need to perform a pre-processing step for the downloaded transaction files (e.g. HI-Small_Trans.csv):

python format_kaggle_files.py /path/to/kaggle-files/HI-Small_Trans.csv

Make sure to change the filepaths in the data_config.json file. The aml_data path should be changed to wherever you stored the formatted_transactions.csv file generated by the pre-processing step.

Usage

To run the experiments you need to run the main.py function and specify any arguments you want to use. There are two required arguments, namely --data and --model. For the --data argument, make sure you store the different datasets in different folders. Then, specify the folder name, e.g --data Small_HI. The --model parameter should be set to any of the model classed that are available, i.e. to one of --model [gin, gat, rgcn, pna]. Thus, to run a standard GNN, you need to run, e.g.:

python main.py --data Small_HI --model gin

Then you can add different adaptations to the models by selecting the respective arguments from:

| Argument | Description | | -------------- | ---------------------------- | | `--emlps` | Edge updates via MLPs | | `--reverse_mp` | Reverse Message Passing | | `--ego` | Ego ID's to the center nodes | | `--ports` | Port Numberings for edges |

Thus, to run Multi-GIN with edge updates, you would run the following command:

python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --ports

Additional functionalities

There are several arguments that can be set for additional functionality. Here's a list with them:

| Argument | Description | | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------| | `--tqdm` | Displays a progress bar during training and inference. | | `--save_model` | Saves the best model to the specified `model_to_save` path in the `data_config.json` file. Requires argment `--unique_name` to be specified. | | `--finetune` | Loads a previously trained model (with name given by `--unique_name` and stored in `model_to_load` path in the `data_config.json`) to be finetuned. | | `--inference` | Loads a previously trained model (with name given by `--unique_name` and stored in `model_to_load` path in the `data_config.json`) to do inference only. |

Licence

Apache License Version 2.0, January 2004