UoB-DSMP-2023-24 / dsmp-2024-group-36

dsmp-2024-group-36 created by GitHub Classroom
1 stars 0 forks source link

Try to use neo4j to visualize all transfer users and records #5

Open noobwei opened 7 months ago

noobwei commented 7 months ago

Summary

Code in Neo4j workspace is shown below.

LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/noobwei/e4c454756666860e2f26626ea470e58b/raw/37149a7407d48cc83cd48d141a8c7e1612d8808e/fake_transactional_data_24.csv' AS row
MERGE (fromAccount:Account {account_id: row.from_totally_fake_account})
MERGE (toAccount:Account {account_id: row.to_randomly_generated_account})
CREATE (fromAccount)-[:TRANSFERRED {
    amount: toFloat(row.monopoly_money_amount),
    date: row.not_happened_yet_date
}]->(toAccount);

The renderings of the preliminary operation are as follows:

Image

Image

It can be initially seen that the places where transactions are more intensive are the accounts of some stores, which also proves that the graph database solution has certain feasibility.

Image

Image

Challenge

  1. Neo4j free users have limitations and display details are limited
  2. The amount of data is too large, and data filtering needs to be streamlined to display graph databases that meet the conditions.
  3. Transaction type cannot be further analyzed

Todo

Continue to learn how to operate graph databases and discuss the feasibility in group meetings

noobwei commented 7 months ago

The parts that can be displayed are limited and the time series is not clear Failure to add filtering conditions will cause serious lags Filter conditions can be added through SQL-like operation statements

MATCH (a:Account)-[r:TRANSFERRED]->(b:Account)
WHERE r.amount < 2
RETURN a, r, b;

Image

MATCH (a:Account)-[r:TRANSFERRED]->(b:Account)
WHERE r.amount > 400
RETURN a, r, b;

Image

When I select the option to increase the consumption amount, I can see that the small amount of consumption is concentrated in coffee, etc., while the large amount of consumption is concentrated in personal transfers. If we use algorithms such as decision trees to use amounts and other attributes to determine the categories of various transfers, using neo4j for visualization will be very intuitive.

noobwei commented 7 months ago

4 These two tasks can be performed jointly

noobwei commented 4 months ago

Neo4j offers strong advantages as a tool for banks to analyze user transaction records, especially when processing and analyzing complex transaction networks. It is a high-performance graph database management system designed to handle large amounts of data. Using a graphical structure to store data, Neo4j makes it easy to capture the transactional relationships between accounts, enabling banks to visualize their customers' financial activities and patterns. At the same time, the use of Neo4j can be partially desensitized to the data, allowing analysts to focus on analyzing the abstracted relationships without touching the privacy of specific users, which is very important for the protection of user privacy.

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/noobwei/data/main/cleaned_simulated_transaction_2024.csv' AS row
WITH row,
     toString(toInteger(row.`Account No`)) AS AccountNo,
     toString(toInteger(row.`Third Party Account No`)) AS ThirdPartyAccountNo,
     row.`Third Party Name` AS StoreName
WHERE row.`Amount` IS NOT NULL
MERGE (a:Account {accountNo: COALESCE(AccountNo, StoreName)})
ON CREATE SET a.initialBalance = toFloat(row.Balance)
MERGE (b:Account {accountNo: COALESCE(ThirdPartyAccountNo, StoreName)})
ON CREATE SET b.initialBalance = toFloat(row.Balance)
CREATE (a)-[t:TRANSFERRED {
    amount: toFloat(row.Amount),
    date: row.Date,
    time: row.Timestamp
}]->(b)

Use the above command to import the csv stored in github into the neo4j

First specify a specific date and time period to view the transactions that occurred

MATCH (a:Account)-[t:TRANSFERRED]->(b:Account)
WHERE t.date = '10/03/2023' AND t.time >= '00:00' AND t.time < '01:00'
RETURN a,t,b

Image

And then we're gonna broaden the scope a little bit and look at a full day's worth of transactions.

MATCH (a:Account)-[t:TRANSFERRED]->(b:Account)
WHERE t.date = '10/03/2023'
RETURN a,t,b

Image

Due to the size of the picture, we can't see the name of the store clearly. So we took a look at the most popular stores section.

Image

Because of the ease of the graph database, it is entirely possible to look at Tesco's turnover over the course of a day in another figure.

MATCH (a:Account)-[t:TRANSFERRED]->(b:Account)
WHERE t.date = '10/03/2023' AND b.accountNo = 'Tesco'
RETURN a, t, b

Image

Each data value can be used as a filter for the situation we need to For example, we need to know where the riches went to spend his money on March 10th, we set a threshold then:

MATCH (a:Account)-[t:TRANSFERRED]->(b:Account)
WHERE a.initialBalance>2000 AND t.date = '10/03/2023'
RETURN a, t, b

Image

Or I'd like to know how Easter spending differs from other days (sadly the data I got this time didn't have Christmas transactions)

MATCH (a:Account)-[t:TRANSFERRED]->(b:Account)
WHERE t.date = '10/04/2023'
RETURN a,t,b

Image

At the same time, due to the limitations of the community board, some transactions cannot be displayed when there are many transactions.

Image

This tool is very useful for banks in practice, and it is perfectly possible for banks to use neo4j APIs to integrate with other applications. For example, if the transaction data has a tag of whether or not it's fraudulent, it can be combined with a graph to see which stores have consumer populations that are subject to fraud, and where the money flows after they've been defrauded. Or very visually spot the increased appeal of specific stores based on the movement of the graph on different days, such as Easter (of course, Tesco will always be the strongest in this data)

noobwei commented 4 months ago

For someone who never used SQL-like language before:

Query Purpose Cypher Query Code Description
View transactions for a specific date and time cypher MATCH (a:Account)-[t:TRANSFERRED]->(b:Account) WHERE t.date = 'DATE' AND t.time BETWEEN 'START_TIME' AND 'END_TIME' RETURN a, t, b Retrieves all transactions for a specified date and time interval.
View all transactions on a specific date cypher MATCH (a:Account)-[t:TRANSFERRED]->(b:Account) WHERE t.date = 'DATE' RETURN a, t, b Retrieves all transactions that occurred on a specified date.
View transactions for a specific vendor on a date cypher MATCH (a:Account)-[t:TRANSFERRED]->(b:Account) WHERE t.date = 'DATE' AND b.accountNo = 'VENDOR_ID' RETURN a, t, b Retrieves all transactions for a specific vendor on a given date.
View transactions above a threshold balance cypher MATCH (a:Account)-[t:TRANSFERRED]->(b:Account) WHERE a.initialBalance > THRESHOLD AND t.date = 'DATE' RETURN a, t, b Retrieves transactions on a specified date for accounts with a balance above a certain threshold.
Compare spending on different dates cypher MATCH (a:Account)-[t:TRANSFERRED]->(b:Account) WHERE t.date IN ['DATE1', 'DATE2'] RETURN a, t, b Compares transaction activity between two or more specific dates.

In this table, placeholders such as 'DATE', 'START_TIME', 'END_TIME', 'VENDOR_ID', 'THRESHOLD', 'DATE1', and 'DATE2' are used. You should replace these with the actual values according to the data and query requirements.