Please provide documentation for using the plugin with the CNRL example dataset

erkkikeranen commented 2 years ago

Regarding http://www.cnrl.colostate.edu/Projects/RAD/pings.html, which hosts a dataset that can be used with this plugin, could you please provide:

- instructions for the CSV import scripts that construct a correct graph (at the moment this has to be guessed by whoever does the import)
- example Cypher queries that create query graphs compatible with the dataset, for which the library returns correct results

I am running into problems with the dataset: I get only empty lists for any value of the similarity score, even though I am able to construct a graph from the CSV files.

Shashika commented 2 years ago

@erkkikeranen thanks for using our library. You can find some sample insert queries here. In the query graph, we have to add a specific label (e.g. Query) to all nodes. I will soon upload the specific query to create the query graph for this particular dataset.
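
As a rough illustration of the labelling rule described above, a query graph in which every node carries the extra label might be created as sketched here. The label name Query is the one mentioned above; the Person and Activity labels, the EXHIBITS relationship type, and the property value are assumptions about this dataset, not a confirmed recipe for the plugin.

```cypher
// Sketch only: every node of the query graph carries the extra :Query label.
// The Person/Activity labels, the EXHIBITS type and the name value are
// assumptions made for illustration.
CREATE (p:Query:Person)
CREATE (a:Query:Activity {name: 'Received Training'})
CREATE (p)-[:EXHIBITS]->(a);
```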

erkkikeranen commented 2 years ago

So that you can verify whether I am importing and using the synthetic radicalization dataset correctly, here are the scripts I am using.


// clear data
MATCH (n) DETACH DELETE n;

// Load Person nodes 
LOAD CSV WITH HEADERS FROM 'file:///user_rad.csv' AS row
MERGE (p:Person {personId: row.ID, name: row.Name})
RETURN count(p);
// Load Social Media Account Nodes
LOAD CSV WITH HEADERS FROM 'file:///smaccount_rad.csv' AS row
MERGE (s:SmAccount {smAccountId: row.ID, type: row.Type})
RETURN count(s);
// Load Social Media Posts Nodes
LOAD CSV WITH HEADERS FROM 'file:///postnodes_rad.csv' AS row
MERGE (p:Post {postId: row.ID, type: row.Type})
RETURN count(p);
// Load Activity Nodes
LOAD CSV WITH HEADERS FROM 'file:///activity_rad.csv' AS row
MERGE (a:Activity {activityId: row.ID, name: row.Name, type: row.Type})
RETURN count(a);
// Load relationship Person-EXHIBITS->activity 
LOAD CSV WITH HEADERS FROM 'file:///exhibits_rad.csv' AS row
MATCH (p:Person {personId: row.UserID})
MATCH (a:Activity {activityId: row.ActivityID})
MERGE (p)-[:EXHIBITS {timestamp: toInteger(row.Timestamp)}]->(a)
RETURN *;
// Load relationship Person-HAS->Social media account
LOAD CSV WITH HEADERS FROM 'file:///has_rad.csv' AS row
MATCH (p:Person {personId: row.UserID})
MATCH (s:SmAccount {smAccountId: row.SMID})
MERGE (p)-[:HAS]->(s)
RETURN *;
// Load relationship Person1-KNOWS->Person2
LOAD CSV WITH HEADERS FROM 'file:///knows_rad.csv' AS row
MATCH (p1:Person {personId: row.UserID1})
MATCH (p2:Person {personId: row.UserID2})
MERGE (p1)-[:KNOWS {timestamp: toInteger(row.Timestamp)}]->(p2)
RETURN *;
// Load relationship Social media account->POSTS->Post
LOAD CSV WITH HEADERS FROM 'file:///posts_rad.csv' AS row
MATCH (s:SmAccount {smAccountId: row.SMID})
MATCH (p:Post {postId: row.PostID})
MERGE (s)-[:POSTS {timestamp: toInteger(row.TimeStamp)}]->(p)
RETURN *;
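
One way to check that the import above produced the expected graph is to count nodes per label and relationships per type, then compare the totals with the row counts of the CSV files. A minimal sketch using only built-in Cypher functions follows. Because the import uses MERGE, repeated rows collapse into one node or relationship, so the counts may be lower than the number of CSV rows.

```cypher
// Count nodes per label combination.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Count relationships per type.
MATCH ()-[r]->()
RETURN type(r) AS relType, count(*) AS rels
ORDER BY rels DESC;
```

A relationship type that is missing from the second result would suggest that the MATCH clauses in the corresponding load step found no nodes, for example because of a mismatched CSV column name.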

which results in a graph like (small part shown):

And then I try to create a query graph:

MATCH (q:q1) DETACH DELETE q;

CREATE (:q1:Person {name: 'U57'});

MATCH (q:q1:Person)
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Referred Radicalized Materials'})
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Received Training'})
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Detonated a bomb'})
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Carried an attack'})
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Purchase Weapons'})
MERGE (q)-[:EXHIBITS]->(:q1:Activity {name: 'Suspicious Travel'})
MERGE (q)-[:HAS]->(:q1:SmAccount {type: 'Facebook'})
RETURN null;

MATCH (s:q1:SmAccount {type: 'Facebook'})
MERGE (s)-[:POSTS]->(:q1:Post {type: 'Extremist_Ngram'})
RETURN null;

This creates the following graph (displayed with MATCH (q:q1)-[r]-(m) RETURN q, r, m):

This is the query graph described as an example in your paper, which I have been trying to reconstruct.

However, no matter what float values I pass to CALL cnrl.similarityMeasure(1, 1, 'q1', 'Person'), it does not return a single node (the result is always an empty array).
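
Two checks that might help narrow this down are sketched here: listing the nodes of the query graph, and listing the activity names that actually occur in the imported data so they can be compared with the names used in the query graph. Whether the plugin matches on these properties is an assumption, not something confirmed in this thread.

```cypher
// List every node of the query graph with its labels and properties.
MATCH (q:q1)
RETURN labels(q) AS labels, properties(q) AS props;

// Activity names present in the imported data (query nodes excluded),
// to compare against the names used in the query graph.
MATCH (a:Activity)
WHERE NOT a:q1
RETURN DISTINCT a.name AS name
ORDER BY name;
```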

The rel2neo examples were nice, but they don't directly help me get PINGS working with a dataset, namely the radicalization dataset.

Please let me know if there is anything I can try or if I have missed something. If more documentation is available, I would be glad to know 😃

cnrl-csu / pings

Please provide documentation for using the plugin with the CNRL example dataset #1