JnBrymn / minglbot

redesign get relations #18

Open JnBrymn opened 9 years ago

JnBrymn commented 9 years ago

Currently get_relations takes the arguments users and dir (and others). I was preparing to make an "in between" function (https://github.com/JnBrymn/minglbot/issues/4) for situations with more complex relationships involving more groups. But no matter how many groups there are, you can always condense them down to a group that follows the target users and a group that is followed by the target users. Change get_relations to take a has_followers arg and a has_friends arg, both of which are user groups.

Once this is done modify get_friends and get_followers and add get_both_friends_and_followers and get_either_friends_and_followers.

BUT before implementing this, think hard about it. For instance, can I really implement get_either_friends_and_followers? I would also like to be able to implement things like get_friends_but_not_followers and get_followers_but_not_friends if possible. Maybe get_relations needs 4 args: must_have_followers, can_have_followers, must_have_friends, can_have_friends. Or maybe get_relations needs 3 args: has_friends, has_followers, has_friendfollowers. And what about negative groups: has_not_friends, has_not_followers?
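To make the combinations above concrete, here is a minimal in-memory sketch in Python. It is not the minglbot implementation: the edge representation (a set of `(follower, followed)` pairs) and the helper names `friends_of`, `followers_of`, and `relation_sets` are hypothetical, chosen only to show that each proposed function reduces to a set operation on two base sets.

```python
# Hypothetical in-memory model: each edge is a (follower, followed) pair.
def friends_of(edges, users):
    """Accounts followed by at least one user in `users`."""
    users = set(users)
    return {followed for follower, followed in edges if follower in users}


def followers_of(edges, users):
    """Accounts that follow at least one user in `users`."""
    users = set(users)
    return {follower for follower, followed in edges if followed in users}


def relation_sets(edges, users):
    """Return the combinations discussed above as plain sets."""
    friends = friends_of(edges, users)
    followers = followers_of(edges, users)
    return {
        "friends": friends,
        "followers": followers,
        "both": friends & followers,
        "either": friends | followers,
        "friends_but_not_followers": friends - followers,
        "followers_but_not_friends": followers - friends,
    }


# Toy graph: a and b follow each other, a follows c, d follows a.
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")}
sets = relation_sets(edges, {"a"})
print(sorted(sets["both"]))    # ['b']
print(sorted(sets["either"]))  # ['b', 'c', 'd']
```

Under this model the "either" and "but not" variants are all expressible, which suggests the open question is less about whether they can be computed and more about how to express them efficiently in a single graph query.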

JnBrymn commented 9 years ago

Here is a useful query:

// Group 1: collect node ids by screen_name, then append matches by id property
MATCH (x:User) WHERE x.screen_name IN ["vividcortex","solarce","mindweather","velocityconf"]
WITH collect(id(x)) AS x1ids
MATCH (x:User) WHERE x.id IN [19734656,14586723]
WITH x1ids+collect(id(x)) AS x1ids

// Group 2: same pattern, carrying x1ids through each WITH
MATCH (x:User) WHERE x.screen_name IN ["vividcortex","solarce","mindweather","newrelic"]
WITH x1ids,collect(id(x)) AS x2ids
MATCH (x:User) WHERE x.id IN [19734656,14586723]
WITH x1ids,x2ids+collect(id(x)) AS x2ids

// Find users t that are followed by group 1 and that follow group 2
MATCH (x1)-[:FOLLOWS]->(t),(x2)<-[:FOLLOWS]-(t)
WHERE id(x1) IN x1ids AND id(x2) IN x2ids
RETURN count(*) AS c, t.screen_name,t.id
ORDER BY c DESC
LIMIT 1000

It grabs two groups (could be N groups) and finds matches related to those groups. This gives me hope for matching NOT clauses, AND clauses, OR clauses, etc.
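The "could be N groups" part can be sketched as a small query builder. This is only an illustration, assuming Python as the driver language: the function name `build_group_id_clauses` and the parameter names `names1`, `ids1`, and so on are hypothetical, and the `$name` parameter syntax is the one used by newer Neo4j versions (older versions used `{name}`). It generates the id-collection preamble for any number of groups, carrying the earlier id lists through each WITH the same way the query above does.

```python
def build_group_id_clauses(groups):
    """Build the id-collection part of the query for N groups.

    `groups` is a list of (screen_names, ids) pairs.
    Returns (cypher_text, params), where params maps parameter names to lists.
    """
    lines = []
    params = {}
    carried = []  # id-list variables that every later WITH must pass through
    for i, (names, ids) in enumerate(groups, start=1):
        var = f"x{i}ids"
        params[f"names{i}"] = list(names)
        params[f"ids{i}"] = list(ids)
        prefix = "".join(f"{c}," for c in carried)
        lines.append(f"MATCH (x:User) WHERE x.screen_name IN $names{i}")
        lines.append(f"WITH {prefix}collect(id(x)) AS {var}")
        lines.append(f"MATCH (x:User) WHERE x.id IN $ids{i}")
        lines.append(f"WITH {prefix}{var}+collect(id(x)) AS {var}")
        carried.append(var)
    return "\n".join(lines), params


text, params = build_group_id_clauses([
    (["vividcortex", "solarce"], [19734656]),
    (["newrelic"], [14586723]),
])
print(text)
```

The final MATCH/WHERE/RETURN section would still need to be written per use case, since that is where the follows, followed-by, and NOT conditions for each group are decided.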

I created a Neo4j issue to figure out why simpler queries don't work here: https://github.com/neo4j/neo4j/issues/2834. For instance, this should work:

match (x:User)-[:FOLLOWS]->(t:User)-[:FOLLOWS]->(y:User)
where (x.screen_name in ["vividcortex"] or x.id in [19734656])
  and (y.screen_name in ["vividcortex"] or y.id in [19734656])
return t.id
limit 1000

but it doesn't make use of indices.

JnBrymn commented 9 years ago

"Not following" can be queried like this:

MATCH (x:User) WHERE x.screen_name IN  ["NinjaBunny_","AaronBBrown777","itaykahana","tiopaul","MartinLoy","LaineVCampbell","duanegran","jbarciauskas","snoogindoogin","chuckhagenbuch","yournameistoby","blalor","GiantEvolving","cvshumake","insyteful","samkottler","timgoodaire","DynData","egon1024","MathYourLife","j_manero","obfuscurity"] 
WITH collect(id(x)) as x1ids

MATCH (x:User) WHERE x.screen_name IN ["bigg33k","alexsergeyev","akachler","mrtazz","ickymettle","lozzd","allspaw","Ryan_Frantz","mikearpaia","d87tech","techwolf359","wcgallego","jimbartus","dominis","pk11","herczog","balagez","privateblue","mthology","BenRegenspan","MsLaurenRae","emittelhammer","oacgnol","highmountain","isamlambert","dbussink","alindeman","saxenaan","mezis_fr","zvikico","aryanet","interskh","thilorusche"]
WITH x1ids,collect(id(x)) as x2ids

MATCH (x:User) WHERE x.screen_name IN ["lmcdowell","dgenzale","djuntgen","productiondba","maximefouilleul","TeeKraken","olafvanzandwijk","JoshuaPrunier","gudlyf","ShayYannay","j8erg","rkuris","ShlomiNoach","mazorE","web007","dbaldwin","tobi","phlipper","skingry","johnduff","fbogsany","honkfestival","TomCallway","RonaldBradford","banpei","bboskoff","John_Cesario","SquareNerd","maniksurtani","MichaelLossos","jeeyoungk","3ameam","abhay","sfermigier","djapg","JeromeThibaud","jsjacob","krudnicki","ValerieNC","huntersatter"]
WITH x1ids,x2ids,collect(id(x)) as x3ids

MATCH (x1)-[:FOLLOWS]->(t),(x2),(x3)
WHERE id(x1) in x1ids AND id(x2) in x2ids AND id(x3) IN x3ids
  AND NOT (x3)-[:FOLLOWS]->(t)
  AND NOT (x2)<-[:FOLLOWS]-(t)
RETURN count(*) as c, t.screen_name,t.id
ORDER BY c DESC
LIMIT 1000;

This takes minutes to run. Thus negative clauses should be held to a minimum.

JnBrymn commented 9 years ago

The counting on the negative queries is off. The problem is that the MORE members of a NOT group don't follow or friend a user, the higher that user's count will be. There needs to be some way to keep the number of NOT follows out of the score!
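The inflation can be reproduced with a small in-memory sketch in Python. This is only an illustration, not a fix from the issue: the helper names `row_count` and `distinct_score` are hypothetical, edges are `(follower, followed)` pairs, and only one positive and one negative group are modeled. `row_count` mimics `count(*)` over the cross product of groups, while `distinct_score` shows one possible alternative, which assumes the negative group should act purely as a filter (a candidate is dropped if any NOT member follows it) and that each positive follower counts once.

```python
def row_count(edges, positive, negative, t):
    """Mimic count(*): one row per (x1, x3) pair that survives the filters."""
    return sum(
        1
        for x1 in positive
        for x3 in negative
        if (x1, t) in edges and (x3, t) not in edges
    )


def distinct_score(edges, positive, negative, t):
    """Count each positive follower of t once; the negative group only filters."""
    if any((x3, t) in edges for x3 in negative):
        return 0
    return sum(1 for x1 in positive if (x1, t) in edges)


edges = {("p1", "t1"), ("p1", "t2"), ("p2", "t2")}
positive = ["p1", "p2"]

# Growing the NOT group inflates the row count but not the distinct score.
print(row_count(edges, positive, ["n1"], "t1"))                   # 1
print(row_count(edges, positive, ["n1", "n2", "n3"], "t1"))       # 3
print(distinct_score(edges, positive, ["n1", "n2", "n3"], "t1"))  # 1
```

In Cypher terms, the analogous change would be to aggregate over distinct positive matches rather than all rows, though whether that performs acceptably on the real graph is untested here.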

Ideas

JnBrymn commented 9 years ago

Closing in favor of better defined issue here: https://github.com/JnBrymn/minglbot/issues/25