BradKML opened this issue 3 years ago
This class can be borrowed easily with a simple `import queue`:
```python
import queue
import sys


class Frontier():
    """ Objects of this class keep track of the Breadth-First Search
        frontier as it is expanded.
    """
    def __init__(self, src, expander):
        self.internal = set()
        self.perimeter = set()
        # The BFS queue contains the same elements as self.perimeter
        # but in BFS order, i.e. elements closer to the source are nearer
        # to the beginning of the queue.
        self.queue = queue.Queue()
        self.perimeter.add(src)
        self.queue.put_nowait(src)
        # Some metadata to be able to trace back the path from a node
        # to the source node.
        self.metadata = {}
        self.metadata[src] = {"distance": 0, "parent": None}
        # This function is called on a node when we need to get the
        # outgoing edges. It is going to be different for the source
        # and destination nodes in our social graph. Returns a set.
        self.expander = expander

    def expand_perimeter(self):
        """ Expands the perimeter by picking up the node at the front
            of the queue and expanding it to its followers/friends depending
            upon the definition of the self.expander function.
            Returns the new elements found outside the periphery.
        """
        # The user at the front of the breadth-first queue is made internal.
        u = self.queue.get_nowait()
        self.internal.add(u)
        self.perimeter.remove(u)
        # Let's move the frontier forward at the node 'u'.
        new_nodes = self.expander(u).difference(self.internal, self.perimeter)
        print(".", end="")
        sys.stdout.flush()
        # Keep track of the distance of a node from src and its parent.
        d = self.metadata[u]["distance"]
        for n in new_nodes:
            self.metadata[n] = {"distance": d + 1, "parent": u}
            self.queue.put_nowait(n)
        self.perimeter.update(new_nodes)
        return set(new_nodes)

    def is_on_perimeter(self, user_id):
        """ Tells whether user_id is on the perimeter of this BFS frontier.
        """
        return (user_id in self.perimeter)

    def covered_all(self):
        """ True if we have covered the whole graph, i.e. the whole graph
            is now internal and no nodes remain on the periphery.
        """
        return not self.perimeter

    def get_parent(self, n):
        """ Returns the parent for node n.
            Will throw KeyError when we haven't seen the node before.
            But in this application, we don't expect that to happen; if it
            does, something is really messed up.
        """
        return self.metadata[n]["parent"]

    def get_distance(self, n):
        """ Returns the distance of node `n` from the source.
        """
        return self.metadata[n]["distance"]
```

(Note: the original `covered_all` returned `bool(self.perimeter)`, which is the opposite of what its docstring describes; it is corrected here to return `True` only when the perimeter is empty.)
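To make the usage concrete, here is a minimal sketch of driving the class on a toy in-memory graph; the graph and node names are hypothetical, and a compact stand-in for the class above is included so the snippet runs on its own (it omits the printing and helper methods).

```python
import queue

class Frontier:
    # Compact stand-in for the class above, just enough to run this demo.
    def __init__(self, src, expander):
        self.internal, self.perimeter = set(), {src}
        self.queue = queue.Queue()
        self.queue.put_nowait(src)
        self.metadata = {src: {"distance": 0, "parent": None}}
        self.expander = expander

    def expand_perimeter(self):
        u = self.queue.get_nowait()
        self.internal.add(u)
        self.perimeter.remove(u)
        new_nodes = self.expander(u).difference(self.internal, self.perimeter)
        d = self.metadata[u]["distance"]
        for n in new_nodes:
            self.metadata[n] = {"distance": d + 1, "parent": u}
            self.queue.put_nowait(n)
        self.perimeter.update(new_nodes)
        return new_nodes

    def covered_all(self):
        return not self.perimeter

# Hypothetical toy graph: "a" follows "b" and "c", both of whom follow "d".
toy_graph = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}

f = Frontier("a", lambda n: toy_graph[n])
while not f.covered_all():
    f.expand_perimeter()

print(f.metadata["d"]["distance"])  # 2: "d" is two hops from the seed
```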
The `Frontier` object only takes a seed node and a lambda (wrapping the scraper) as its main components. I can imagine

`variable = Frontier(seed_user, lambda x: get_users_following(users=x, env=env_path, verbose=0, headless=True, wait=2, limit=50, file_path=None))`

being the main command that connects to it; however, this will cause duplicate calls.
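One way to avoid duplicate calls is to memoize the expander before handing it to `Frontier`. This is a sketch, not part of the class: `fetch_following` below is a hypothetical stand-in for the real `get_users_following(...)` call.

```python
def make_cached_expander(fetch_following):
    """Wrap a scraper function so each user is fetched at most once."""
    cache = {}
    def expander(user):
        if user not in cache:
            cache[user] = frozenset(fetch_following(user))
        return set(cache[user])  # fresh copy; Frontier mutates sets freely
    return expander

# Demo with a fake scraper that records every real call it receives.
calls = []
def fake_fetch(user):
    calls.append(user)
    return {user + "_f1", user + "_f2"}

expand = make_cached_expander(fake_fetch)
expand("alice")
expand("alice")
print(len(calls))  # 1: the second expansion hit the cache
```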
Two caveats: there needs to be a way to cache and restore instances of this object, and there is the separate problem of collective pulling when given a name list (https://github.com/narendraj9/digsep/issues/2).
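For caching and restoring instances, the sets and metadata serialize directly, but `queue.Queue` holds locks and cannot be serialized, so it has to be flattened to a list and rebuilt. A hedged sketch, with hypothetical helper names `save_frontier`/`load_frontier` and a stand-in object shaped like `Frontier`:

```python
import json
import os
import queue
import tempfile
from types import SimpleNamespace

def save_frontier(frontier, path):
    """Checkpoint a Frontier-shaped object to a JSON file."""
    state = {
        "internal": list(frontier.internal),
        "perimeter": list(frontier.perimeter),
        "metadata": frontier.metadata,
        # queue.Queue exposes its backing deque as .queue; snapshot the BFS order.
        "pending": list(frontier.queue.queue),
    }
    with open(path, "w") as fp:
        json.dump(state, fp)

def load_frontier(frontier, path):
    """Restore a previously saved checkpoint onto a Frontier-shaped object."""
    with open(path) as fp:
        state = json.load(fp)
    frontier.internal = set(state["internal"])
    frontier.perimeter = set(state["perimeter"])
    frontier.metadata = state["metadata"]
    frontier.queue = queue.Queue()
    for node in state["pending"]:
        frontier.queue.put_nowait(node)

# Demo with a stand-in whose fields mirror the Frontier class.
ckpt = os.path.join(tempfile.gettempdir(), "frontier_checkpoint.json")
f = SimpleNamespace(internal={"a"}, perimeter={"b"},
                    metadata={"a": {"distance": 0, "parent": None},
                              "b": {"distance": 1, "parent": "a"}},
                    queue=queue.Queue())
f.queue.put_nowait("b")
save_frontier(f, ckpt)

g = SimpleNamespace(internal=set(), perimeter=set(), metadata={}, queue=None)
load_frontier(g, ckpt)
print(sorted(g.perimeter))  # ['b']
```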
As demonstrated in the repos below, recursive graph mining of Twitter following relationships (AKA friends) is doable with the official API.
However, doing this without the API may be tricky, as the breadth-first search needs to prevent repeated scraping. As demonstrated in the bottom links, three degrees of following/friends is enough to analyze the community.
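The "three degrees" cutoff can be enforced by simply not expanding nodes that are already at the depth limit. A self-contained sketch, using a plain dict graph as a stand-in for the scraper's expander:

```python
import queue

def bfs_within(src, expander, max_depth=3):
    """Breadth-first search that never expands beyond max_depth hops."""
    distance = {src: 0}
    q = queue.Queue()
    q.put_nowait(src)
    while not q.empty():
        u = q.get_nowait()
        if distance[u] >= max_depth:
            continue  # at the depth limit: record, but do not expand
        for v in expander(u):
            if v not in distance:  # each node scraped at most once
                distance[v] = distance[u] + 1
                q.put_nowait(v)
    return distance

# Hypothetical chain graph: node 4 sits four hops from the seed.
chain = {0: {1}, 1: {2}, 2: {3}, 3: {4}, 4: set()}
print(bfs_within(0, lambda n: chain[n]))  # node 4 is beyond 3 degrees
```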