benedekrozemberczki / littleballoffur

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
https://little-ball-of-fur.readthedocs.io
GNU General Public License v3.0
702 stars, 55 forks

random.sample does not work for sets as of Python 3.11 #27

Open mwindoffer opened 11 months ago

mwindoffer commented 11 months ago

As of Python 3.11, random.sample no longer automatically converts sets to lists; it raises a TypeError instead. I checked all the samplers: DiffusionSampler, DiffusionTreeSampler, and ForestFireSampler all use random.sample in their _do_a_step function, which leads to this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[39], line 2
      1 test_sampler = lbf.DiffusionSampler()
----> 2 new_graph = test_sampler.sample(graph)
      3 # nx.draw(new_graph)

File E:\Repositories\MT-NBNC\mt_venv\Lib\site-packages\littleballoffur\exploration_sampling\diffusionsampler.py:69, in DiffusionSampler.sample(self, graph, start_node)
     67 self._create_initial_node_set(graph, start_node)
     68 while len(self._sampled_nodes) < self.number_of_nodes:
---> 69     self._do_a_step(graph)
     70 new_graph = self.backend.get_subgraph(graph, list(self._sampled_nodes))
     71 return new_graph

File E:\Repositories\MT-NBNC\mt_venv\Lib\site-packages\littleballoffur\exploration_sampling\diffusionsampler.py:45, in DiffusionSampler._do_a_step(self, graph)
     41 def _do_a_step(self, graph):
     42     """
     43     Doing a single random walk step.
     44     """
---> 45     source_node = random.sample(self._sampled_nodes, 1)[0]
     46     neighbor = self.backend.get_random_neighbor(graph, source_node)
     47     if neighbor not in self._sampled_nodes:

File C:\ProgramData\anaconda3\Lib\random.py:439, in Random.sample(self, population, k, counts)
    415 # Sampling without replacement entails tracking either potential
    416 # selections (the pool) in a list or previous selections in a set.
    417 
   (...)
    435 # too many calls to _randbelow(), making them slower and
    436 # causing them to eat more entropy than necessary.
    438 if not isinstance(population, _Sequence):
--> 439     raise TypeError("Population must be a sequence.  "
    440                     "For dicts or sets, use sorted(d).")
    441 n = len(population)
    442 if counts is not None:

TypeError: Population must be a sequence.  For dicts or sets, use sorted(d).
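The failure can be reproduced without littleballoffur. The following is a minimal sketch, assuming only the standard library; `sampled_nodes` is a hypothetical stand-in for the set the sampler keeps internally. On Python 3.9 and 3.10 the call is still accepted (with a DeprecationWarning), so the snippet records the outcome instead of assuming one.

```python
import random
import sys

# Hypothetical stand-in for a sampler's internal set of sampled nodes.
sampled_nodes = {0, 1, 2, 3}

try:
    # Passing a set directly: removed in Python 3.11, deprecated since 3.9.
    random.sample(sampled_nodes, 1)
    outcome = "accepted"
except TypeError as error:
    outcome = "TypeError"
    print(error)

print(sys.version_info[:2], outcome)
```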

The error message suggests using sorted(). I am unsure whether this would hurt performance compared with simply casting the set to a list.
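To make the two options concrete, here is a hedged sketch of both. `pick_source_node` is a hypothetical helper, not a function in littleballoffur, and it uses `random.choice` in place of `random.sample(..., 1)[0]`, which draws one element uniformly in the same way. Converting with `list()` is linear in the size of the set, while `sorted()` is O(n log n) and additionally requires the nodes to be mutually comparable. In exchange, `sorted()` gives an order that does not depend on set iteration order, which may matter for reproducibility with a fixed seed.

```python
import random


def pick_source_node(sampled_nodes, ordered=False):
    """Pick one node uniformly at random from a set.

    Hypothetical helper for illustration only.
    ordered=False: list(), O(n), works for any hashable nodes.
    ordered=True: sorted(), O(n log n), needs comparable nodes.
    """
    population = sorted(sampled_nodes) if ordered else list(sampled_nodes)
    return random.choice(population)


nodes = {3, 1, 4, 5, 9}
random.seed(42)
from_list = pick_source_node(nodes)
from_sorted = pick_source_node(nodes, ordered=True)
print(from_list in nodes, from_sorted in nodes)
```

Either variant copies the whole set on every step, so if the per-step cost matters, keeping a list alongside the set may be worth considering.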

rjurney commented 9 months ago

@benedekrozemberczki I can fix this if you want to give me permissions.