Speed, name, readability optimization

All variable names introduced below are examples only; they have not been checked for compatibility with the rest of the project code.

Let's get started.

sorted(list(set(list_of_all_pairs_of_nodes))): If a long expression is repeated and will be evaluated anyway, it is better to compute it once in advance and store the result in a variable.

Before

list_of_all_pairs_of_nodes = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
if tuple_format:
    return sorted(list(set(list_of_all_pairs_of_nodes)))
else:
    return [list(elem) for elem in sorted(list(set(list_of_all_pairs_of_nodes)))]

After

list_of_all_pairs_of_nodes = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
sorted_unique_pairs_of_nodes = sorted(list(set(list_of_all_pairs_of_nodes)))

if tuple_format:
    return sorted_unique_pairs_of_nodes
else:
    return [list(elem) for elem in sorted_unique_pairs_of_nodes]
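
As a self-contained sketch of the same refactoring: the function name unique_sorted_pairs and its signature are hypothetical, chosen only for illustration. It also drops the inner list(...) call, since sorted() accepts any iterable and already returns a list.

```python
def unique_sorted_pairs(result, tuple_format=True):
    """Return sorted, de-duplicated node pairs from `result`, skipping self-loops.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Normalize each pair so that (2, 1) and (1, 2) count as the same pair.
    pairs = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
    # Compute the shared expression once; sorted() takes any iterable.
    sorted_unique_pairs = sorted(set(pairs))
    if tuple_format:
        return sorted_unique_pairs
    return [list(elem) for elem in sorted_unique_pairs]
```

Both return paths now read from the same variable, so a later change to the de-duplication logic only has to be made in one place.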

list(agg_result.values()): Same here, but even more important: in the Before version, sum(list(agg_result.values())) is recomputed for every element of the list comprehension.

Before

if self.normalize:
    discrete_pdf[key] = [elem / sum(list(agg_result.values())) for elem in list(agg_result.values())]
else:
    discrete_pdf[key] = list(agg_result.values())

After

agg_result_values = list(agg_result.values())
if self.normalize:
    sum_of_result_values = sum(agg_result_values)
    discrete_pdf[key] = [elem / sum_of_result_values for elem in agg_result_values]
else:
    discrete_pdf[key] = agg_result_values

Another method of the class has exactly the same issue.
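
To make the cost difference concrete, here is a minimal standalone sketch. The function name build_pdf is hypothetical, and agg_result is assumed to be a plain dict of numbers. With the sum hoisted out of the comprehension, normalization takes one pass for the sum and one for the division, instead of re-summing all values for every element.

```python
def build_pdf(agg_result, normalize):
    """Return the values of `agg_result`, optionally normalized to sum to 1.

    Hypothetical helper: the name and signature are illustrative only.
    """
    values = list(agg_result.values())  # materialized once
    if not normalize:
        return values
    total = sum(values)  # computed once, not once per element
    return [elem / total for elem in values]
```

Note that this sketch, like the original code, does not handle the case where the values sum to zero.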

a: float, b: float: a and b are passed as arguments of the method, so why not compute everything that depends only on them in advance instead of recomputing it on every iteration of the double loop?

For example, expressions like:

(-1/(a*b))
# or
(1/a**2)
# or
(1/b**2)
# or
(b/(a+b))
# or
(a+b)
# etc.

Formulas also require some modifications to be easier to read, but that's mentioned in issue #3.

else: pass

else:
    pass

This is either a stub (a very subtle one) or simply dead code, since it is equivalent to doing nothing.
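
A small sketch of the equivalence, using hypothetical function names that do not come from the project:

```python
def clip_negative_with_stub(value):
    # Before: the trailing `else: pass` adds two lines and no behavior.
    if value < 0:
        value = 0
    else:
        pass
    return value


def clip_negative(value):
    # After: same behavior with the empty branch removed.
    if value < 0:
        value = 0
    return value
```

If the branch really is a placeholder for future work, a comment saying so would make that intent explicit.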

list(Descriptor(self.dataframe).info()['object'].keys()): Again, build the list once instead of rebuilding it for every column.

Before

for column in columns:
    assert column not in list(Descriptor(self.dataframe).info()['object'].keys())
for column in columns:
    ...  # insert your code

After

check_list = list(Descriptor(self.dataframe).info()['object'].keys())

for column in columns:
    assert column not in check_list
for column in columns:
    ...  # insert your code
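
One possible further step, sketched under assumptions: the function name validate_columns is hypothetical, and object_columns stands in for the keys of Descriptor(self.dataframe).info()['object'], which the sketch does not call. A set gives constant-time membership checks, and raising an exception instead of using assert keeps the check active when Python runs with the -O flag, which strips assert statements.

```python
def validate_columns(columns, object_columns):
    """Raise ValueError if any requested column is object-typed.

    Hypothetical helper; `object_columns` is a stand-in for the keys of
    Descriptor(self.dataframe).info()['object'].
    """
    forbidden = set(object_columns)  # built once; O(1) membership checks
    bad = [column for column in columns if column in forbidden]
    if bad:
        raise ValueError(f"object-typed columns are not supported: {bad}")
```

Collecting all offending columns before raising also reports every problem at once rather than stopping at the first.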

self._new_data: I would suggest renaming _new_data to _data_is_new or something similar, to make it obvious that it holds a boolean.

Cro3SwI2mer / Bradford-Research-Project

Speed, name, readability optimization #4