All names of introduced variables are given as examples and were not checked for compatibility with the rest of the project code.
Let's get started.
sorted(list(set(list_of_all_pairs_of_nodes)))
If a long expression is repeated and will be evaluated anyway, it is better to compute it once in advance and store the result in a variable.
Before
list_of_all_pairs_of_nodes = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
if tuple_format:
    return sorted(list(set(list_of_all_pairs_of_nodes)))
else:
    return [list(elem) for elem in sorted(list(set(list_of_all_pairs_of_nodes)))]
After
list_of_all_pairs_of_nodes = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
sorted_unique_pairs_of_nodes = sorted(list(set(list_of_all_pairs_of_nodes)))
if tuple_format:
    return sorted_unique_pairs_of_nodes
else:
    return [list(elem) for elem in sorted_unique_pairs_of_nodes]
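For reference, the "After" fragment can be wrapped into a self-contained function to try it out. This is only a sketch: the function name `unique_node_pairs` and the sample `edges` are made up for illustration and are not taken from the project. It also relies on the fact that `sorted()` accepts any iterable, so the intermediate `list(...)` call can be dropped.

```python
def unique_node_pairs(result, tuple_format=True):
    """Return sorted, de-duplicated node pairs, skipping self-loops."""
    # Normalize each pair so that (2, 1) and (1, 2) compare equal.
    pairs = [tuple(sorted(elem)) for elem in result if elem[0] != elem[1]]
    # Computed once and reused by both branches below.
    sorted_unique_pairs = sorted(set(pairs))
    if tuple_format:
        return sorted_unique_pairs
    return [list(elem) for elem in sorted_unique_pairs]


# Hypothetical input: a duplicate edge in both orientations and one self-loop.
edges = [(2, 1), (1, 2), (3, 3), (1, 3)]
print(unique_node_pairs(edges))
print(unique_node_pairs(edges, tuple_format=False))
```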
Before
if self.normalize:
    discrete_pdf[key] = [elem / sum(list(agg_result.values())) for elem in list(agg_result.values())]
else:
    discrete_pdf[key] = list(agg_result.values())
After
agg_result_values = list(agg_result.values())
if self.normalize:
    sum_of_result_values = sum(agg_result_values)
    discrete_pdf[key] = [elem / sum_of_result_values for elem in agg_result_values]
else:
    discrete_pdf[key] = agg_result_values
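The same idea as a standalone sketch. In the original version the sum sits inside the list comprehension, so it is recomputed for every element; hoisting it turns quadratic work into linear work. The function name `discrete_pdf_values` and the sample dictionary are assumptions made for this example only.

```python
def discrete_pdf_values(agg_result, normalize=True):
    """Return the aggregated values, optionally normalized to sum to 1."""
    # Materialize the values once instead of calling .values() repeatedly.
    values = list(agg_result.values())
    if not normalize:
        return values
    # The total is loop-invariant, so it is computed once outside the comprehension.
    # As in the original code, a total of zero is not handled here.
    total = sum(values)
    return [value / total for value in values]


print(discrete_pdf_values({"a": 1, "b": 3}))
print(discrete_pdf_values({"a": 1, "b": 3}, normalize=False))
```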
a: float, b: float
a and b are given as arguments of the method, so why not calculate everything that depends only on them in advance instead of recalculating it on every iteration of the double loop?
Let's say, things like
(-1/(a*b))
# or
(1/a**2)
# or
(1/b**2)
# or
(b/(a+b))
# or
(a+b)
# etc.
Formulas also require some modifications to be easier to read, but that's mentioned in issue #3.
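A minimal sketch of what hoisting these terms could look like. The function `fill_grid`, its `xs`/`ys` inputs, and above all the formula in the inner loop are placeholders invented for this example; they are not the project's formulas and only show where the precomputed values would be reused.

```python
def fill_grid(a: float, b: float, xs, ys):
    """Evaluate a placeholder expression on a grid, reusing loop-invariant terms."""
    # These depend only on a and b, so they are computed once before the loops.
    neg_inv_ab = -1 / (a * b)
    inv_a_sq = 1 / a**2
    inv_b_sq = 1 / b**2
    b_ratio = b / (a + b)

    grid = []
    for x in xs:
        row = []
        for y in ys:
            # Placeholder formula (an assumption): it only reuses the hoisted terms.
            row.append(inv_a_sq * x * x + inv_b_sq * y * y + neg_inv_ab * x * y + b_ratio)
        grid.append(row)
    return grid


print(fill_grid(1.0, 1.0, [0.0, 1.0], [0.0, 2.0]))
```

Giving each term a name also makes the expression inside the loop shorter, which helps with the readability concern mentioned above.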
Before
for column in columns:
    assert column not in list(Descriptor(self.dataframe).info()['object'].keys())
for column in columns:
    # insert your code...
After
check_list = list(Descriptor(self.dataframe).info()['object'].keys())
for column in columns:
    assert column not in check_list
for column in columns:
    # insert your code...
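A runnable sketch of the same check, under stated assumptions: `object_columns` stands in for the result of `Descriptor(self.dataframe).info()['object'].keys()`, which cannot be reproduced here, and `check_no_object_columns` is a hypothetical helper name. Two optional tweaks are included: a set makes each membership test constant-time, and an explicit exception is used because `assert` statements are skipped when Python runs with `-O`.

```python
def check_no_object_columns(columns, object_columns):
    """Raise ValueError if any requested column is listed as an object column."""
    # Build the lookup once, outside the loop.
    object_column_set = set(object_columns)
    for column in columns:
        if column in object_column_set:
            raise ValueError(f"Column {column!r} has object dtype")


# Hypothetical column names, for illustration only.
check_no_object_columns(["age", "height"], ["name", "city"])
print("all columns passed the check")
```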
self._new_data
I would suggest renaming _new_data to _data_is_new or something similar, to make it obvious that we are working with a boolean.
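A small illustration of how the renamed flag reads at the call site. The `Pipeline` class and its methods are invented for this example and do not mirror the project's class; only the attribute name `_data_is_new` comes from the suggestion above.

```python
class Pipeline:
    """Toy class showing a boolean flag with a predicate-style name."""

    def __init__(self):
        self._data = None
        # The name reads as a yes/no question, so the type is obvious.
        self._data_is_new = False

    def load(self, data):
        self._data = data
        self._data_is_new = True

    def refresh(self):
        # `if self._data_is_new` reads naturally; `if self._new_data` could be
        # mistaken for a check on the data object itself.
        if self._data_is_new:
            self._data_is_new = False
            return "recomputed"
        return "cached"


pipeline = Pipeline()
pipeline.load([1, 2, 3])
print(pipeline.refresh())
print(pipeline.refresh())
```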
list(agg_result.values())
Same here, but even more important, as this way we won't recalculate it on every loop iteration.
Exactly the same case occurs in another method of the class.
else: pass
It is either a stub (a very tricky one) or just a useless piece of code, as it is equivalent to doing nothing.
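To make the equivalence concrete, here is a tiny sketch with two hypothetical functions that differ only in the `else: pass` branch and return the same result for every input.

```python
def clip_negative_with_stub(value):
    if value < 0:
        value = 0
    else:
        pass  # no effect: execution simply continues after the if statement
    return value


def clip_negative(value):
    # Same behavior with the empty branch removed.
    if value < 0:
        value = 0
    return value


for sample in (-2, 0, 3):
    print(sample, clip_negative_with_stub(sample), clip_negative(sample))
```

If the branch is meant as a reminder for future work, a comment saying so states the intent more clearly than an empty `else`.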
list(Descriptor(self.dataframe).info()['object'].keys())
And again, let's generate the list once instead of doing it for each column.