dhimmel / learn

Machine learning and feature extraction for the Rephetio project
https://doi.org/10.15363/thinklab.d210

Prior probability of treatment implementation question #2

Closed by veleritas 7 years ago

veleritas commented 7 years ago

Hi Daniel,

Do you mind explaining what cell 9 in 1-prior.ipynb is doing exactly? Code below:

%%time

# Initialize a dictionary of degree to empirical probability list
degree_to_probs = {x: list() for x in degree_to_edges}

# Perform n_perm permutations
for i in range(n_perm):
    # Permute
    pair_list, stats = permute_pair_list(pair_list, multiplier=multiplier, seed=i)

    # Update
    pair_set = set(pair_list)
    for degree, probs in degree_to_probs.items():
        edges = degree_to_edges[degree]
        probs.append(len(edges & pair_set) / len(edges))

As far as I can tell, the inner loop does not actually do anything, since probs is a variable defined in the for loop, and gets overwritten each time. Also, the edges variable gets overwritten in later cells, so I could not determine what this cell is doing. Since it takes so long to compute, I was wondering if it was even necessary.

veleritas commented 7 years ago

Ah, I see that it's updating the degree_to_probs dictionary in the inner loop...

dhimmel commented 7 years ago

In that notebook, probs is a reference to a list stored in degree_to_probs, so appending to it in the inner loop updates the dictionary in place.
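A minimal sketch of that reference behavior, separate from the notebook. The dictionary keys and the appended value here are made up for illustration:

```python
# Toy dictionary; keys and values are illustrative, not taken from the notebook.
degree_to_probs = {(1, 2): [], (3, 1): []}

for degree, probs in degree_to_probs.items():
    # probs is the very list object stored in the dictionary, not a copy
    assert probs is degree_to_probs[degree]
    probs.append(0.5)  # mutates that list in place

# By contrast, rebinding the loop variable leaves the dictionary untouched
for degree, probs in degree_to_probs.items():
    probs = [99]  # only rebinds the local name

print(degree_to_probs)
```

Appending mutates the shared list, while assigning to probs would only rebind the name, which is probably the case the question had in mind.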

The outer loop permutes the bipartite network. The inner loop then computes, for each degree tuple, the fraction of possible edges that exist in the permuted network.
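Here is a rough, self-contained sketch of the two loops on a toy network. It is not the notebook's code: toy_permute is a hypothetical stand-in for permute_pair_list (it returns only the pair list, without the stats), the node names are invented, and the way degree_to_edges is built is an assumption about how that mapping is defined.

```python
import random
from collections import Counter, defaultdict
from itertools import product

def toy_permute(pair_list, n_swaps, seed):
    """Hypothetical stand-in for permute_pair_list: degree-preserving edge swaps."""
    rng = random.Random(seed)
    pairs = list(pair_list)
    pair_set = set(pairs)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(pairs)), 2)
        (a, b), (c, d) = pairs[i], pairs[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would duplicate an existing edge
        if new_1 in pair_set or new_2 in pair_set:
            continue
        pair_set -= {pairs[i], pairs[j]}
        pair_set |= {new_1, new_2}
        pairs[i], pairs[j] = new_1, new_2
    return pairs

# Toy bipartite network (made-up compound and disease names)
pair_list = [("c1", "d1"), ("c1", "d2"), ("c2", "d1"), ("c3", "d3")]
source_degree = Counter(s for s, t in pair_list)
target_degree = Counter(t for s, t in pair_list)

# Assumption: each degree tuple maps to the set of all possible edges
# whose endpoints have those degrees
degree_to_edges = defaultdict(set)
for s, t in product(source_degree, target_degree):
    degree_to_edges[(source_degree[s], target_degree[t])].add((s, t))

n_perm = 5
degree_to_probs = {x: list() for x in degree_to_edges}

for i in range(n_perm):
    # Outer loop: permute the network (degrees are preserved)
    pair_list = toy_permute(pair_list, n_swaps=20, seed=i)

    # Inner loop: fraction of possible edges present, per degree tuple
    pair_set = set(pair_list)
    for degree, probs in degree_to_probs.items():
        edges = degree_to_edges[degree]
        probs.append(len(edges & pair_set) / len(edges))

for degree, probs in sorted(degree_to_probs.items()):
    print(degree, probs)
```

Each list ends up with one entry per permutation, so degree_to_probs accumulates an empirical distribution of edge fractions for every degree tuple.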

Make sense?