Islast commented 6 years ago

This is a very long issue to discuss how to change the network_analysis_from_corrmat wrapper to make it more modular and easier to adjust.

Decisions to discuss
Breaking down the main function
The main function verbatim

Decisions

Moving data and functions to class attributes and methods

One major consideration is how much of the data that accompanies the correlation matrix (names, centroids) and how much of the data produced afterwards (partitions, random graphs and measures) can or should be stored as attributes of the graph, its nodes or its edges. Furthermore, should some of these functions (nodal partition, calculate measures, the smaller functions wrapped in the 'calculate_nodal_measures' function, write_out_x) be class methods? Adding class methods to the networkx Graph class seems cavalier. We could define a new class, but if we do, it should support all the same methods as a networkx Graph.
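To make the attribute option concrete, here is a minimal sketch of what a plain networkx Graph can already hold at graph, node and edge level. The names, centroids and weights are made-up toy values, and the attribute keys ("cost", "name", "centroid") are assumptions for illustration, not keys the package currently uses.

```python
import networkx as nx

# Hypothetical toy inputs standing in for the names and centroids files
names = ["region_a", "region_b", "region_c"]
centroids = [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0)]

G = nx.Graph()
# Edge-level data: the third item of each tuple is stored as the "weight" attribute
G.add_weighted_edges_from([(0, 1, 0.8), (1, 2, 0.5)])

# Graph-level data lives in the G.graph dictionary
G.graph["cost"] = 10

# Node-level data lives in each node's attribute dictionary
for node in G.nodes:
    G.nodes[node]["name"] = names[node]
    G.nodes[node]["centroid"] = centroids[node]

print(G.graph["cost"], G.nodes[1]["name"], G.edges[0, 1]["weight"])
```

Data stored this way travels with the graph object, so it would not need to be passed around as separate arguments.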

Breaking down the main function

Set-up

read in data and initialise graph:

# Read in the data
M, names, centroids = read_in_data(corr_mat_file,
                                    names_file,
                                    centroids_file,
                                    names_308_style)

# Make your graph at cost
G = mkg.graph_at_cost(M, cost)

Nodal measures

calculate nodal partition

By "calculate nodal partition" we mean "assign communities to the nodes via the Louvain method". This is where we use the community package.

# Calculate the modules
nodal_partition = mkg.calc_nodal_partition(G)

calculate nodal measures

We calculate a range of nodal measures including:

# Get the nodal measures
# (note that this takes a bit of time because the participation coefficient
# takes a while)
G, nodal_dict = mkg.calculate_nodal_measures(G,
                                             centroids,
                                             names,
                                             nodal_partition=nodal_partition,
                                             names_308_style=names_308_style)

write out nodal measures

Exactly what it says on the tin.

# Save your nodal measures
write_out_nodal_measures(nodal_dict, centroids, output_dir, corr_mat_file, cost)

Global measures

generate random graphs

Generate n random graphs via edge-swapping.

R_list, R_nodal_partition_list = mkg.make_random_list(G, n_rand=n_rand)
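For readers unfamiliar with edge-swapping, the sketch below shows one way to produce degree-preserving random graphs with networkx's double_edge_swap. It is an illustration only: make_random_graphs and its swaps_per_edge parameter are hypothetical, and this is not necessarily how mkg.make_random_list is implemented.

```python
import networkx as nx


def make_random_graphs(G, n_rand, swaps_per_edge=10, seed=0):
    """Hypothetical helper: return n_rand randomised copies of G.

    Each copy is rewired in place by double edge swaps, which keep
    every node's degree unchanged.
    """
    randoms = []
    for i in range(n_rand):
        R = G.copy()
        n_swaps = swaps_per_edge * R.number_of_edges()
        nx.double_edge_swap(R, nswap=n_swaps, max_tries=100 * n_swaps, seed=seed + i)
        randoms.append(R)
    return randoms


# Toy graph bundled with networkx, used here in place of a thresholded correlation matrix
G = nx.karate_club_graph()
R_list = make_random_graphs(G, n_rand=3)
```

Because the degree sequence is preserved, measures on the random graphs can be compared with those on G without the degree distribution acting as a confound.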

calculate global measures

global_dict = mkg.calculate_global_measures(G,
                                            R_list=R_list,
                                            nodal_partition=nodal_partition,
                                            R_nodal_partition_list=R_nodal_partition_list)

write out global measures

# Write out the global measures
write_out_global_measures(global_dict, output_dir, corr_mat_file, cost)

Rich Club

calculate rich club coefficients

# Get the rich club coefficients
deg, rc, rc_rand = mkg.rich_club(G, R_list=R_list, n=n_rand)
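As a rough illustration of what this step returns, the sketch below builds the same three outputs (degrees, rich club coefficients for G, and coefficients for each random graph) with networkx's rich_club_coefficient. The function rich_club_sketch is hypothetical and is not the mkg.rich_club implementation; it assumes every random graph has the same degree sequence as G.

```python
import networkx as nx


def rich_club_sketch(G, R_list):
    """Hypothetical stand-in returning (deg, rc, rc_rand).

    Assumes each graph in R_list has the same degree sequence as G,
    so the coefficient is defined at the same degrees for all of them.
    """
    rc_dict = nx.rich_club_coefficient(G, normalized=False)
    deg = sorted(rc_dict)
    rc = [rc_dict[k] for k in deg]
    rand_dicts = [nx.rich_club_coefficient(R, normalized=False) for R in R_list]
    # One row per degree, one column per random graph
    rc_rand = [[d[k] for d in rand_dicts] for k in deg]
    return deg, rc, rc_rand


G = nx.karate_club_graph()

# One degree-preserving random graph made by edge swapping
R = G.copy()
nx.double_edge_swap(R, nswap=200, max_tries=20000, seed=1)

deg, rc, rc_rand = rich_club_sketch(G, [R])
```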

write out rich club

# Write out the rich club coefficients
write_out_rich_club(deg, rc, rc_rand, output_dir, corr_mat_file, cost)

The main function verbatim

As of latest commit 81713d6

def network_analysis_from_corrmat(corr_mat_file,
                                  names_file,
                                  centroids_file,
                                  output_dir,
                                  cost=10,
                                  n_rand=1000,
                                  names_308_style=False):
    '''
    This is the big function!
    It reads in the correlation matrix, thresholds it at the given cost
    (incorporating a minimum spanning tree), creates a networkx graph,
    calculates global and nodal measures (including random comparisons
    for the global measures) and writes them out to csv files.
    '''
    # Read in the data
    M, names, centroids = read_in_data(corr_mat_file,
                                        names_file,
                                        centroids_file,
                                        names_308_style)

    # Make your graph at cost
    G = mkg.graph_at_cost(M, cost)

    # Calculate the modules
    nodal_partition = mkg.calc_nodal_partition(G)

    # Get the nodal measures
    # (note that this takes a bit of time because the participation coefficient
    # takes a while)
    G, nodal_dict = mkg.calculate_nodal_measures(G,
                                                 centroids,
                                                 names,
                                                 nodal_partition=nodal_partition,
                                                 names_308_style=names_308_style)

    # Save your nodal measures
    write_out_nodal_measures(nodal_dict, centroids, output_dir, corr_mat_file, cost)

    # Get the global measures
    # (note that this takes a bit of time because you're generating random
    # graphs)
    R_list, R_nodal_partition_list = mkg.make_random_list(G, n_rand=n_rand)

    global_dict = mkg.calculate_global_measures(G,
                                                R_list=R_list,
                                                nodal_partition=nodal_partition,
                                                R_nodal_partition_list=R_nodal_partition_list)

    # Write out the global measures
    write_out_global_measures(global_dict, output_dir, corr_mat_file, cost)

    # Get the rich club coefficients
    deg, rc, rc_rand = mkg.rich_club(G, R_list=R_list, n=n_rand)

    # Write out the rich club coefficients
    write_out_rich_club(deg, rc, rc_rand, output_dir, corr_mat_file, cost)

KirstieJane commented 6 years ago

Woooo! This is a REALLY useful issue!! Thank you @Islast! ✨

I'm happy to discuss any refactoring that's needed. I have to admit that I don't really know the benefits of having class methods 😬. The most important thing for me with this refactoring is that the values are written out in a standard form (csv for example) so that the other commands can be run without running this one.
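A minimal sketch of the "standard form" idea, using only the standard library: measures are written as one csv column per measure and can be read back by a later command without recomputing anything. The layout of the measures dictionary and both helper names are assumptions for illustration, not the current behaviour of write_out_nodal_measures.

```python
import csv
import os
import tempfile


def write_measures_csv(measures, path):
    """Write {measure_name: [one value per node]} with one column per measure."""
    columns = sorted(measures)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        # zip turns the per-measure lists into one row per node
        writer.writerows(zip(*(measures[c] for c in columns)))


def read_measures_csv(path):
    """Read the file back; the csv module returns every value as a string."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames
    return {c: [row[c] for row in rows] for c in columns}


# Toy values, not real results
nodal_measures = {"degree": [2, 1, 1], "module": [0, 0, 1]}

with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "nodal_measures.csv")
    write_measures_csv(nodal_measures, path)
    loaded = read_measures_csv(path)
```

Note that values come back as strings, so a downstream command would need to convert them to numbers itself.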

Islast commented 6 years ago

The value of having something saved as a class attribute instead of as a global namespace variable or in a dictionary is that it is easier to identify. For example, if you want to find the nodal partition you calculated for a given graph G, you only need to type G.nodal_partition.

The value of class methods is that grammatically you expect nodal_partition(G) to return a nodal partition, while you expect G.nodal_partition() to return nothing and make changes to G. This means that if we want the nodal partition to be saved automatically as a class attribute, G.nodal_partition() is the better choice. I think it's good to provide both: normal functions for when you just want to return the result, and class methods for when you want to automatically add to the graph attributes. BUT, I don't know how adding new class methods to an existing class interferes with it (adding attributes is fine).
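The function-versus-method convention can be sketched as follows. Everything here is hypothetical: BrainNetwork is an invented subclass name, and the partition is a stand-in that labels connected components rather than running the Louvain method. Defining the method on a subclass leaves the networkx Graph class itself untouched.

```python
import networkx as nx


def calc_nodal_partition(G):
    """Function form: return {node: module} and leave G unchanged.

    Stand-in partition (connected components), NOT the Louvain method.
    """
    return {
        node: module
        for module, component in enumerate(nx.connected_components(G))
        for node in component
    }


class BrainNetwork(nx.Graph):  # hypothetical subclass name
    def calc_nodal_partition(self):
        """Method form: store the partition on the graph and return nothing."""
        # Inside the method body this name resolves to the module-level function
        self.nodal_partition = calc_nodal_partition(self)


G = BrainNetwork([(0, 1), (2, 3)])

partition = calc_nodal_partition(G)  # returns the result, G is unchanged
returned = G.calc_nodal_partition()  # returns None, result is now G.nodal_partition
```

Since BrainNetwork inherits from nx.Graph, existing networkx functions and methods should keep working on it.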

resources for this more

Islast commented 6 years ago

@Kirstie, let me know where and when and what you would like to see written to disk :hibiscus:

Islast commented 5 years ago

Completed in #74

WhitakerLab / scona

Refactor the network_analysis_from_corrmat wrapper #33
