WhitakerLab / scona

Code to analyse structural covariance brain networks using python.
https://whitakerlab.github.io/scona/
MIT License

Refactor the network_analysis_from_corrmat wrapper #33

Open Islast opened 6 years ago

Islast commented 6 years ago

This is a long issue for discussing how to refactor the network_analysis_from_corrmat wrapper so that it is more modular and easier to adjust.

Decisions

Moving data and functions to class attributes and methods

One major consideration is how much of the data that accompanies the correlation matrix (names, centroids) and how much of the data produced afterwards (partitions, random graphs and measures) can or should be stored as attributes of the graph, its nodes or its edges. Furthermore, should some of these functions (nodal partition, calculate measures, the smaller functions wrapped in the 'calculate_nodal_measures' function, write_out_x) be class methods? Adding class methods to the networkx Graph class seems cavalier. We could define a new class, but if we do, it should support all the same methods as a networkx Graph.
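One way to picture the "new class" option is a subclass of the networkx Graph, which inherits every Graph method and adds a few of its own. The sketch below is only an illustration under assumptions: the class name BrainNetwork, the methods set_centroids and set_partition, and the attribute keys "centroid" and "nodal_partition" are all hypothetical and are not taken from scona.

```python
import networkx as nx


class BrainNetwork(nx.Graph):
    """Hypothetical subclass of nx.Graph; not part of scona at this commit."""

    def set_centroids(self, centroids):
        # Store each node's (x, y, z) centroid as a node attribute.
        for node, xyz in centroids.items():
            self.nodes[node]["centroid"] = tuple(xyz)

    def set_partition(self, partition):
        # Store the module assignment as a graph-level attribute.
        self.graph["nodal_partition"] = dict(partition)


# Toy data, made up for the example.
G = BrainNetwork()
G.add_edges_from([(0, 1), (1, 2)])
G.set_centroids({0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0)})
G.set_partition({0: 0, 1: 0, 2: 1})
```

Because BrainNetwork is still a networkx Graph, existing networkx functions accept it unchanged, and the networkx Graph class itself is left untouched.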

Breaking down the main function

The main function verbatim

As of the latest commit, 81713d6:

def network_analysis_from_corrmat(corr_mat_file,
                                  names_file,
                                  centroids_file,
                                  output_dir,
                                  cost=10,
                                  n_rand=1000,
                                  names_308_style=False):
    '''
    This is the big function!
    It reads in the correlation matrix, thresholds it at the given cost
    (incorporating a minimum spanning tree), creates a networkx graph,
    calculates global and nodal measures (including random comparisons
    for the global measures) and writes them out to csv files.
    '''
    # Read in the data
    M, names, centroids = read_in_data(corr_mat_file,
                                        names_file,
                                        centroids_file,
                                        names_308_style)

    # Make your graph at cost
    G = mkg.graph_at_cost(M, cost)

    # Calculate the modules
    nodal_partition = mkg.calc_nodal_partition(G)

    # Get the nodal measures
    # (note that this takes a bit of time because the participation coefficient
    # takes a while)
    G, nodal_dict = mkg.calculate_nodal_measures(G,
                                                 centroids,
                                                 names,
                                                 nodal_partition=nodal_partition,
                                                 names_308_style=names_308_style)

    # Save your nodal measures
    write_out_nodal_measures(nodal_dict, centroids, output_dir, corr_mat_file, cost)

    # Get the global measures
    # (note that this takes a bit of time because you're generating random
    # graphs)
    R_list, R_nodal_partition_list = mkg.make_random_list(G, n_rand=n_rand)

    global_dict = mkg.calculate_global_measures(G,
                                                R_list=R_list,
                                                nodal_partition=nodal_partition,
                                                R_nodal_partition_list=R_nodal_partition_list)

    # Write out the global measures
    write_out_global_measures(global_dict, output_dir, corr_mat_file, cost)

    # Get the rich club coefficients
    deg, rc, rc_rand = mkg.rich_club(G, R_list=R_list, n=n_rand)

    # Write out the rich club coefficients
    write_out_rich_club(deg, rc, rc_rand, output_dir, corr_mat_file, cost)

KirstieJane commented 6 years ago

Woooo! This is a REALLY useful issue!! Thank you @Islast! ✨

I'm happy to discuss any refactoring that's needed. I have to admit that I don't really know the benefits of having class methods 😬. The most important thing for me with this refactoring is that the values are written out in a standard format (csv, for example) so that the other commands can be run without having to run this one first.
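As a rough sketch of what "written out in a standard form" could look like, the example below writes nodal measures to a csv file and reads them back using only the standard library. The function names write_nodal_measures and read_nodal_measures, the {measure: {node: value}} layout and the toy values are assumptions made for illustration; they are not scona's write_out_nodal_measures or its actual file format.

```python
import csv
import os
import tempfile


def write_nodal_measures(nodal_dict, path):
    """Write {measure: {node: value}} as one row per node (hypothetical layout)."""
    measures = sorted(nodal_dict)
    nodes = sorted(next(iter(nodal_dict.values())))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["node"] + measures)
        for node in nodes:
            writer.writerow([node] + [nodal_dict[m][node] for m in measures])


def read_nodal_measures(path):
    """Read the file back into {measure: {node: value}}; values become floats."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        result = {m: {} for m in reader.fieldnames if m != "node"}
        for row in reader:
            for m in result:
                result[m][row["node"]] = float(row[m])
    return result


# Toy data, made up for the example.
nodal_dict = {"degree": {"a": 2, "b": 1}, "module": {"a": 0, "b": 1}}
path = os.path.join(tempfile.mkdtemp(), "nodal_measures.csv")
write_nodal_measures(nodal_dict, path)
loaded = read_nodal_measures(path)
```

With a round trip like this, a later command only needs the csv file and does not have to recompute the measures.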

Islast commented 6 years ago

The value of having something saved as a class attribute, instead of as a global namespace variable or in a dictionary, is that it is easier to find: if you want the nodal partition you calculated for a given graph G, you only need to type G.nodal_partition. The value of class methods is that, grammatically, you expect nodal_partition(G) to return a nodal partition, while you expect G.nodal_partition() to return nothing and make changes to G. This means that if we want the nodal partition to be saved automatically as a class attribute, G.nodal_partition() is the better choice. I think it's good to provide both: normal functions for when you just want to return the result, and class methods for when you want to add the result to the graph attributes automatically. BUT, I don't know how adding new class methods to an existing class interferes with it (adding attributes is fine).
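The function/method pairing described above might look like the sketch below. Everything in it is an assumption for illustration: SimpleGraph is a stand-in class rather than a networkx Graph, and connected components stand in for a real community-detection step. An attribute and a method cannot share one name on the same object, so the method is called calc_nodal_partition here and the stored result is G.nodal_partition.

```python
class SimpleGraph:
    """Stand-in graph class; hypothetical, not networkx or scona."""

    def __init__(self, edges):
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.nodal_partition = None  # filled in by calc_nodal_partition()

    def calc_nodal_partition(self):
        # Method form: stores the result on the graph and returns nothing.
        self.nodal_partition = nodal_partition(self)


def nodal_partition(G):
    """Function form: returns {node: module} without modifying G.

    Connected components stand in for a real community-detection step.
    """
    partition = {}
    module = 0
    for start in sorted(G.adj):
        if start in partition:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in partition:
                partition[node] = module
                stack.extend(G.adj[node] - set(partition))
        module += 1
    return partition


G = SimpleGraph([(0, 1), (2, 3)])
returned = nodal_partition(G)  # G is left unchanged
G.calc_nodal_partition()       # G.nodal_partition is now set
```

Having the method delegate to the plain function keeps a single implementation behind both calling styles.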

resources for this more

Islast commented 6 years ago

@Kirstie, let me know where and when and what you would like to see written to disk :hibiscus:

Islast commented 5 years ago

Completed in #74