diffpy / diffpy.nmf_mapping

The code for the nmfMapping App on diffpy-cmi

Refactor - use functions to generate outputs in `main` #24

Open bobleesj opened 3 weeks ago

bobleesj commented 3 weeks ago

Problem

In `main`, four figures and the .json/.csv outputs are generated by the code below:

    fig1 = nmf.component_plot(df_components, args1.xrd, args1.x_units, args1.show)
    fig2 = nmf.component_ratio_plot(df_component_weight_timeseries, args1.show)
    fig3 = nmf.reconstruction_error_plot(df_reconstruction_error, args1.show)
    if args1.pca_thresh:
        fig4 = nmf.explained_variance_plot(df_explained_var_ratio, args1.show)

    if args1.save_files:
        if not os.path.exists(os.path.join(os.getcwd(), "nmf_result")):
            os.mkdir(os.path.join(os.getcwd(), "nmf_result"))
        output_fn = datetime.fromtimestamp(time.time()).strftime("%Y%m%d%H%M%S%f")
        df_components.to_json(os.path.join(os.getcwd(), "nmf_result", "x_index_vs_y_col_components.json"))
        df_component_weight_timeseries.to_json(
            os.path.join(os.getcwd(), "nmf_result", "component_index_vs_pratio_col.json")
        )
        df_component_weight_timeseries.to_csv(
            os.path.join(os.getcwd(), "nmf_result", output_fn + "component_row_pratio_col.txt"),
            header=None,
            index=False,
            sep=" ",
            mode="a",
        )
        df_reconstruction_error.to_json(
            os.path.join(os.getcwd(), "nmf_result", "component_index_vs_RE_value.json")
        )
        plot_file1 = os.path.join(os.getcwd(), "nmf_result", output_fn + "comp_plot.png")
        plot_file2 = os.path.join(os.getcwd(), "nmf_result", output_fn + "ratio_plot.png")
        plot_file3 = os.path.join(os.getcwd(), "nmf_result", output_fn + "loss_plot.png")
        if args1.pca_thresh:
            plot_file7 = os.path.join(os.getcwd(), "nmf_result", output_fn + "pca_var_plot.png")
        plot_file4 = os.path.splitext(plot_file1)[0] + ".pdf"
        plot_file5 = os.path.splitext(plot_file2)[0] + ".pdf"
        plot_file6 = os.path.splitext(plot_file3)[0] + ".pdf"
        if args1.pca_thresh:
            plot_file8 = os.path.splitext(plot_file7)[0] + ".pdf"
        txt_file = os.path.join(os.getcwd(), "nmf_result", output_fn + "_meta" + ".txt")
        with open(txt_file, "w+") as fi:
            fi.write("NMF Analysis\n\n")
            fi.write(f"{len(df_component_weight_timeseries.columns)} files uploaded for analysis.\n\n")
            fi.write(f"The selected active r ranges are:  {args1.xrange} \n\n")
            fi.write("Thesholding:\n")
            fi.write(f"\tThe input component threshold was: {args1.threshold}\n")
            fi.write(f"\tThe input improvement threshold was: {args1.improve_thresh}\n")
            fi.write(f"\tThe input # of iterations to run was: {args1.n_iter}\n")
            fi.write(f"\tWas PCA thresholding used?: {args1.pca_thresh}\n")
            fi.write(f"{len(df_components.columns)} components were extracted")

        fig1.savefig(plot_file1)
        fig2.savefig(plot_file2)
        fig3.savefig(plot_file3)
        if args1.pca_thresh:
            fig4.savefig(plot_file7)
        fig1.savefig(plot_file4)
        fig2.savefig(plot_file5)
        fig3.savefig(plot_file6)

Solution

The code needs to be refactored to make the variable names more explicit. For example, `plot_file4` is the .pdf version of `plot_file1`, but neither name says so.
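
One possible way to make the names explicit is to look up each path by plot name and file extension instead of by number. This is only a sketch: the helper `build_plot_paths` and its parameters are hypothetical and not part of `diffpy.nmf_mapping`; the plot names and the `nmf_result` directory are taken from the snippet above.

```python
import os


def build_plot_paths(
    output_dir,
    prefix,
    plot_names=("comp_plot", "ratio_plot", "loss_plot"),
    extensions=("png", "pdf"),
):
    """Return {plot_name: {extension: path}} (hypothetical helper).

    Each path is output_dir/<prefix><plot_name>.<extension>, matching the
    naming pattern used in the original snippet.
    """
    return {
        name: {
            ext: os.path.join(output_dir, f"{prefix}{name}.{ext}")
            for ext in extensions
        }
        for name in plot_names
    }


# Example lookup: the PDF path of the component plot, by name.
plot_paths = build_plot_paths("nmf_result", "20240101")
comp_plot_pdf = plot_paths["comp_plot"]["pdf"]
```

With this layout, `plot_paths["comp_plot"]["pdf"]` would stand in for `plot_file4`, and `"pca_var_plot"` could be added to `plot_names` only when `args1.pca_thresh` is set.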

sbillinge commented 3 weeks ago

Also, break it out into functions that are called by `main`.
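
A rough sketch of what that split could look like is below. The function names (`make_output_dir`, `timestamp_prefix`, `save_figures`) are hypothetical, not existing functions in the package, and the figures are assumed to be objects with a `savefig(path)` method, as matplotlib figures have.

```python
import os
from datetime import datetime


def make_output_dir(base_dir=None, name="nmf_result"):
    """Create the output directory if needed and return its path."""
    output_dir = os.path.join(base_dir or os.getcwd(), name)
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


def timestamp_prefix():
    """Return the timestamp string used as a file-name prefix."""
    return datetime.now().strftime("%Y%m%d%H%M%S%f")


def save_figures(figures, output_dir, prefix, extensions=("png", "pdf")):
    """Save every figure in every format and return the written paths.

    `figures` maps a plot name (e.g. "comp_plot") to an object with a
    savefig(path) method.
    """
    saved_paths = []
    for name, fig in figures.items():
        for ext in extensions:
            path = os.path.join(output_dir, f"{prefix}{name}.{ext}")
            fig.savefig(path)
            saved_paths.append(path)
    return saved_paths


# Possible use inside main (names follow the snippet in the issue):
#     figures = {"comp_plot": fig1, "ratio_plot": fig2, "loss_plot": fig3}
#     if args1.pca_thresh:
#         figures["pca_var_plot"] = fig4
#     if args1.save_files:
#         output_dir = make_output_dir()
#         save_figures(figures, output_dir, timestamp_prefix())
```

Looping over names and extensions would also save every figure in both formats. In the snippet as quoted, `plot_file8` is built but never passed to `savefig`, so the PCA plot gets no .pdf.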