broadinstitute / lincs-profiling-complementarity

Analyzing and comparing signal found in different profiling technologies
BSD 3-Clause "New" or "Revised" License
5 stars 5 forks source link

Added consensus moa analysis files #1

Closed AdeboyeML closed 3 years ago

AdeboyeML commented 3 years ago

@gway I have added the jupyter notebook that contains the code for the analysis on the consensus datasets to get the moa sizes/values per doses(1-7), I have also included the results of the analysis i.e. the heatmap plots and the csv files (folders).

gwaybio commented 3 years ago

this looks great @AdeboyeML - a couple of immediate things to change, before I take a deeper dive.

  1. Let's move this entire analysis into a folder called 1.data-exploration.
  2. Let's be sure to keep data extensions on csv files. For example, moa_sizes_consensus_datasets/consensus_median_moa_sizes should be moa_sizes_consensus_datasets/consensus_median_moa_sizes.csv.

The reason for 1 is that we want to perform several analyses in this repo, and this is just one component. Each new analysis will belong in an analysis submodule.

AdeboyeML commented 3 years ago

@gwaygenomics Thanks for the observations.

I have created the 1.Data-exploration folder (putting all files and sub-folders into the folder), and also updated the notebook with the null distribution analysis and I have also saved all the dataframe outputs as .csv files.

The .csv files has both the median_scores and p_values dataframes for each consensus dataset per dose (for each of the 231 MOAs with more than one compound).

gwaybio commented 3 years ago

Looks great @AdeboyeML - sorry this has taken me so long to review. I am just about ready to approve.

Two major things:

  1. It looks like there is a large number of cells in the notebook that are not executed. Can you either execute the cell or delete the cells? Jupyter notebooks need to be executed sequentially, and if I am viewing a notebook without sequential execution, I have extreme doubts about the data analysis. (not saying that I doubt this one, just in general it is suboptimal practice)
  2. Can you add in the figures you generated in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-723839481? They look great, and are not currently captured in this PR. I would also switch to histograms from density plots (in the seaborn jointplot) since the density tails extend far beyond p < 0 (which doesn't make sense).

Other pull request advice

Advice for myself

shntnu commented 3 years ago

Dropping in with a minor comment: Please add the column name for the first column (moa) to the CSV files e.g. https://raw.githubusercontent.com/broadinstitute/lincs-profiling-comparison/9bc5db8167674e2c8bec5cee3fcc043117acfbf6/1.Data-exploration/moa_sizes_consensus_datasets/median_dmso_null_p_values.csv

gwaybio commented 3 years ago

@AdeboyeML - we should prioritize merging in this contribution early next week. Here is a todo list:

I see people forgetting to add column names for indices all the time, and it bugs me! It is also a very easy fix:

data_moa_values = data_moa_values.reset_index().rename({"index": "moa"}, axis="columns")
save_to_csv(data_moa_values, 'moa_sizes_consensus_datasets', 'modz_moa_median_scores.csv')
AdeboyeML commented 3 years ago

@gwaygenomics - thanks for the observations and major points highlighted.

gwaybio commented 3 years ago

once you address these three things:

Go ahead and merge.

Nice work! 🎉

AdeboyeML commented 3 years ago

@gwaygenomics all highlighted points have been addressed and corrected.

gwaybio commented 3 years ago

LGTM 👍 please merge at will

AdeboyeML commented 3 years ago

@gwaygenomics I do not think I have the permission to merge the pull request.

Only those with write access to this repository can merge pull requests.

gwaybio commented 3 years ago

ah, right. Just made you admin. You need to accept in an email, then you should be able to merge. Thanks for pointing this out!