clemente-lab / mmeds-meta

A database for storing and analyzing omics data
https://mmeds.org

Incorporate alpha/beta median/centroid calculations into analysis pipeline #414

Open adamcantor22 opened 2 years ago

adamcantor22 commented 2 years ago

Is your feature request related to a problem? Please describe.

For the MECONIUM project I created a script called calculate_centroid.py. It takes a mapping file and either a beta PCoA text file or an alpha vector text file. It then calculates the centroid (beta) or median (alpha) of the metric and adds a new column to the mapping file containing each sample's distance to that centroid or median. This is valuable to our analyses and should be an option inside MMEDS.

Describe the solution you'd like

Add this script to the repo and decide how it should be integrated into the pipeline. The script itself is fairly spaghetti-like, since its scope kept expanding as I wrote it, so it needs to be cleaned up and given more explanatory comments.

Additional context

The script is on Minerva at /sc/arion/projects/MMEDS/mmeds_server_data/studies/MECONIUM_RERUN_ALL_STUDIES/calculate_centroid.py
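
For discussion, here is a minimal sketch of the calculation described above. It is not the contents of calculate_centroid.py. It assumes the centroid is the per-axis mean of the PCoA coordinates, that the beta distance is Euclidean, and that the alpha distance is the absolute difference from the median. The function names, the dict-based inputs, and the "#SampleID" key are hypothetical stand-ins for the parsed mapping, PCoA, and alpha vector files.

```python
import math
from statistics import median


def beta_centroid_distances(coords):
    """Distance of each sample to the centroid of all samples.

    coords: dict of sample ID -> list of PCoA axis values (assumed layout).
    Assumes centroid = per-axis mean and distance = Euclidean.
    """
    n_axes = len(next(iter(coords.values())))
    centroid = [
        sum(values[i] for values in coords.values()) / len(coords)
        for i in range(n_axes)
    ]
    return {sample: math.dist(values, centroid) for sample, values in coords.items()}


def alpha_median_distances(alpha):
    """Absolute difference of each sample's alpha value from the median.

    alpha: dict of sample ID -> alpha diversity value (assumed layout).
    """
    med = median(alpha.values())
    return {sample: abs(value - med) for sample, value in alpha.items()}


def add_distance_column(rows, column, distances, id_key="#SampleID"):
    """Return copies of the mapping rows with a new distance column.

    rows: list of dicts, one per mapping file row (hypothetical representation).
    Samples without a computed distance get an empty string.
    """
    return [{**row, column: distances.get(row[id_key], "")} for row in rows]
```

A cleaned-up version in the pipeline would presumably keep the file parsing separate from these calculations, so the same functions can serve both the beta and alpha cases.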

cleme commented 2 years ago

Does it include a parameter to decide how the centroid is calculated, i.e. the centroid of all samples or the centroid(s) of specific groups? Additionally, consider whether this is an MMEDS script or something else (anapi).

adamcantor22 commented 2 years ago

Yes, it does. There is a per-category option that does exactly that; that's how we calculated the individual centroids for each meconium batch. Matt suggested making it an anapi script, which it certainly could be.
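
To make the per-category behaviour concrete, here is a rough sketch of the idea, not the script's actual code. It assumes each sample is compared against the centroid of its own group, with the centroid taken as the per-axis mean and the distance as Euclidean. The function name and the dict-based inputs are hypothetical.

```python
import math
from collections import defaultdict


def per_category_centroid_distances(coords, categories):
    """Distance of each sample to the centroid of its own category.

    coords: dict of sample ID -> list of PCoA axis values (assumed layout).
    categories: dict of sample ID -> category value, e.g. a batch label
        taken from one column of the mapping file (assumed layout).
    """
    # Group sample IDs by their category value.
    groups = defaultdict(list)
    for sample in coords:
        groups[categories[sample]].append(sample)

    distances = {}
    for members in groups.values():
        n_axes = len(coords[members[0]])
        # Per-axis mean over the samples in this group only.
        centroid = [
            sum(coords[s][i] for s in members) / len(members)
            for i in range(n_axes)
        ]
        for s in members:
            distances[s] = math.dist(coords[s], centroid)
    return distances
```

Called with a batch column as `categories`, this yields one centroid per batch; passing the same label for every sample reduces it to the all-samples case.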