AlexsLemonade / scpca-downstream-analyses

This repository is intended to store our pipeline for marker genes analysis.

Data Integration: Allow for merging of SCEs by integration group #340

Closed by allyhawkins 1 year ago

allyhawkins commented 1 year ago

Please provide some background on the proposed additions or changes.

Right now we have a script for merging SCE objects that takes as input a metadata file in which each row is a processed library file. As it stands, the script merges all libraries listed in the metadata into a single merged SCE object. However, users will likely want to integrate specific groups of libraries rather than produce one single integrated object.

What are the changes that you are proposing?

We should allow users to specify which group each library belongs to (e.g., libraries from the same disease are merged together, rather than all libraries being merged into one object). To do this, we will need to add a column to the integration library metadata file where users can indicate which libraries should be grouped together.

Please describe the proposed solution.

This issue will involve two new additions:

  1. Adding a new column, integration_group, to the integration metadata to indicate which group each library should be added to.
  2. Minimal modifications to the workflow so that we only merge libraries with the same integration_group value. Right now we read in the list of SCE files to include in a merged object. I believe we can continue to do that and add an integration_group argument. I would then add the integration_group to the metadata of the merged SCE object.
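
As a rough illustration of the two additions, here is a minimal Python sketch of selecting the SCE files for a single integration_group from the metadata. Only the integration_group column comes from this issue; the other column names (library_id, processed_sce_filepath), the file paths, the group labels, and the helper sce_files_for_group are hypothetical and stand in for whatever the real metadata and script use.

```python
import csv
import io

# Hypothetical metadata contents. Only `integration_group` is named in this
# issue; the other columns and all values are assumptions for illustration.
METADATA_TSV = (
    "library_id\tprocessed_sce_filepath\tintegration_group\n"
    "lib01\tresults/lib01_sce.rds\tgroup_a\n"
    "lib02\tresults/lib02_sce.rds\tgroup_a\n"
    "lib03\tresults/lib03_sce.rds\tgroup_b\n"
)


def sce_files_for_group(metadata_text, integration_group):
    """Return the SCE file paths whose integration_group matches the argument."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    return [
        row["processed_sce_filepath"]
        for row in reader
        if row["integration_group"] == integration_group
    ]


# Only the libraries labelled group_a would be passed on to the merge step.
group_a_files = sce_files_for_group(METADATA_TSV, "group_a")
```

The merge step itself stays unchanged in this sketch: it still receives a plain list of SCE files, just a shorter one.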

What potential "gotchas" do we know of?

This approach is slightly different from the one in the integration workflow in sc-data-integration, where we read in the metadata file and split the data frame by integration_group before reading in the SCE objects and merging. I think we can keep modifications to the merging script minimal and instead grab the list of SCE files for each integration_group listed in the metadata file within Snakemake.
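
One way the per-group file lists could be collected on the workflow side is sketched below in plain Python, which is what a Snakefile runs. This is not the actual workflow code: the column names other than integration_group, the paths, the group labels, and the function group_sce_files are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata contents; every column except `integration_group`
# and every value here is an assumption made for this sketch.
EXAMPLE_METADATA = (
    "library_id\tprocessed_sce_filepath\tintegration_group\n"
    "lib01\tresults/lib01_sce.rds\tgroup_a\n"
    "lib02\tresults/lib02_sce.rds\tgroup_b\n"
    "lib03\tresults/lib03_sce.rds\tgroup_a\n"
)


def group_sce_files(metadata_text):
    """Map each integration_group to its SCE file paths, in metadata row order."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    for row in reader:
        groups[row["integration_group"]].append(row["processed_sce_filepath"])
    return dict(groups)


files_by_group = group_sce_files(EXAMPLE_METADATA)
```

In a Snakefile, a mapping like this could back an input function for the merge rule, for example `lambda wildcards: files_by_group[wildcards.integration_group]`, where the wildcard name is again an assumption. The merging script would then be run once per group with only that group's files.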

Additional context or questions

cbethell commented 1 year ago

Addressed by #342