Please provide some background on the proposed additions or changes.
Right now we have a script for merging SCE objects that takes as input a metadata file in which each row corresponds to one processed library file. As it stands, the script merges all libraries listed in the metadata into a single merged SCE object. However, users will likely want to integrate specific groups of libraries rather than produce one integrated object.
What are the changes that you are proposing?
We should allow users to specify which group each library belongs to (e.g. libraries from the same disease are merged together rather than all libraries being merged together into one). To do this we will need to add an additional column to the integration library metadata file where users can indicate which libraries should be grouped together.
Please describe the proposed solution.
This issue will involve two new additions:
Adding a new column, integration_group, to the integration metadata file to indicate which group each library belongs to.
Making minimal modifications to the workflow so that only libraries with the same integration_group value are merged. Right now the merging script reads in the list of SCE files to include in a merged object. I believe we can continue to do that and add an integration_group argument. I would then add the integration_group to the metadata of the merged SCE object.
What potential "gotchas" do we know of?
This approach is slightly different from what we have in the integration workflow in sc-data-integration, where we read in the metadata file and split the data frame by integration_group before reading in the SCE objects and merging.
I think we can keep the modifications to the merging script minimal and instead gather the list of SCE files for each integration_group listed in the metadata file within Snakemake.
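As a rough illustration of gathering SCE files per integration_group on the Snakemake side, here is a minimal Python sketch. It assumes the metadata file is tab-separated and that the SCE file path lives in a column named sce_filepath; that column name, the helper sce_files_by_group, and the example rows are all hypothetical and only integration_group comes from this proposal.

```python
import csv
import io
from collections import defaultdict


def sce_files_by_group(metadata_file,
                       group_column="integration_group",
                       path_column="sce_filepath"):
    """Map each integration group to the list of SCE files assigned to it.

    `metadata_file` is any open text file object holding a tab-separated
    table with a header row. `path_column` is a hypothetical column name.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(metadata_file, delimiter="\t")
    for row in reader:
        groups[row[group_column]].append(row[path_column])
    return dict(groups)


# Hypothetical metadata contents, for illustration only
example_metadata = io.StringIO(
    "library_id\tsce_filepath\tintegration_group\n"
    "lib01\tresults/lib01_sce.rds\tgroup_a\n"
    "lib02\tresults/lib02_sce.rds\tgroup_a\n"
    "lib03\tresults/lib03_sce.rds\tgroup_b\n"
)

grouped = sce_files_by_group(example_metadata)
for group, files in grouped.items():
    print(group, files)
```

In a Snakefile, a mapping like this could back an input function for the merge rule (for example, returning grouped[wildcards.integration_group]), so the merging script would still receive a plain list of SCE files plus the group name as an extra argument.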
Additional context or questions