AlexsLemonade / scpca-nf

scpca-nf is the Nextflow workflow for processing Single-cell Pediatric Cancer Atlas Portal data
BSD 3-Clause "New" or "Revised" License

Specify merging only specific runs from a project #708

Closed allyhawkins closed 6 months ago

allyhawkins commented 6 months ago

Closes #705

This PR adds the ability to specify which runs, libraries, and/or samples to include when creating a merged object for an ScPCA project. The workflow first filters to the libraries that belong to the specified project, then either keeps all of them or keeps only those matching the IDs given in a parameter.
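To make the two-step filter concrete, here is a minimal sketch of the logic in Python. It is not the Nextflow code from this PR: the function name `select_libraries`, the column names, and the comma-separated ID format are assumptions made for illustration.

```python
# Illustrative sketch only; names and columns are assumptions, not the PR's code.
ID_COLUMNS = ("scpca_run_id", "scpca_library_id", "scpca_sample_id")


def select_libraries(rows, project_id, merge_run_ids="All"):
    """Keep rows in `project_id`, then optionally restrict to requested IDs."""
    # Step 1: keep only the libraries that belong to the project being merged.
    in_project = [r for r in rows if r["scpca_project_id"] == project_id]

    # Step 2: "All" means no further filtering.
    requested = {s.strip() for s in merge_run_ids.split(",") if s.strip()}
    if "All" in requested:
        return in_project

    # Otherwise keep rows whose run, library, or sample ID was requested.
    return [r for r in in_project if any(r[c] in requested for c in ID_COLUMNS)]
```

Because the project filter runs first, an ID that belongs to a different project is silently dropped rather than pulled into the merged object.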

I chose to add a separate merge_run_ids parameter here rather than reuse the run_ids parameter from the main workflow. The ccdl profile sets run_ids to a default list of test runs instead of All, so if you don't pass run_ids explicitly, only the test runs specified in the profile are included. We would have to remember to pass --run_ids All every time we run the merge workflow, and I can see us forgetting that. The new merge_run_ids param instead defaults to All in both the ccdl profile and the main nextflow config.
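The effect of the separate default can be sketched as a small Python model of layered parameters. This is an assumption-laden illustration: the test run IDs are placeholders, the base value of run_ids is assumed to be All, and the precedence (base config, then profile, then command-line flags) is my understanding of how Nextflow resolves params rather than something shown in this PR.

```python
# Placeholder values; the real test run IDs in the ccdl profile are not shown here.
BASE_PARAMS = {"run_ids": "All", "merge_run_ids": "All"}  # base run_ids is assumed
CCDL_PROFILE = {"run_ids": "TEST_RUN_1,TEST_RUN_2", "merge_run_ids": "All"}


def resolve_params(base, profile, cli=None):
    """Later sources win: base config, then profile, then command-line flags."""
    return {**base, **profile, **(cli or {})}


resolved = resolve_params(BASE_PARAMS, CCDL_PROFILE)
# run_ids is narrowed to the profile's test runs unless overridden on the CLI,
# while merge_run_ids stays "All" with no extra flag needed.
print(resolved["run_ids"], resolved["merge_run_ids"])
```

With this layering, a merge run under the ccdl profile includes every run in the project by default, and passing merge_run_ids explicitly is only needed when restricting the merge.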

In this PR I also added a description of how to run with specific run IDs to the internal instructions. Is this something we also want to include in the external instructions? It seems like a very specific need, and the parameter is not required, but I can add it if we think it's necessary.