AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Make genetic-demux into a full workflow #160

Closed: jashapiro closed this 2 years ago

jashapiro commented 2 years ago

This PR creates a `genetic-demux.nf` workflow that combines the earlier individual steps of genetic demultiplexing into a single workflow that can run from start to finish by specifying only the input run to be processed.

In doing this, I deleted most of the test run setup from earlier workflows, created named workflows for each step with `take` and `emit` statements, and centralized the parameters that specify containers, files, and directories. So while the diff is large, many of the changes simply move code around or delete mock data.

The biggest changes/additions are:

  1. Adding logic to identify the bulk samples needed for reference genotyping and build a channel of them.
  2. Adding logic to group samples by multiplex library for mpileup.
  3. Updating the cellsnp/vireo workflow, which now includes more logic to pair the VCF files generated by mpileup with the starsolo runs.
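To make the data flow of those three steps easier to follow outside of Nextflow, here is a minimal plain-Python sketch. It is hypothetical: the `Run` record, its fields, the IDs, and the function names are illustrative assumptions, not the workflow's actual code or metadata schema. Collecting bulk runs per library is roughly what Nextflow's `groupTuple` operator does, and pairing VCFs with starsolo output by library key behaves like an inner `join`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical run record; the real metadata fields may differ."""
    run_id: str
    library_id: str
    sample_ids: tuple


def bulk_runs_by_library(multiplexed_runs, bulk_runs):
    """Map each multiplex library to the bulk runs of its samples.

    These are the bulk runs mpileup would genotype together to build a
    per-library reference VCF (steps 1 and 2 above).
    """
    # Index bulk runs by the sample they were sequenced from
    bulk_by_sample = defaultdict(list)
    for run in bulk_runs:
        for sample in run.sample_ids:
            bulk_by_sample[sample].append(run.run_id)

    # Group the needed bulk runs by multiplex library (like groupTuple)
    grouped = defaultdict(set)
    for run in multiplexed_runs:
        for sample in run.sample_ids:
            grouped[run.library_id].update(bulk_by_sample.get(sample, []))

    # Drop libraries with no bulk data; they cannot be genotyped this way
    return {lib: sorted(ids) for lib, ids in grouped.items() if ids}


def pair_vcf_with_starsolo(vcf_by_library, starsolo_by_run, multiplexed_runs):
    """Pair each multiplexed run's starsolo output with its library's VCF.

    Runs missing either input are skipped, as in an inner join (step 3).
    """
    pairs = []
    for run in multiplexed_runs:
        vcf = vcf_by_library.get(run.library_id)
        starsolo = starsolo_by_run.get(run.run_id)
        if vcf is not None and starsolo is not None:
            pairs.append((run.run_id, starsolo, vcf))
    return pairs


if __name__ == "__main__":
    # Hypothetical example data
    multiplexed = [
        Run("run_A", "LIB01", ("S1", "S2")),
        Run("run_B", "LIB02", ("S3", "S4")),
    ]
    bulk = [
        Run("bulk_1", "BLIB1", ("S1",)),
        Run("bulk_2", "BLIB2", ("S2",)),
        Run("bulk_3", "BLIB3", ("S3",)),
    ]
    print(bulk_runs_by_library(multiplexed, bulk))
    print(pair_vcf_with_starsolo(
        {"LIB01": "LIB01.vcf.gz"},
        {"run_A": "run_A_starsolo", "run_B": "run_B_starsolo"},
        multiplexed,
    ))
```

Keying everything on the library ID means each VCF is computed once per multiplex library and then reused for every single-cell run from that library.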

I tried to leave comments explaining what is going on in each of those steps, but I might need to add more. Nextflow can be a bit obtuse.

When running this, I found that some tweaks to the overall config were needed, so there are a few changes there as well.

Note that one process here is not used: I was also testing calling initial SNPs from the bulk data with cellsnp (what they call mode 2b). This is much faster than mpileup but seemed to give worse results in initial testing. I may test it a bit more, so I left the process in place even though no workflow currently uses it.

jashapiro commented 2 years ago

Merging, as it is working now and I have incorporated all feedback. It will probably need future tweaks!