CITE-seq workflow

This PR adds a workflow to perform CITE-seq mapping and quantification with salmon alevin/alevin-fry. This workflow does not yet combine these data with the scRNAseq results; that will come in a separate PR, likely when incorporating these results into scpca-nf. For now, the goal is partly to have a set of output files that we can use to develop the steps required for creating SingleCellExperiment objects with both modalities of data.

This workflow is based largely on the existing alevin-fry workflow, but adds some elements and makes a few other changes.

Indexing is incorporated into this workflow, as the barcode index files may vary by sample. Indexing is very fast for these tiny files, but there are a couple of considerations to make it more efficient:
- The workflow first creates a channel that finds all unique feature barcode files from the requested sets, and indexes each only once.
- The feature index channel is then joined to the feature reads channel, matched by index name; the joined channel is then sent for mapping.
- The feature index also contains a dummy transcript-to-gene mapping file, which is needed by alevin quant. Since the index directory is small, we just pass along the whole index directory in the mapping output, though we could pass just the file; that should be a pretty easy change.

I combined all alevin-fry steps into a single process. This should reduce some of the copying to and from S3, and should speed up the workflow in general. We may well want to incorporate a similar change into the RNAseq quantification workflow.

In preparation for when the CITE-seq and RNA quants are merged, I made this workflow pass both run_id and sample_id along with the other required inputs and outputs. I think we will probably want to do something like this for the scpca-nf workflow, so while we don't really use these here, this was a good place for a proof of concept.
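
To make the channel logic above easier to follow, here is a minimal plain-Python sketch of the same idea: index each unique barcode file once, then join the index back to every run by barcode file name while carrying run_id and sample_id along. This is not the Nextflow code in this PR; the record fields, file names, and the `build_index` stand-in are all hypothetical and only illustrate the shape of the data.

```python
# Hypothetical run records; field names and values are illustrative only.
feature_runs = [
    {"run_id": "run01", "sample_id": "sampleA", "barcode_file": "barcodes_v1.tsv"},
    {"run_id": "run02", "sample_id": "sampleB", "barcode_file": "barcodes_v1.tsv"},
    {"run_id": "run03", "sample_id": "sampleC", "barcode_file": "barcodes_v2.tsv"},
]

index_calls = []  # records every indexing call so we can see each file is indexed once


def build_index(barcode_file):
    """Stand-in for the indexing process; returns a made-up index directory name."""
    index_calls.append(barcode_file)
    return "index/" + barcode_file.rsplit(".", 1)[0]


def plan_mapping(runs):
    """Index each unique barcode file once, then join indexes to runs by file name."""
    unique_files = sorted({run["barcode_file"] for run in runs})
    indexes = {barcode_file: build_index(barcode_file) for barcode_file in unique_files}
    # Each mapping job carries run_id and sample_id along with its index directory.
    return [
        (run["run_id"], run["sample_id"], indexes[run["barcode_file"]])
        for run in runs
    ]


jobs = plan_mapping(feature_runs)
for job in jobs:
    print(job)
```

In this sketch, three runs share two barcode files, so `build_index` is called twice and each job still receives the index that matches its own barcode file.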

I think we will be able to just combine RNA and CITE-seq by sample_id (using something similar to the .combine step here), but I will need to check that this always works. If it does not, we may need to add another column to the library info table which indicates which run_id contains the corresponding RNA for each CITE-seq or cell hash sample.
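
One way to check whether pairing by sample_id is always sufficient is to look for feature runs whose sample_id matches zero or several RNA runs. The sketch below does this in plain Python, under the assumption that each library is described by a record with `run_id` and `sample_id` fields; the records and the `pair_by_sample` helper are hypothetical and are not part of the workflow.

```python
from collections import defaultdict


def pair_by_sample(rna_runs, feature_runs):
    """Pair feature runs with RNA runs by sample_id and report ambiguous cases."""
    rna_by_sample = defaultdict(list)
    for run in rna_runs:
        rna_by_sample[run["sample_id"]].append(run["run_id"])

    pairs = []      # (sample_id, rna_run_id, feature_run_id)
    ambiguous = []  # (feature_run_id, list of candidate RNA run_ids)
    for run in feature_runs:
        matches = rna_by_sample.get(run["sample_id"], [])
        if len(matches) == 1:
            pairs.append((run["sample_id"], matches[0], run["run_id"]))
        else:
            # Zero or multiple RNA runs: sample_id alone cannot resolve the pairing.
            ambiguous.append((run["run_id"], list(matches)))
    return pairs, ambiguous


# Hypothetical library records for illustration only.
rna_runs = [
    {"run_id": "rna01", "sample_id": "sampleA"},
    {"run_id": "rna02", "sample_id": "sampleB"},
    {"run_id": "rna03", "sample_id": "sampleB"},
]
cite_runs = [
    {"run_id": "cite01", "sample_id": "sampleA"},
    {"run_id": "cite02", "sample_id": "sampleB"},
]

pairs, ambiguous = pair_by_sample(rna_runs, cite_runs)
print(pairs)
print(ambiguous)
```

Any entry in `ambiguous` is a case where an explicit column naming the matching RNA run_id would be needed.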

AlexsLemonade / alsf-scpca

CITE-seq workflow #120