Closed jashapiro closed 3 years ago
I had thought about it, but that step is so very fast (less than 5 seconds) that it hardly seems worth it. I have code in there so we are only doing it for the number of indexes that exist, so it isn't like there are a bunch of extra copies hanging around.
If we did publish it, we would presumably want to have some code to look for the published version, which might actually make things more complex!
This PR adds a workflow to perform CITE-seq mapping and quantification with `salmon alevin`/`alevin-fry`. This workflow does not yet combine these data with the scRNAseq results; that will come in a separate PR, likely when incorporating these results into `scpca-nf`. For now, the goal is partly to have a set of output files that we can use to develop the steps required for creating SingleCellExperiment objects with both modalities of data.

This workflow is based largely on the existing `alevin-fry` workflow, but adds some elements and makes other changes.

Indexing is incorporated into this workflow, as the barcode index files may vary by sample. Indexing is very fast for these tiny files, but there are a couple of considerations to make it more efficient:
`alevin quant`. Since the index directory is small, we just pass along the whole index directory in the mapping output, though we could pass just the file. It should be a pretty easy change.

I combined all `alevin-fry` steps into a single process. This should reduce some of the copying to and from S3, and should speed up the workflow in general. We may well want to incorporate a similar change into the RNAseq quantification workflow.

In preparation for when the CITE-seq and RNA quants are merged, I made this workflow pass both `run_id` and `sample_id` along with the other required inputs and outputs. I think we will probably want to do something like this for the `scpca-nf` workflow, so while we don't really use these here, this was a good place for a proof of concept.

I think we will be able to just combine RNA and CITE-seq by `sample_id` (using something similar to the `.combine` step here), but I will need to check that this always works. If it does not, we may need to add another column to the library info table that indicates which `run_id` contains the corresponding RNA for each CITE-seq or cell hash sample.
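
To make the open question concrete, here is a minimal Python sketch of the pairing logic, not the workflow code itself. It assumes each run is a record with `run_id` and `sample_id` fields; the function name `pair_by_sample` and the example IDs are hypothetical. It pairs each CITE-seq run with an RNA run by `sample_id` and reports the runs where that key alone is not enough, which is the situation where the extra library info column would be needed.

```python
from collections import defaultdict


def pair_by_sample(rna_runs, cite_runs):
    """Pair each CITE-seq run with the single RNA run sharing its sample_id.

    Returns (pairs, ambiguous): pairs holds (sample_id, rna_run_id, cite_run_id)
    tuples; ambiguous holds CITE-seq run_ids with zero or multiple RNA matches.
    """
    # Group RNA run ids by sample so multiple runs per sample are visible.
    rna_by_sample = defaultdict(list)
    for run in rna_runs:
        rna_by_sample[run["sample_id"]].append(run["run_id"])

    pairs, ambiguous = [], []
    for run in cite_runs:
        matches = rna_by_sample.get(run["sample_id"], [])
        if len(matches) == 1:
            pairs.append((run["sample_id"], matches[0], run["run_id"]))
        else:
            # sample_id alone cannot identify the RNA run for this library.
            ambiguous.append(run["run_id"])
    return pairs, ambiguous


# Hypothetical example data: sample S02 has two RNA runs.
rna_runs = [
    {"run_id": "R001", "sample_id": "S01"},
    {"run_id": "R002", "sample_id": "S02"},
    {"run_id": "R003", "sample_id": "S02"},
]
cite_runs = [
    {"run_id": "R101", "sample_id": "S01"},
    {"run_id": "R102", "sample_id": "S02"},
]

pairs, ambiguous = pair_by_sample(rna_runs, cite_runs)
print(pairs)      # [('S01', 'R001', 'R101')]
print(ambiguous)  # ['R102']
```

If `ambiguous` is ever non-empty for real library info, joining on `sample_id` is not sufficient and an explicit column naming the matching `run_id` would be the fallback.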