allyhawkins closed this 2 years ago
Thanks for the helpful feedback @jashapiro. I went ahead and split spaceranger and cellranger into two separate Docker images. I also added a README specific to the spaceranger image in its folder, even though it is largely the same as the cellranger one. I then added the ability to process the `visium_v1` and `visium_v2` tags to `run-cellranger.nf` for now. I will add the updates for processing unfiltered spatial data with Alevin-fry in a separate PR, so I have not added those technology options there yet.
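The Python sketch below is a rough illustration of what handling the `visium_v1` and `visium_v2` tags amounts to. It splits library metadata rows between Cell Ranger and Space Ranger by technology tag. It does not reproduce the Nextflow logic in `run-cellranger.nf`. The `technology` column name and the `route_libraries` function are hypothetical, and treating every other tag as single-cell is an assumption of the sketch.

```python
import csv
import io

# Spatial technology tags handled in this PR; every other tag is
# assumed (for this sketch only) to be a single-cell library.
SPATIAL_TECHNOLOGIES = {"visium_v1", "visium_v2"}


def route_libraries(metadata_tsv: str):
    """Split TSV metadata rows into (cellranger_rows, spaceranger_rows).

    Hypothetical: assumes a `technology` column holding the tag.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    cellranger_rows, spaceranger_rows = [], []
    for row in reader:
        if row["technology"] in SPATIAL_TECHNOLOGIES:
            spaceranger_rows.append(row)
        else:
            cellranger_rows.append(row)
    return cellranger_rows, spaceranger_rows
```

Keeping the spatial tags in a single set means any future spatial technology only needs to be added in one place.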
To start addressing #143, we first needed to be able to process the spatial libraries. I modified both the alevin-fry and cellranger workflows in this repo to handle spatial libraries. That way, a next step can be to compare the Alevin-fry output directly with the output from Space Ranger, the Cell Ranger equivalent for spatial data, before we choose which workflow to implement in `scpca-nf`. There were a few points to consider in doing this:
- I started with the Alevin-fry tutorial on processing spatial data. Running Alevin-fry itself is no different for a spatial library than for a single-cell library. The main difference is that we don't have a barcode file for spatial libraries to use in the `generate-permit-list` step, so we are forced to use the `knee-distance` filtering method in Alevin-fry. I also checked the files 10X provides when you search by slide serial number. They do not contain the barcodes, only the X,Y coordinates for each spot along with an extra code. The barcodes may be listed somewhere else on the 10X website, but I was unable to find them. For now, I have only tested this using knee filtering.
- To use `spaceranger`, I needed to add it to the Docker image. Rather than make a brand new image, I added it to the existing Cell Ranger image. Let me know if you'd prefer that I make a separate image instead.
- Right now we list all of the files for a sample in the `files` column of `scpca-library-metadata.tsv`. For spatial libraries, this means the image file is grouped with the FASTQ files. I had to add a filtering step within `getCRSamples` so that sample names are only taken from the FASTQ files. Otherwise the indexing fails, because not every file ends in `.fastq.gz`. I am also finding the image file by searching for the `.jpg` extension. An alternative would be to add a separate column for the image file to the metadata file from the start, so the filename can be read from that column explicitly. We will also need the image file for later processing steps with Alevin-fry, so maybe this is worth doing?

While working on the cellranger workflow, I also changed it to pass the metadata through the process, using the same format we use in `scpca-nf`.
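The knee filtering from the first point above can be sketched in Python as barcode-rank knee detection. Barcodes are ranked by UMI count. On a log-log plot of count against rank, the barcode farthest from the straight line joining the first and last points is taken as the cutoff. This is a generic max-distance heuristic chosen for illustration. It is not Alevin-fry's `knee-distance` implementation, and `knee_cutoff` is a hypothetical name.

```python
import math


def knee_cutoff(counts):
    """Return how many top-ranked barcodes to keep (simplified knee heuristic)."""
    # Rank barcodes by count, highest first; drop zeros so log10 is defined.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 3:
        return len(ranked)  # too few points to define a knee
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(c) for c in ranked]
    # Chord from the first point to the last point of the log-log curve.
    x0, y0, dx, dy = xs[0], ys[0], xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    best_index, best_dist = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if dist > best_dist:
            best_index, best_dist = i, dist
    return best_index + 1  # keep barcodes ranked 1..best_index + 1
```

With a clear plateau of high-count barcodes followed by a long tail of low counts, the cutoff falls at the end of the plateau.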
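The filtering inside `getCRSamples` described in the third point above can be sketched in Python. This version takes sample names only from entries matching the 10x FASTQ naming pattern (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`) and collects `.jpg` entries separately as image files. `split_library_files` is a hypothetical helper, not the Nextflow code in this PR. Treating the lane segment as optional is an assumption, made so that files written without lane splitting also match.

```python
import re
from pathlib import PurePath

# 10x-style FASTQ names: <sample>_S<n>[_L<lane>]_<R1|R2|I1|I2>_001.fastq.gz
# (the lane segment is optional in this sketch, which is an assumption)
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_S\d+(?:_L\d{3})?_[RI][12]_001\.fastq\.gz$")


def split_library_files(files):
    """Return (sample_names, image_files) from a mixed list of file paths.

    Non-FASTQ entries (e.g. the slide image) are never used to derive
    sample names, so nothing relies on a `.fastq.gz` suffix being present.
    """
    sample_names, image_files = [], []
    for path in files:
        name = PurePath(path).name
        match = FASTQ_PATTERN.match(name)
        if match:
            sample = match.group("sample")
            if sample not in sample_names:  # first-seen order, no duplicates
                sample_names.append(sample)
        elif name.lower().endswith(".jpg"):
            image_files.append(path)
    return sample_names, image_files
```

A separate image-file column in the metadata would make the `.jpg` search unnecessary, because the image path would be read directly from its own column.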