AlexsLemonade / scpca-nf

scpca-nf is the Nextflow workflow for processing Single-cell Pediatric Cancer Atlas Portal data
BSD 3-Clause "New" or "Revised" License

Decide what output files to include for Spatial Transcriptomics #68

Closed allyhawkins closed 2 years ago

allyhawkins commented 2 years ago

Based on a discussion with @jashapiro in Slack, before doing #63 we should decide whether to create .rds files for the spatial libraries, or instead create a tar.gz of the outs folder after running spaceranger and provide that to users. Part of the reasoning here is that we would only be importing the spaceranger output into R as a SpatialExperiment and then writing that out as an .rds file, without adding any additional analyses. Is the extra step of creating the .rds file worthwhile if we are not going to do anything further to the SpatialExperiment object, or would people prefer to have the .cloupe file or .mtx.gz files to load into R themselves?

The pro for creating an .rds file is that the import scripts for the portal are already written to accept .rds files, so we could keep everything in the same format and consistent across all samples. So I guess the question is: how difficult would it be to change the importing for only the subset of samples that fall under the spatial category? Tagging @kurtwheeler for any thoughts he might have on that.

We also provide QC reports for the other libraries; here, however, we are running spaceranger, which generates its own summary HTML file. Would it be sufficient to use this report, or is there any reason to create our own? (I think the only case for this would be if we were using Alevin-fry + Spaceranger together.)

Based on the decisions made here, we may or may not need to complete #63.

allyhawkins commented 2 years ago

After our discussion today in the ST benchmarking meeting, we have decided to use Spaceranger and provide its outputs as a zip file. We should include the unfiltered and filtered output from Spaceranger, the web summary (equivalent to our QC report), the spatial folder, and a metadata file that we would add with version information. The contents of the download for each ST library would then look approximately like this:

├── SCPCL000000_filtered_files
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── SCPCL000000_spaceranger_summary.html
├── SCPCL000000_spatial
│   ├── aligned_fiducials.jpg
│   ├── detected_tissue_image.jpg
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   ├── tissue_lowres_image.png
│   └── tissue_positions_list.csv
├── SCPCL000000_unfiltered_files
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── SCPCL000000_metadata.json

jashapiro commented 2 years ago

I'm wondering if we might want to more directly mirror the Space Ranger outs directory, which might look more like this (basically package everything up within a folder for each library):

SCPCL000000
├── filtered_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── raw_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── spatial
│   ├── aligned_fiducials.jpg
│   ├── detected_tissue_image.jpg
│   ├── scalefactors_json.json
│   ├── tissue_hires_image.png
│   ├── tissue_lowres_image.png
│   └── tissue_positions_list.csv
├── SCPCL000000_metadata.json
└── SCPCL000000_spaceranger_summary.html

allyhawkins commented 2 years ago

This makes sense to me, I'm good with this organization.
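
For reference, the agreed per-library layout (mirroring the Space Ranger outs directory) could be assembled along these lines. This is only a rough sketch, not the workflow's actual implementation: the function name `package_library`, the metadata keys, and the choice of Python are assumptions for illustration. The only names taken from Space Ranger itself are the `filtered_feature_bc_matrix`, `raw_feature_bc_matrix`, and `spatial` folders and `web_summary.html`.

```python
import json
import shutil
from pathlib import Path

# Folders copied unchanged from the Space Ranger `outs` directory.
OUTS_DIRS = ["filtered_feature_bc_matrix", "raw_feature_bc_matrix", "spatial"]


def package_library(outs_dir, dest_root, library_id, versions):
    """Copy selected Space Ranger outputs into <dest_root>/<library_id> and zip it.

    `versions` is a dict of version information to record in the metadata
    file; its keys are hypothetical and not defined anywhere in this thread.
    Returns the path to the zip archive.
    """
    outs_dir = Path(outs_dir)
    dest_root = Path(dest_root).resolve()
    lib_dir = dest_root / library_id
    lib_dir.mkdir(parents=True, exist_ok=True)

    # Mirror the matrix and spatial folders under the library folder.
    for name in OUTS_DIRS:
        shutil.copytree(outs_dir / name, lib_dir / name, dirs_exist_ok=True)

    # Rename the web summary so it carries the library ID.
    shutil.copy2(
        outs_dir / "web_summary.html",
        lib_dir / f"{library_id}_spaceranger_summary.html",
    )

    # Write the metadata file with version information.
    metadata = {"library_id": library_id, **versions}
    (lib_dir / f"{library_id}_metadata.json").write_text(
        json.dumps(metadata, indent=2)
    )

    # Zip the whole library folder so it unpacks as <library_id>/...
    archive = shutil.make_archive(
        str(dest_root / library_id), "zip", root_dir=dest_root, base_dir=library_id
    )
    return Path(archive)
```

Calling `package_library("outs", "packaged", "SCPCL000000", {"spaceranger_version": "..."})` would write `packaged/SCPCL000000.zip`, which unpacks to a single `SCPCL000000` folder matching the tree above. Note that `dirs_exist_ok` requires Python 3.8 or later.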