constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data
Issue importing cellranger 7.0.0 outs #113

Closed ollieeknight closed 1 year ago

ollieeknight commented 2 years ago

I have a pipeline set up to import 10x outs, which has worked perfectly fine with Cellranger 6.1.2 and below. However, when I run it on the /outs/ folder of a sample processed with Cellranger 7.0.0, I receive the following message:

Error in autoEstCont(.) Clustering information must be supplied, run setClusters first.

Could this be an issue with 10x shuffling around the 'analysis' folders of Cellranger 7.0.0 /outs/?

Best

Ollie

constantAmateur commented 2 years ago

Your guess sounds correct: load10X assumes the cellranger output folder has the tree structure that was current as of 6.x and below.

I'll add a fix for this in the next release. If you could share the tree structure of the cellranger 7.x analysis folder that would speed things up as I don't have anything mapped with 7.0.0 to hand.
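
Until a fix is released, the error message itself points at a workaround: supply the clusters by hand with setClusters before calling autoEstCont. The sketch below is a minimal illustration, not the package's own loading code. The helper name `read_cluster_vector` is hypothetical, and it assumes the cellranger clusters.csv has `Barcode` and `Cluster` columns and that the barcodes match the cell names in the SoupChannel.

```r
# Build a named cluster vector (names = barcodes, values = cluster labels)
# from a cellranger clusters.csv.
# ASSUMPTION: the file has 'Barcode' and 'Cluster' columns.
read_cluster_vector <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  stopifnot(all(c("Barcode", "Cluster") %in% names(df)))
  setNames(as.character(df$Cluster), df$Barcode)
}

# Usage with SoupX (not run here; needs SoupX and a real outs/ folder):
# library(SoupX)
# sc <- load10X("path/to/outs")
# clusters <- read_cluster_vector(
#   "path/to/outs/analysis/clustering/gene_expression_graphclust/clusters.csv")
# sc <- setClusters(sc, clusters)
# sc <- autoEstCont(sc)
# out <- adjustCounts(sc)
```

The commented lines show where the vector would go. The paths in them are placeholders.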

ollieeknight commented 2 years ago

analysis/
├── clustering
│   ├── antibody_capture_graphclust
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_10_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_2_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_3_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_4_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_5_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_6_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_7_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_8_clusters
│   │   └── clusters.csv
│   ├── antibody_capture_kmeans_9_clusters
│   │   └── clusters.csv
│   ├── gene_expression_graphclust
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_10_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_2_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_3_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_4_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_5_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_6_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_7_clusters
│   │   └── clusters.csv
│   ├── gene_expression_kmeans_8_clusters
│   │   └── clusters.csv
│   └── gene_expression_kmeans_9_clusters
│       └── clusters.csv
├── diffexp
│   ├── antibody_capture_graphclust
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_10_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_2_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_3_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_4_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_5_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_6_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_7_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_8_clusters
│   │   └── differential_expression.csv
│   ├── antibody_capture_kmeans_9_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_graphclust
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_10_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_2_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_3_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_4_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_5_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_6_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_7_clusters
│   │   └── differential_expression.csv
│   ├── gene_expression_kmeans_8_clusters
│   │   └── differential_expression.csv
│   └── gene_expression_kmeans_9_clusters
│       └── differential_expression.csv
├── pca
│   ├── antibody_capture_10_components
│   │   ├── components.csv
│   │   ├── dispersion.csv
│   │   ├── features_selected.csv
│   │   ├── projection.csv
│   │   └── variance.csv
│   └── gene_expression_10_components
│       ├── components.csv
│       ├── dispersion.csv
│       ├── features_selected.csv
│       ├── projection.csv
│       └── variance.csv
├── tsne
│   ├── antibody_capture_2_components
│   │   └── projection.csv
│   └── gene_expression_2_components
│       └── projection.csv
└── umap
    ├── antibody_capture_2_components
    │   └── projection.csv
    └── gene_expression_2_components
        └── projection.csv

constantAmateur commented 2 years ago

Thanks! 👍

ollieeknight commented 2 years ago

Any progress on this, by any chance?

pjlmac commented 1 year ago

Is there any fix for this issue yet, either in the package or as a workaround?

NKalavros commented 1 year ago

You can rework the load10X function from SoupX to work with CR7.

The offending lines are the following:

file.path(dataDir,'analysis','clustering','gex','graphclust','clusters.csv'),
file.path(dataDir,'analysis','clustering','graphclust','clusters.csv')
file.path(dataDir,'analysis','clustering','gex','kmeans_10_clusters','clusters.csv'),
file.path(dataDir,'analysis','clustering','kmeans_10_clusters','clusters.csv')
file.path(dataDir,'analysis','dimensionality_reduction','gex','tsne_projection.csv'),
file.path(dataDir,'analysis','tsne','2_components','projection.csv')

You need to update these paths to be current. They now look like this:

file.path(dataDir,'analysis','clustering','gene_expression_graphclust','clusters.csv'),
file.path(dataDir,'analysis','clustering','gene_expression_graphclust','clusters.csv')
file.path(dataDir,'analysis','clustering','gene_expression_kmeans_10_clusters','clusters.csv'),
file.path(dataDir,'analysis','clustering','gene_expression_kmeans_10_clusters','clusters.csv')
file.path(dataDir,'analysis','dimensionality_reduction','gex','tsne_projection.csv'),
file.path(dataDir,'analysis','tsne','gene_expression_2_components','projection.csv')

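
One way to cope with the folder layout changing between cellranger versions, without editing load10X itself, is to probe the candidate locations and use the first one that exists. The sketch below is only an illustration: `find_cluster_file` is a hypothetical helper, not part of SoupX, and the three candidate paths are just the graph-clustering layouts quoted in this thread.

```r
# Return the first graph-clustering file that exists under a cellranger
# outs/ folder, or NA if none is found.
# ASSUMPTION: only the three layouts mentioned in this thread are tried.
find_cluster_file <- function(dataDir) {
  candidates <- c(
    # cellranger 7 layout, with the 'gene_expression_' prefix
    file.path(dataDir, "analysis", "clustering", "gene_expression_graphclust", "clusters.csv"),
    # older layout with a 'gex' subfolder
    file.path(dataDir, "analysis", "clustering", "gex", "graphclust", "clusters.csv"),
    # older layout without a subfolder
    file.path(dataDir, "analysis", "clustering", "graphclust", "clusters.csv")
  )
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) return(NA_character_)
  hits[1]
}
```

The same pattern would apply to the kmeans and tsne paths. A returned NA means none of the known layouts matched.
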
pjlmac commented 1 year ago

Thanks, NKalavros!

constantAmateur commented 1 year ago

Thanks for all the help with this. The devel version should fix the problem, but I'm not sure what the folder structure looks like for multiome runs (e.g. ATAC+RNA) mapped with cellranger V7. I'm reasoning that the addition of a "gene_expression" prefix means that the folder structure will be the same for multiome and single-modality runs (previously it was different for multiome, which the current code checks for).

If someone has such data, I'd appreciate it if you could run the devel version and tell me if it works for this. It can be installed by running:

devtools::install_github("constantAmateur/SoupX",ref='devel')

jamesdahlvang commented 1 year ago

Just tried the devel version and it did not work. However, when I switched the code in load10X to include @NKalavros's suggestion, it ran beautifully.

constantAmateur commented 1 year ago

I've now tested this locally and 1.6.2 loads cellranger V7 data correctly. If anyone is still having issues please reopen this issue and provide an example of data that does not load.
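
For anyone unsure which version they have installed, a quick check along these lines may help. It is a sketch only: `has_min_version` is a hypothetical helper, and the 1.6.2 threshold is taken from the comment above.

```r
# TRUE if 'pkg' is installed and its version is at least 'min_version'.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  utils::packageVersion(pkg) >= min_version
}

if (!has_min_version("SoupX", "1.6.2")) {
  message("SoupX >= 1.6.2 not found; cellranger 7 outs may load without clusters.")
}
```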