sc Workbench primary analysis error for non-tsne/umap coordinates

IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.

https://umgear.org

GNU Affero General Public License v3.0


sc Workbench primary analysis error for non-tsne/umap coordinates #663

Open JPReceveur opened 2 months ago

JPReceveur commented 2 months ago

There is a profile where the user has used a different coordinate system in their paper and gEAR dataset (SWNE instead of tSNE/UMAP). In the single cell workbench, a primary analysis is available, but using it results in an error. I'm guessing it's because neither tsne nor umap coordinates are being found. Shaun, could you take a look to see if the marker gene step can work with SWNE coordinates?

Dataset ID: 2b71927b-8dc2-66ff-bc2a-103532196c07
Profile: https://umgear.org/index.html?multigene_plots=0&layout_id=human-utricle-sc-atlas&gene_symbol_exact_match=1&gene_symbol=sox2

All the datasets in the profile should be public

adkinsrs commented 2 months ago

The error I am seeing in the gEAR logs is that the dataset is actually missing the "louvain" clustering column. Looking at the dataset in the Python REPL, I see several columns (seurat_clusters, cluster_label, cell_type) that you could just copy to make the louvain adata.obs column.
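For reference, copying one of those columns could look roughly like the sketch below. It is only a sketch under assumptions: the helper names, the file paths, and the order of preference among the candidate columns are hypothetical and not taken from the gEAR code, and it only touches the h5ad file (it does not update pipeline.json). It uses `anndata.read_h5ad` and `AnnData.write_h5ad`.

```python
# Candidate columns mentioned in this thread; the preference order is an assumption.
CANDIDATE_COLUMNS = ("cell_type", "cluster_label", "seurat_clusters")


def pick_cluster_column(obs_columns, candidates=CANDIDATE_COLUMNS):
    """Return the first candidate name present in obs_columns, or None."""
    present = set(obs_columns)
    for name in candidates:
        if name in present:
            return name
    return None


def add_louvain_column(h5ad_path, out_path):
    """Copy an existing cluster column into adata.obs['louvain'].

    Returns the name of the column that was copied, or 'louvain' if the
    column already existed (in which case nothing is written).
    """
    import anndata as ad  # imported lazily so pick_cluster_column has no dependencies

    adata = ad.read_h5ad(h5ad_path)
    if "louvain" in adata.obs.columns:
        return "louvain"

    source = pick_cluster_column(adata.obs.columns)
    if source is None:
        raise ValueError("no candidate cluster column found in adata.obs")

    # Store the labels as a categorical column, the usual dtype for cluster labels.
    adata.obs["louvain"] = adata.obs[source].astype("category")
    adata.write_h5ad(out_path)
    return source
```

Something like `add_louvain_column("dataset.h5ad", "dataset_with_louvain.h5ad")` (both paths hypothetical) would write a copy of the dataset with the extra column, leaving the original file untouched.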

JPReceveur commented 2 months ago

I thought the primary analysis worked off of more than just 'louvain'. Do you need 'louvain' for it to work correctly? I was under the assumption that primary analysis could work off of other columns as well, e.g. a primary analysis can get assigned based on cell_type in the make primary analysis script: https://github.com/IGS/gEAR/blob/main/bin/add_primary_analyses_to_datasets.py

adkinsrs commented 2 months ago

After clicking "Primary Analysis", the get_stored_analysis.cgi script says that louvain was not calculated ("false"), so this lines up with the add_primary_analyses_to_datasets.py script. The h5ad_find_marker_genes.cgi script does require the "louvain" column to be present.

You are right that the "cell_type" cluster should have been detected, so I'm not sure why the h5ad and the pipeline.json files did not get updated to have the louvain analysis. Would it make sense just to rerun it?