YangLabHKUST / SpatialScope

A unified approach for integrating spatial and single-cell transcriptomics data by leveraging deep generative models
https://spatialscope-tutorial.readthedocs.io/en/latest/
GNU General Public License v3.0

Issue with Accessing Raw Data Link for Human Heart scRNA-seq Atlas #12

Closed aqlkzf closed 7 months ago

aqlkzf commented 7 months ago

Hello,

I was going through the Human Heart tutorial and encountered an issue when trying to access the raw dataset for the human heart scRNA-seq atlas. The link provided in the section "We will use a human heart snRNA-seq atlas as reference dataset, the raw dataset in h5ad format (global_raw.h5ad) is available in here" appears to be broken or unavailable.

Could you please update the method for downloading the raw data or fix the existing link to ensure accessibility?

Thank you for your assistance.

Best regards, Jishuai

aqlkzf commented 7 months ago

Hi,

Following up on my earlier issue with the Human Heart scRNA-seq atlas dataset: I downloaded the global_raw.h5ad file from the Heart Cell Atlas, as described in Litviňuková et al., Nature 2020. However, running Scanpy's sc.pp.filter_cells(ad_sc, min_counts=500) left an empty dataset with shape (0, 32732), meaning every cell was filtered out.
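For anyone hitting the same empty result, it may help to look at per-cell totals before filtering. The sketch below is not Scanpy code: it is a small pure-Python illustration of what a min_counts threshold does (keep rows whose summed counts reach the threshold), using a made-up 3 x 4 toy matrix. It also shows one possible cause, which is only an assumption and was not confirmed in this thread: if the matrix holds normalized or log-transformed values instead of raw counts, every per-cell total can fall below 500.

```python
import math

def cells_passing_min_counts(matrix, min_counts):
    """Return one boolean per row (cell): True if the row total is >= min_counts."""
    return [sum(row) >= min_counts for row in matrix]

# Hypothetical toy data: 3 cells x 4 genes of raw integer counts.
raw = [
    [300, 150, 100, 0],   # total 550
    [10, 5, 0, 1],        # total 16
    [400, 400, 0, 0],     # total 800
]

def log_normalize(matrix, target_sum=1e4):
    """Scale each row to target_sum, then apply log1p (a common normalization)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([math.log1p(v / total * target_sum) for v in row])
    return out

kept_raw = cells_passing_min_counts(raw, 500)
kept_normalized = cells_passing_min_counts(log_normalize(raw), 500)

print("kept with raw counts:       ", sum(kept_raw))
print("kept with normalized values:", sum(kept_normalized))
```

If the totals in the real object look far too small for raw counts, that would suggest the downloaded file is not the raw matrix the tutorial expects.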

Also, the dataset lacked an ad_sc.obs['cell_source'] column. I improvised by concatenating the facility and cell_or_nuclei columns:


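# Convert each column to str element-wise; str(series) would stringify the whole Series instead
ad_sc.obs['cell_source'] = ad_sc.obs['facility'].astype(str) + '-' + ad_sc.obs['cell_or_nuclei'].astype(str)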
aqlkzf commented 7 months ago

I've solved the problem according to the description in the original paper. For those looking to access the raw dataset, the correct download link is: https://cellgeni.cog.sanger.ac.uk/heartcellatlas/v2/Global_raw.h5ad
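A minimal sketch for fetching the file from that link with the Python standard library. The helper names (local_name, download_if_missing) are my own and not part of SpatialScope or Scanpy; the file is large, so the helper skips the download when a file with the same name already exists.

```python
import os
import urllib.request
from urllib.parse import urlparse

URL = "https://cellgeni.cog.sanger.ac.uk/heartcellatlas/v2/Global_raw.h5ad"

def local_name(url):
    """Derive the local file name from the last path component of the URL."""
    return os.path.basename(urlparse(url).path)

def download_if_missing(url, dest_dir="."):
    """Download url into dest_dir unless the target file already exists.

    Returns the local path in both cases. No integrity check is performed.
    """
    dest = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling download_if_missing(URL) returns the local path, which can then be passed to sc.read_h5ad to load the reference as ad_sc.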

Additionally, I noticed two small adjustments needed for the dataset to work smoothly with the preprocessing steps outlined in the Human Heart tutorial (docs/source/notebooks/Human-Heart.ipynb):

  1. Adding cell_source Column: To align with the tutorial's preprocessing steps, you'll need to add an obs column named cell_source. This can be done with the following line of code:
ad_sc.obs['cell_source'] = ad_sc.obs['facility'].astype(str) + '-' + ad_sc.obs['cell_or_nuclei'].astype(str)
  2. Correction in Plotting Function: In the preprocessing section (## Preprocessing scRNA-ref), the correct parameter for the last plotting function should be color="cell_state", not color="cell_states".
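To illustrate the first adjustment, here is a small sketch on a toy pandas DataFrame standing in for ad_sc.obs. The values are made up for illustration and the columns are created as categoricals, which is an assumption about how they are stored in the h5ad file. It shows that .astype(str) concatenates row by row, whereas wrapping a column in str() produces a single string for the whole column.

```python
import pandas as pd

# Toy stand-in for ad_sc.obs (hypothetical values, categorical dtype assumed).
obs = pd.DataFrame({
    "facility": pd.Categorical(["Sanger", "Harvard", "Sanger"]),
    "cell_or_nuclei": pd.Categorical(["Cells", "Nuclei", "Nuclei"]),
})

# Element-wise: convert each column to plain strings, then join row by row.
obs["cell_source"] = (
    obs["facility"].astype(str) + "-" + obs["cell_or_nuclei"].astype(str)
)

# For contrast: str() on a Series returns the printed representation of the
# entire column as ONE Python string, so every row would get the same value.
whole_column_text = str(obs["facility"]) + "-" + str(obs["cell_or_nuclei"])
obs["cell_source_wrong"] = whole_column_text

print(obs["cell_source"].tolist())
print(obs["cell_source_wrong"].nunique())
```

Checking ad_sc.obs['cell_source'].value_counts() after the assignment is a quick way to confirm that the labels look like one facility-type pair per cell.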

I hope these updates help others in their exploration of this fascinating tool.