QuKunLab / SpatialBenchmarking

BSD 2-Clause "Simplified" License

Incomplete file BLAST_CelltypeDeconvolution.ipynb #2

Closed canergen closed 2 years ago

canergen commented 2 years ago

Hi, can you provide the notebook used to compute the results for the 32 simulation datasets? E.g., the CSV loaded by PCC = pd.read_csv('FigureData/Figure4/32Simulation/PCC.csv', sep = ',', header = 0, index_col = 0). For dataset4, the raw data seems to be missing from Google Drive. Additionally, for the 32 simulated datasets, I don't understand the difference between the _r folder and the other folder: n_vars and obs['cell_counts'] differ between the two files.

SimualtedData/DataUpload/dataset10/dataset10_spatial_ds10.h5ad
AnnData object with n_obs × n_vars = 1000 × 29484
    obs: 'cell_counts'
    uns: 'density'

SimualtedData/DataUpload/dataset10_r/dataset10_r_spatial_ds10.h5ad
AnnData object with n_obs × n_vars = 1000 × 17926
    obs: 'cell_counts'
    uns: 'density'

wenruyustc commented 2 years ago

Hi, sorry, I forgot to describe this part. Each pair of scRNA-seq datasets consists of two scRNA-seq datasets from the same tissue (scRNA1 and scRNA2). We use scRNA1 to simulate one spatial dataset, which is called data, and scRNA2 to simulate another, which is called data_r.

Compare_results.ipynb.zip

I have provided the notebook used to compute the results for the 32 simulation datasets; alternatively, you can use the notebook for Fig. 2. The metrics are computed in the same way.

canergen commented 2 years ago

Thanks for your quick response. After downloading from Google Drive, I get folders like dataset12, containing dataset12_scrna_celltype_final.txt, scRNA.h5ad, and Spatial.h5ad, and dataset12_r, containing dataset12_r_scrna_celltype_final.txt, scRNA.h5ad, and Spatial.h5ad. So each subfolder has one dataset named Spatial.h5ad and one named scRNA.h5ad. It's not clear to me how this fits with your explanation. To recompute the results, I'm looking for a file similar to BLAST_CelltypeDeconvolution.ipynb (https://github.com/QuKunLab/SpatialBenchmarking/blob/main/BLAST_CelltypeDeconvolution.ipynb) that starts from the raw datasets rather than from the already computed CSV files.
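For anyone checking their own download against the layout described above, here is a minimal sketch that lists which of the three files are missing from a dataset folder. The function names and the root directory are hypothetical, and the file naming is inferred only from the dataset12 and dataset12_r folders mentioned in this comment; other folders may differ.

```python
from pathlib import Path


def expected_files(root, name):
    """Return the three paths expected in one dataset folder.

    Assumption: every folder follows the naming seen for dataset12 and
    dataset12_r in this thread.
    """
    folder = Path(root) / name
    return {
        "celltype": folder / f"{name}_scrna_celltype_final.txt",
        "scrna": folder / "scRNA.h5ad",
        "spatial": folder / "Spatial.h5ad",
    }


def missing_files(root, name):
    """Return the keys of expected files that do not exist on disk."""
    return [
        key
        for key, path in expected_files(root, name).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    # "downloads" is a placeholder for wherever the Google Drive data was saved.
    for name in ["dataset12", "dataset12_r"]:
        print(name, "missing:", missing_files("downloads", name))
```

An empty list for a folder means all three files are present under the assumed names.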

wenruyustc commented 2 years ago

OK. For cell type deconvolution, we constructed 32 simulation datasets with known cell type distributions to evaluate the performance of the algorithms, so there are 32 simulation datasets in total on Google Drive (i.e. dataset1-16 and dataset1_r-16_r). Please ignore the difference between datasetX and datasetX_r; it is just a naming convention. To avoid misunderstanding, we will later rename these directories to something like Data1-32.

Predicting the cell type distribution requires both scRNA-seq data and spatial transcriptomics data as input. In the dataset12 directory you downloaded, Spatial.h5ad is our simulated spatial transcriptomics data; its uns['density'] holds the cell type composition of each spot, which we use as the gold standard. scRNA.h5ad is the single-cell dataset required as input.

After downloading the 32 simulation datasets, you can analyze them with different tools and use Compare_results.ipynb (we will merge it into BLAST_CelltypeDeconvolution.ipynb later) to calculate the PCC, SSIM, JS, and RMSE files. If you only need the prediction results for the 32 simulation datasets, we will upload them later, and you can use them to calculate PCC, SSIM, JS, and RMSE with Compare_results.ipynb. Thanks for your suggestion!
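To make the comparison step concrete, here is a rough sketch of three of the four metrics applied to one cell type's proportions across spots. It is not the code from Compare_results.ipynb: it assumes that JS means Jensen-Shannon divergence and that each metric is computed per cell type over spots, neither of which is confirmed in this thread. SSIM is left out because it depends on the spatial arrangement of the spots. The function names and toy values are illustrative, and only the standard library is used so the sketch runs without extra dependencies.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either input is constant
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two non-negative vectors.

    Both vectors are first normalised to sum to 1.
    """
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("both vectors need a positive sum")
    p = [a / sp for a in p]
    q = [b / sq for b in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy values: proportion of one cell type in four spots.
truth = [0.50, 0.20, 0.00, 0.30]      # e.g. taken from uns['density']
predicted = [0.45, 0.25, 0.05, 0.25]  # output of some deconvolution tool

print("PCC :", pcc(truth, predicted))
print("RMSE:", rmse(truth, predicted))
print("JS  :", js_divergence(truth, predicted))
```

In practice NumPy and SciPy would be the usual choice for this. Note that scipy.spatial.distance.jensenshannon returns the square root of the divergence, so its values are not directly comparable with js_divergence above.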