LieberInstitute / spatialDLPFC

spatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples
http://research.libd.org/spatialDLPFC/

Inconsistent naming convention for Brain positions in the raw cluster results #133

Closed: boyiguo1 closed this issue 1 year ago

boyiguo1 commented 1 year ago

In the clustering results saved at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/clustering_results, e.g. /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/clustering_results/bayesSpace_harmony_9/clusters.csv

The naming convention of the column key is slightly inconsistent, e.g. AAACAAGTATCTCCCA-1_Br2720_ant_2 (note the trailing "_2" after the sample name) vs. AAACAACGAATAGTTC-1_Br2743_ant.

This can cause problems when using patterns (e.g. regular expressions) to parse the key variable into Br_num and pos while merging with other datasets, as sketched below.
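For illustration, a minimal sketch (not code from the repository) of the kind of pattern-based parsing that breaks: splitting the key after the spot barcode and then on "_" assumes exactly two fields, Br_num and pos.

keys <- c("AAACAACGAATAGTTC-1_Br2743_ant", "AAACAAGTATCTCCCA-1_Br2720_ant_2")

# drop the "<barcode>-1_" prefix, keeping only the sample portion
sample_part <- sub("^[ACGTN]+-1_", "", keys)

# naive split into Br_num and pos: fine for Br2743_ant, but Br2720_ant_2
# yields a third piece ("2"), so a fixed two-field pattern mis-parses it
strsplit(sample_part, "_")
# [[1]]
# [1] "Br2743" "ant"
#
# [[2]]
# [1] "Br2720" "ant"    "2"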

boyiguo1 commented 1 year ago

This is corrected in the object loaded via load("/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final.Rdata"). Hence, I recommend using the R data to retrieve the clustering results, e.g. spe$bayesSpace_harmony_9.
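For example, a minimal sketch of retrieving the clusters that way, assuming the loaded object is a SpatialExperiment named spe that carries the bayesSpace_harmony_9 colData column (which file actually holds the clusters is revisited below):

library("SpatialExperiment")
load("/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final.Rdata")

# cluster assignments live in the colData, one value per spot
table(spe$bayesSpace_harmony_9)
head(colData(spe)[, c("sample_id", "bayesSpace_harmony_9")])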

lcolladotor commented 1 year ago

The _2 in Br2720_ant_2 does represent a different sample than Br2743_ant. See https://github.com/LieberInstitute/spatialDLPFC/blob/main/raw-data/sample_info/Visium_DLPFC_all4rounds_manually_merged.xlsx and related sample information files for more details.

Also, why are you using /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final.Rdata instead of /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final_with_clusters.Rdata? That second file was created by https://github.com/LieberInstitute/spatialDLPFC/blob/adf3327d8a6cae711d80da62d7f79c573a56f32e/code/analysis/01_build_spe/01_build_spe.R#L619-L626.

Are you saying that there's a bug in spatialLIBD::cluster_import()? See how https://github.com/LieberInstitute/spatialLIBD/blob/master/R/cluster_import.R calls add_key(overwrite = TRUE) by default at https://github.com/LieberInstitute/spatialLIBD/blob/e2f179bb7a3bfef30b319e84dc7702e3ce99aa67/R/cluster_import.R#L47.
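For context, a minimal sketch of how those CSVs are typically pulled back into the object with spatialLIBD (argument values here are illustrative, not taken from the repository scripts):

library("spatialLIBD")

# cluster_dir is the parent folder holding the bayesSpace_*/clusters.csv
# subdirectories; imported columns get the given prefix in colData()
spe <- cluster_import(
    spe,
    cluster_dir = "/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/clustering_results",
    prefix = "imported_"
)

# the spot keys are re-generated internally via add_key(overwrite = TRUE),
# so they match the barcode + sample_id combination in the object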

lcolladotor commented 1 year ago

Everything looks OK to me in the file you linked to:

> x <- read.csv("/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/clustering_results/bayesSpace_harmony_9/clusters.csv")
>
> uniq <- unique(gsub(".*_Br", "Br", x$key))
> length(uniq)
[1] 30
> sort(uniq)
 [1] "Br2720_ant_2" "Br2720_mid"   "Br2720_post"  "Br2743_ant"   "Br2743_mid"
 [6] "Br2743_post"  "Br3942_ant"   "Br3942_mid"   "Br3942_post"  "Br6423_ant"
[11] "Br6423_mid"   "Br6423_post"  "Br6432_ant_2" "Br6432_mid"   "Br6432_post"
[16] "Br6471_ant"   "Br6471_mid"   "Br6471_post"  "Br6522_ant"   "Br6522_mid"
[21] "Br6522_post"  "Br8325_ant"   "Br8325_mid_2" "Br8325_post"  "Br8492_ant"
[26] "Br8492_mid"   "Br8492_post"  "Br8667_ant"   "Br8667_mid"   "Br8667_post"

boyiguo1 commented 1 year ago

My issue at the time was that I thought the Br part was the sample_id, and I tried to match it to the deconvolution results files that Nick created. That would create a problem, since Br2720_ant_2 would not match Br2720_ant (at least that is my understanding). The same applies to Br6432_ant_2 and Br8325_mid_2.
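A small sketch of the mismatch I had in mind (deconv_samples below is a hypothetical stand-in for the sample names in the deconvolution files, not their actual contents):

# hypothetical sample names on the deconvolution side
deconv_samples <- c("Br2720_ant", "Br6432_ant", "Br8325_mid")

# the "_2" keys from the clustering CSV would not match them directly
c("Br2720_ant_2", "Br6432_ant_2", "Br8325_mid_2") %in% deconv_samples
# [1] FALSE FALSE FALSE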

Also, why are you using /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final.Rdata instead of /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rdata/spe/01_build_spe/spe_filtered_final_with_clusters.Rdata?

I think I copied the wrong line.

Are you saying that there's a bug in spatialLIBD::cluster_import()? See how https://github.com/LieberInstitute/spatialLIBD/blob/master/R/cluster_import.R calls add_key(overwrite = TRUE) by default at https://github.com/LieberInstitute/spatialLIBD/blob/e2f179bb7a3bfef30b319e84dc7702e3ce99aa67/R/cluster_import.R#L47.

I was not saying there was a bug.