drieslab / Giotto

Spatial omics analysis toolbox
https://drieslab.github.io/Giotto_website/
Other
257 stars 98 forks source link

Xenium tutorials xenium_breast_cancer do not produce the same output #727

Closed terrytas closed 1 year ago

terrytas commented 1 year ago

Hi

I have been doing your Xenium tutorials, using xenium dataset available at 10x site below: https://giottosuite.readthedocs.io/en/latest/subsections/datasets/xenium_breast_cancer.html

dataset: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast 

The tutorial code with the above dataset does not work as expected.

I have 2 problems. This issue is similar to the issue https://github.com/drieslab/Giotto/issues/701. In my case, without changing the code a bit, I cannot proceed further up to 6.3.

Problem 1. tutorials 5.1.1 the output of the tutorial is different from my output.

Tutorials 5.1.1 the output

 Transcripts info available:
   "transcript_id" "cell_id" "overlaps_nucleus" "feat_ID" "x" "y" "z_location" "qv"
  with 43664530 unfiltered detections
  and 34813341 filtered detections

 Blank Codeword detections: 8805
 Gene Expression detections: 34764833
 Negative Control Codeword detections: 1855
 Negative Control Probe detections: 37848

My output using the tutorial code and the dataset above.

  "transcript_id" "cell_id" "overlaps_nucleus" "feat_ID" "x" "y" "z_location" "qv"
 with 42638083 unfiltered detections
and 34493510 filtered detections

Blank Codeword detections:  10166
Gene Expression detections:  0
Negative Control Codeword detections:  2215
Negative Control Probe detections:  38413

As shown, no Gene Expression is detected. As a result the subsequent processing produced nothing.

However, I changed the code like below, then Gene Expression is detected.

# 4 Xenium feature types exploration
feat_types_IDs = lapply(
  #feat_types, function(type) feature_dt[feat_type == type, unique(feat_ID)]   ## changed
  feat_types, function(type) feature_dt[feat_type == type, unique(feat_name)]
)

Blank Codeword detections:  10166
Gene Expression detections:  34442716
Negative Control Codeword detections:  2215
Negative Control Probe detections:  38413

I wonder why the original code "feat_types, function(type) feature_dt[feat_type == type, unique(feat_ID)] " does not work with the dataset above. Perhaps, the dataset may have changed a little.

Problem 2. After changing the above code the subsequent processing works fine up to 6.3 Feature metadata. An error occured at addFeatMetadata.

> # 6.3 Feature metadata
> panel_meta = data.table::fread(panel_meta_path)
> data.table::setnames(panel_meta, 'Name', 'feat_ID')
>
> \# Append this metadata
> xenium_gobj = addFeatMetadata(gobject = xenium_gobj,
                               feat_type = 'rna',
                               spat_unit = 'cell',
                               new_metadata = panel_meta,
                               by_column = TRUE,
                               column_feat_ID = 'feat_ID')
Error in addFeatMetadata(gobject = xenium_gobj, feat_type = "rna", spat_unit = "cell",  :
  No matching expression information discovered for:
 spat_unit: cell
 feature type: rna
 Please add expression information first

After creating the object by createGiottoObjectSubcellular, the expression slot in xenium_gobj is empty.

> xenium_gobj@expression
NULL
# 5.1.3 Create Giotto Object
xenium_gobj = createGiottoObjectSubcellular(
  gpoints = list(rna = gpoints_list$`Gene Expression`,
                 blank_code = gpoints_list$`Blank Codeword`,
                 neg_code = gpoints_list$`Negative Control Codeword`,
                 neg_probe = gpoints_list$`Negative Control Probe`),
  gpolygons = list(cell = gpoly_cells,
                   nucleus = gpoly_nucs),
  instructions = instrs
)

Since the expression slot in xenium_gobj is empty addFeatMetadata produces an error, I guess. I would be grateful if you could tell me what to do to avoid the error.

Best regards,

Terry

Error Message

Error in addFeatMetadata(gobject = xenium_gobj, feat_type = "rna", spat_unit = "cell",  :
  No matching expression information discovered for:
 spat_unit: cell
 feature type: rna
 Please add expression information first
``` > # Append this metadata xenium_gobj = addFeatMetadata(gobject = xenium_gobj, feat_type = 'rna', spat_unit = 'cell', new_metadata = panel_meta, by_column = TRUE, column_feat_ID = 'feat_ID') Error in addFeatMetadata(gobject = xenium_gobj, feat_type = "rna", spat_unit = "cell", : No matching expression information discovered for: spat_unit: cell feature type: rna Please add expression information first ``` **System Information** > R.version.string [1] "R version 4.3.1 (2023-06-16)" > packageVersion("Giotto") [1] ‘3.3.0’ Python 3.11.4 > checkGiottoEnvironment() giotto environment found at /gpfsx01/home/lvf00035/.local/share/r-miniconda/envs/giotto_env/bin/python [1] TRUE Please replace the following according to your machine: - OS: [e.g. OSX] NAME="Red Hat Enterprise Linux ComputeNode" VERSION="7.3 (Maipo)" - Version [e.g. 4.0.1] - R - Giotto - Python **Additional context** Add any other context about the problem here.
mattobny commented 1 year ago

@terrytas Hello!

EDITED The data from 10X has changed since its initial release. I just re-downloaded it and opened ~/outs/cell_feature_matrix/features.tsv.gz. Originally, the columns were feature IDs, feature names, followed by the type (in that order). Now, the columns correspond to ensembl IDs, feature names, and feature type (in that order).

Therefore, your approach is correct with this new data:

# 4 Xenium feature types exploration
feat_types_IDs = lapply(
  feat_types, function(type) feature_dt[feat_type == type, unique(feat_name)]
)

We apologize as we were previously unaware of this formatting change and thank you for bringing this to our attention. We will work to update the website accordingly as soon as we can to reflect these changes.

The second half of your issue is regarding the requisite expression matrix for adding feature metadata to the Giotto Object. In order to run addFeatMetadata, an expression matrix must first exist for the respective spatial unit feature type pair. To generate this aggregate expression matrix, see section 7.1 and 7.2 as the site currently stands. In case this changes before you view this comment, the commands you must run are below:

xenium_gobj = calculateOverlapRaster(xenium_gobj,
                                     spatial_info = 'cell',
                                     feat_info = 'rna')

xenium_gobj = overlapToMatrix(xenium_gobj,
                              poly_info = 'cell',
                              feat_info = 'rna',
                              name = 'raw')

calculateOverlapRaster() rasterizes the polygon information and finds the overlapping transcripts. overlapToMatrix() takes this overlap data and assigns it into the expression matrix in the proper format (features by cells).

This will create an expression matrix named raw for the spatial unit cell and feature type rna. addFeatMetadata will run smoothly after these two functions are run.

terrytas commented 1 year ago

Hi Joselyn,

Thanks for your kind reply.

I have been informed that the original dataset has slightly changed.

Terry

From: Joselyn Chávez @.> Sent: Wednesday, August 9, 2023 4:17 AM To: drieslab/Giotto @.> Cc: TASHIRO,Toshiyuki @.>; Mention @.> Subject: Re: [drieslab/Giotto] Xenium tutorials xenium_breast_cancer do not produce the same output (Issue #727)

Hi @terrytashttps://github.com/terrytas, regarding the first question. The problem is when creating the 'feat_types_IDs'. Currently the tutorial code is using the feat_ID column that contains Ensembl ids instead of gene names. That's why then the apply function finds zero 'Gene Expression detections' . By replacing feat_ID with the feat_name we get again the good result.

Please replace: feat_types_IDs = lapply( feat_types, function(type) feature_dt[feat_type == type, unique(feat_ID)] )

With:

feat_types_IDs = lapply( feat_types, function(type) feature_dt[feat_type == type, unique(feat_name)] # replaced feat_ID with feat_name )

I'll update this code in the tutorial.

— Reply to this email directly, view it on GitHubhttps://github.com/drieslab/Giotto/issues/727#issuecomment-1670157030, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ANNNTNGJ424YCYIIY6VWKXDXUKGCDANCNFSM6AAAAAA27I76JQ. You are receiving this because you were mentioned.Message ID: @.**@.>>

terrytas commented 1 year ago

Hi mattobny,

Thanks for your kind reply.

Regarding the first issue, it is no longer problem.

With regard to the addFeatMetadata error, running calculateOverlapRaster and overlapToMatrix fixes the error.

regards

Terry

From: mattobny @.> Sent: Wednesday, August 9, 2023 1:34 AM To: drieslab/Giotto @.> Cc: TASHIRO,Toshiyuki @.>; Mention @.> Subject: Re: [drieslab/Giotto] Xenium tutorials xenium_breast_cancer do not produce the same output (Issue #727)

@terrytashttps://github.com/terrytas Hello! I do not believe the data from 10X has changed. I just re-downloaded it and opened various files. They seem to be in the same format(s).

Did you change any of the code in section 4 before running section 5.1? A change in column order for the feature_dt in section 4 could have caused you to need to change feat_ID to feat_name in section 5.1. In any case, I have been unable to reproduce this error. For this reason, I would advise you to simply use the convenience function, createGiottoXeniumObject().

The second half of your issue is regarding the requisite expression matrix for adding feature metadata to the Giotto Object. In order to run addFeatMetadata, an expression matrix must first exist for the respective spatial unit feature type pair. To generate this aggregate expression matrix, see section 7.1 and 7.2 as the site currently stands. In case this changes before you view this comment, the commands you must run are below:

xenium_gobj = calculateOverlapRaster(xenium_gobj,

                                 spatial_info = 'cell',

                                 feat_info = 'rna')

xenium_gobj = overlapToMatrix(xenium_gobj,

                          poly_info = 'cell',

                          feat_info = 'rna',

                          name = 'raw')

calculateOverlapRaster() rasterizes the polygon information and finds the overlapping transcripts. overlapToMatrix() takes this overlap data and assigns it into the expression matrix in the proper format (features by cells).

This will create an expression matrix named raw for the spatial unit cell and feature type rna. addFeatMetadata will run smoothly after these two functions are run.

— Reply to this email directly, view it on GitHubhttps://github.com/drieslab/Giotto/issues/727#issuecomment-1669950736, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ANNNTNB2AD6QDN3HVW6CSRLXUJS6NANCNFSM6AAAAAA27I76JQ. You are receiving this because you were mentioned.Message ID: @.***>