almaan / stereoscope

Spatial mapping of cell types by integration of transcriptomics data
MIT License

mouse-st-data.zip data need password #9

Closed BioAmelie closed 3 years ago

BioAmelie commented 4 years ago

Dear author,

I want to run stereoscope on my ST data. When I tried to unzip your deposited test data "mouse-st-data.zip", I was prompted for a password. Could you reply with the password for "mouse-st-data.zip"?

Yours sincerely,

almaan commented 4 years ago

Hi there,

if you just want to run stereoscope on your own data, you actually do not need the test data - it's only there as an example. Since the manuscript is still under review (a process which has been somewhat prolonged due to the COVID-19 situation), I can't share the password quite yet, sorry for that.

However, if you want to play around with some additional spatial data, I recommend going to spatialresearch.org, where several public datasets are available to use.

Good luck, and let me know if there are any questions! Best Alma

BioAmelie commented 4 years ago

Okay, actually I just want to see the detailed format of the ST input data. Is the ST count input similar to single-cell count data?

almaan commented 4 years ago

I see! It is indeed similar to the single-cell count data. I will copy an excerpt from section 2.3 in the README, and then elaborate a bit on this:

The two alternative input options are, as of now: (1) a .tsv file where spots (the capture locations) are arranged as rows and genes as columns. If your data is transposed (i.e. genes as rows and spots as columns) you don't have to reformat it, but make sure to include the -stt flag to stereoscope run so that the software transposes your data after reading it. The second option (2) is a .h5ad file formatted according to the conventions of the anndata package. Also, make sure your single cell and spatial data share the same gene identifiers; if your sc data uses HGNC symbols while the st data uses ENSEMBL IDs, you will encounter an error.
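To make the .tsv layout above concrete, here is a minimal sketch in plain Python. It is not part of stereoscope: the helper names `read_counts` and `shared_genes` and the toy matrices are hypothetical, and the sketch assumes a tab-separated file whose first row holds column names and whose first column holds row names. It reads a spots x genes matrix, flips a genes x cells matrix after reading (the situation the -stt flag addresses for spatial data), and lists the gene identifiers both data sets share.

```python
import csv
import io


def read_counts(handle, transposed=False):
    """Read a tab-separated count matrix into (row_names, col_names, values).

    Assumed layout: header row with column names, first column with row names.
    With transposed=True the file is genes x observations and is flipped to
    observations x genes after reading.
    """
    table = list(csv.reader(handle, delimiter="\t"))
    col_names = table[0][1:]
    row_names = [line[0] for line in table[1:]]
    values = [[float(v) for v in line[1:]] for line in table[1:]]
    if transposed:
        values = [list(col) for col in zip(*values)]
        row_names, col_names = col_names, row_names
    return row_names, col_names, values


def shared_genes(sc_genes, st_genes):
    """Return the gene identifiers present in both data sets, in sc order."""
    st_set = set(st_genes)
    return [gene for gene in sc_genes if gene in st_set]


# Toy data (hypothetical): ST as spots x genes, sc as genes x cells.
st_tsv = "\tGeneA\tGeneB\tGeneC\nspot1\t1\t0\t3\nspot2\t0\t2\t5\n"
sc_tsv = "\tcell1\tcell2\nGeneB\t4\t0\nGeneC\t1\t7\nGeneD\t2\t2\n"

st_spots, st_genes, st_counts = read_counts(io.StringIO(st_tsv))
sc_cells, sc_genes, sc_counts = read_counts(io.StringIO(sc_tsv), transposed=True)
common = shared_genes(sc_genes, st_genes)
print(common)
```

An empty `common` list would be a hint that the two files use different identifier schemes, for example HGNC symbols in one and ENSEMBL IDs in the other.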

I hope this answered your questions!

BioAmelie commented 4 years ago

Thanks for your detailed explanation. I have successfully run stereoscope on my data. Before visualizing my data (using the stereoscope look function), I noticed that in my W.2020-07-07232119.245643.tsv file the rows are spot IDs and the columns are cell types. But in your example data, the rows are labeled with numbers, like 4.83x31.08. Do these floating-point numbers represent each spot embedded in UMAP (for example, UMAP_1 * UMAP_2)? If not, can you tell me how to reformat W.2020-07-07232119.245643.tsv so that I can run "stereoscope look"?

almaan commented 4 years ago

Hi again! Glad to hear that everything went smoothly for you running stereoscope. The numbers you refer to are the actual physical coordinates of each spot/capture location, formatted like [x-coordinate]x[y-coordinate]. If you are working with Visium data you will find a file in the spatial folder (of the spaceranger output) called tissue_positions_list.csv. This .csv file allows you to convert the spatial barcode to physical coordinates.
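As a rough sketch of that conversion, the plain-Python snippet below relabels the rows of a proportions table from barcodes to [x-coordinate]x[y-coordinate]. Several things here are assumptions rather than anything stereoscope prescribes: the helper names are hypothetical, tissue_positions_list.csv is taken to be headerless with the columns barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres, and the pixel column is used as x and the pixel row as y. The barcodes and values are toy data.

```python
import csv
import io


def barcode_to_coordinates(handle):
    """Map each in-tissue barcode to an "[x]x[y]" label.

    Assumes a headerless six-column file: barcode, in_tissue, array_row,
    array_col, pxl_row_in_fullres, pxl_col_in_fullres.
    Assumption: x is the pixel column and y is the pixel row.
    """
    mapping = {}
    for barcode, in_tissue, _arow, _acol, pxl_row, pxl_col in csv.reader(handle):
        if in_tissue != "1":
            continue  # skip spots that are not under the tissue
        mapping[barcode] = f"{pxl_col}x{pxl_row}"
    return mapping


def relabel_rows(tsv_text, mapping):
    """Replace the barcode in the first column of each data row."""
    lines = tsv_text.splitlines()
    relabeled = [lines[0]]  # keep the header (cell type names) unchanged
    for line in lines[1:]:
        barcode, rest = line.split("\t", 1)
        relabeled.append(mapping[barcode] + "\t" + rest)
    return "\n".join(relabeled) + "\n"


# Toy inputs (hypothetical barcodes and values).
positions_csv = (
    "AAAC-1,1,0,0,100,200\n"
    "AAAG-1,0,0,2,100,300\n"
    "AAAT-1,1,1,1,150,250\n"
)
w_tsv = "\tTcell\tBcell\nAAAC-1\t0.7\t0.3\nAAAT-1\t0.2\t0.8\n"

mapping = barcode_to_coordinates(io.StringIO(positions_csv))
relabeled = relabel_rows(w_tsv, mapping)
print(relabeled)
```

A barcode that is missing from the positions file raises a `KeyError` here, which makes mismatches between the two files visible instead of silently dropping spots.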

I can also refer you to LINK for a discussion regarding this.

Hopefully that clarified some stuff, let me know if any issues arise.

BioAmelie commented 4 years ago

Hi @almaan!

I have replaced the row indices with [x_coordinate]x[y_coordinate] (from the third and fourth columns in tissue_positions_list.csv), but I am confused: in my tissue_positions_list.csv, the x,y coordinates and the pixel coordinates are all integers rather than floating-point numbers. Did you divide the pixel values by 100 to get floats? You recommend using the pixel coordinates, which are the fourth and sixth columns in tissue_positions_list.csv. In short, if I want to run stereoscope look, I just need to replace the row indices with the pixel coordinates (the fourth and sixth columns in tissue_positions_list.csv); should I convert the integer pixel coordinates to floats?

almaan commented 4 years ago

Hello! The data shown in the README examples is not Visium data; it comes from the older ST arrays, hence the different format. You should be fine just using the pixel coordinates!

Best Alma

BioAmelie commented 4 years ago

Hi @almaan! I have successfully run it! Thanks for your help! Now I want to fine-tune the parameters. I am confused about the sc batch size and the st batch size. My single-cell data and ST data are not consistent in size, so should I set a value (for example 100) or use the default value for those two parameters? Also, the total run time is a little bit long; can stereoscope be run in multiple threads?

almaan commented 4 years ago

Hi,

the batch size is a parameter that you can specify if your memory is not large enough to hold the complete data set, or if you want to introduce some stochasticity into the process. I personally tend to use a batch size of 2048, but you can choose whatever value you think is most compatible with your system. stereoscope cannot be run on multiple threads at the moment; however, if you are not already using a GPU, I strongly recommend it - it speeds up the process immensely.

Hope that gave some answers!

Best Alma

bioliyezhang commented 4 years ago

Hi @almaan, have you considered outputting the digital cell number rather than the cell type proportion?

almaan commented 4 years ago

Hi @bioliyezhang

I'm not quite sure what you mean by "digital cell number", but I will do my best to answer your question; let me know if I misunderstood you. The model is constructed to estimate the scaled quantity coefficients (products of a spot-specific scaling factor and the actual cell number, both of which are latent variables). When the scaled quantity coefficients are normalized to sum to unity, you obtain the proportion values. This means that the actual cell numbers are never used in the method. Furthermore, since the number of cells at each capture location varies and is not precisely known, the number of cells from each cell type at a given location cannot easily be calculated. For more information on the method, I would refer you to the Methods section of the pre-print.

However, if you are interested in cell numbers rather than proportions, I would recommend trying a cell segmentation algorithm on the HE image, assigning each cell to a specific location, and then multiplying the total number of cells by the estimated proportion values.
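The arithmetic behind this suggestion can be sketched in a few lines of plain Python. This is only an illustration with made-up numbers, not stereoscope output: the function names are hypothetical, the scaled coefficients are toy values, and `n_cells_in_spot` stands in for a count that would have to come from a separate segmentation step.

```python
def normalize_to_proportions(scaled_coefficients):
    """Normalize per-cell-type coefficients of one spot so they sum to one."""
    total = sum(scaled_coefficients.values())
    return {cell_type: value / total for cell_type, value in scaled_coefficients.items()}


def estimate_cell_numbers(proportions, n_cells):
    """Multiply the total cell count of a spot by each estimated proportion."""
    return {cell_type: p * n_cells for cell_type, p in proportions.items()}


# Toy values for a single spot (hypothetical).
scaled = {"Tcell": 0.5, "Tumor": 1.0, "Bcell": 1.5}
n_cells_in_spot = 6  # assumed to come from segmenting the HE image

proportions = normalize_to_proportions(scaled)
cell_numbers = estimate_cell_numbers(proportions, n_cells_in_spot)
print(proportions)
print(cell_numbers)
```

The resulting numbers are only as reliable as the segmentation-based cell count, and they are generally not integers, so any rounding is a further modeling choice.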

Best Alma

BioAmelie commented 4 years ago

Hi @almaan,

You got my question. My question is: why don't you output the number of cells of each type in a spot? For example, a spot might contain one T cell, two tumor cells, three B cells, etc.

almaan commented 4 years ago

Great - I think I will refer to my previous answer then:

[...] The model is constructed to estimate the scaled quantity coefficients (products of a spot-specific scaling factor and the actual cell number, both of which are latent variables). When the scaled quantity coefficients are normalized to sum to unity, you obtain the proportion values. This means that the actual cell numbers are never used in the method. Furthermore, since the number of cells at each capture location varies and is not precisely known, the number of cells from each cell type at a given location cannot easily be calculated. For more information on the method, I would refer you to the Methods section of the pre-print.

However, if you are interested in cell numbers rather than proportions, I would recommend trying a cell segmentation algorithm on the HE image, assigning each cell to a specific location, and then multiplying the total number of cells by the estimated proportion values.

To summarize, the number of cells from each type at a spot is not known, since the total number of cells at our spots is not known either. The probabilistic model that represents the cogs and wheels of stereoscope never works with the actual cell numbers but rather proportion values.

Hope that clarified things Alma

Jason-Qiuhai commented 3 years ago

Dear author,

Now that the paper has been published, could you share the password?

Best

almaan commented 3 years ago

Hello @Jason-Qiuhai,

thanks for the reminder, the password can now be found in the README, but to save you some time I'll share it here as well.

password: zNLXkYk3Q9znUseS

Best /A