ShobiStassen / VIA

trajectory inference
https://pyvia.readthedocs.io/en/latest/
MIT License

CSV to AnnData Object Conversion Details #74

Open thomas-jac opened 1 week ago

thomas-jac commented 1 week ago

Hello, I am trying to benchmark different methods on the Mouse Hypothalamic Preoptic Region MERFISH dataset (https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248), since it was used in the spatial data tutorial provided in the StaVia docs. However, the CSV file on the linked website differs from the AnnData object used in the tutorial in several ways: the number of cells present, how those cells were selected, and how the spatial matrix (part of obsm) was obtained from the Centroid_X and Centroid_Y columns in the CSV file. It would be very helpful if the code for this conversion could be made available or the process explained. Thank you!

ShobiStassen commented 1 week ago

Hi @thomas-jac Thomas,

Thanks for trying StaVia. The h5ad file is based on the AnnData object that is also available through the squidpy library (the StaVia tutorial on Read the Docs has a commented-out line of code if you prefer to load the file directly from squidpy). It is a subset (one female sample, animal ID 1) of the full data, but StaVia can also easily run on the full dataset. To handle multiple samples and slices, the StaVia tutorial on the Zesta dataset might be useful, since the initial steps need to be adjusted for a multi-slice/multi-sample dataset. The paper figures, however, are based on the subset provided in the squidpy library.
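For anyone who wants to approximate the subset from the raw CSV, here is a minimal sketch of one way to do it. It is not the code that produced the squidpy object, and the exact preprocessing used there is not confirmed in this thread. The column names `Cell_ID` and `Animal_ID` are assumptions about the CSV layout (only `Centroid_X` and `Centroid_Y` are mentioned above), and `split_merfish_table` and `to_anndata` are hypothetical helper names. The toy table stands in for the real file.

```python
import numpy as np
import pandas as pd

# Assumed non-gene columns. The real CSV likely has more annotation columns
# than this; add them to meta_cols so they are not treated as genes.
DEFAULT_META = ("Cell_ID", "Animal_ID", "Centroid_X", "Centroid_Y")


def split_merfish_table(df, animal_id=1, meta_cols=DEFAULT_META):
    """Keep one animal and split the table into genes, obs, and coordinates."""
    sub = df[df["Animal_ID"] == animal_id].reset_index(drop=True)
    # n_cells x 2 coordinate matrix, the usual shape for obsm["spatial"]
    spatial = sub[["Centroid_X", "Centroid_Y"]].to_numpy(dtype=float)
    cell_ids = pd.Index(sub["Cell_ID"].astype(str).to_numpy())
    obs = sub[["Animal_ID"]].copy()
    obs.index = cell_ids
    genes = sub.drop(columns=list(meta_cols))
    genes.index = cell_ids
    return genes, obs, spatial


def to_anndata(genes, obs, spatial):
    """Optional step: wrap the pieces in an AnnData object (needs anndata)."""
    import anndata as ad

    adata = ad.AnnData(X=genes.to_numpy(dtype=np.float32), obs=obs)
    adata.var_names = list(genes.columns)
    adata.obsm["spatial"] = spatial
    return adata


# Toy stand-in for the CSV; with the real file, use pd.read_csv(<path>) instead.
toy = pd.DataFrame(
    {
        "Cell_ID": ["a", "b", "c"],
        "Animal_ID": [1, 2, 1],
        "Centroid_X": [0.0, 5.0, 2.0],
        "Centroid_Y": [1.0, 6.0, 3.0],
        "GeneA": [0.5, 1.5, 2.5],
        "GeneB": [0.0, 1.0, 2.0],
    }
)
genes, obs, spatial = split_merfish_table(toy, animal_id=1)
```

The result can then be compared against the object returned by `squidpy.datasets.merfish()`. Differences in cell counts or coordinates would point to extra filtering or coordinate handling in the squidpy preprocessing that this sketch does not reproduce.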