almaan / stereoscope

Spatial mapping of cell types by integration of transcriptomics data
MIT License

Interaction with AnnData #18

Open romain-lopez opened 4 years ago

romain-lopez commented 4 years ago

I am trying to work outside of the Python command line and interact more directly with AnnData / scanpy. I am tempted to fork this package and patch it so that it works with AnnData objects directly. Do you have any recommendations on how to do this? Is this something you have already thought about?

giovp commented 3 years ago

I've been thinking about the same functionality; it would be nice to run stereoscope in a notebook as part of a standard scanpy analysis. @almaan, any pointers on how to start building a Python API for stereoscope?

romain-lopez commented 3 years ago

I did not get any answers, so I reimplemented stereoscope in another codebase. It now takes a couple of lines of code to run stereoscope from AnnData, and it works in a Jupyter notebook. I only had to reimplement the model in PyTorch (< 100 lines of code), since much of the rest (data loading, etc.) duplicates code that already exists there.

https://github.com/YosefLab/scvi-tools/blob/romain/spatial/Stereoscope.ipynb

The algorithm implementation is final, although we will only incorporate it into the main codebase later. I still need to add the right credits, references, etc.

romain-lopez commented 3 years ago

Also, I found that the default number of epochs can be a bit conservative. The model can be trained for far fewer epochs, and therefore much more quickly.

giovp commented 3 years ago

Thanks @romain-lopez! That looks great, and I'm looking forward to trying it out. I noticed the same thing about the number of epochs: training can be dramatically shorter.

romain-lopez commented 3 years ago

Thanks! You should be able to check out this branch and install scvi-tools in editable mode; then you can run the notebook directly (the data is not on the branch, but it works on any AnnData as long as the gene sets are the same). Let me know if you run into problems; I'm happy to help or take feedback.
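As a minimal sketch of the "same gene sets" requirement, assuming it means that the single-cell and spatial data must be restricted to a common, identically ordered list of genes. The helper name `shared_genes` and the variables `sc_adata` / `st_adata` are hypothetical and are not part of scvi-tools or stereoscope.

```python
def shared_genes(sc_genes, st_genes):
    """Return the genes present in both inputs, in the order of sc_genes.

    Duplicated gene names are kept only once.
    """
    st_set = set(st_genes)
    seen = set()
    shared = []
    for gene in sc_genes:
        if gene in st_set and gene not in seen:
            shared.append(gene)
            seen.add(gene)
    return shared


# Hypothetical usage with two AnnData objects (names are placeholders):
#   genes = shared_genes(sc_adata.var_names, st_adata.var_names)
#   sc_adata = sc_adata[:, genes].copy()
#   st_adata = st_adata[:, genes].copy()
```

Keeping the order of one input fixed means both subsets end up with the same gene order, which matters when per-gene parameters learned on one dataset are reused on the other.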

almaan commented 3 years ago

Hi @romain-lopez and @giovp,

Apologies for the complete lack of responsiveness. My excuse, albeit a bad one, is that I've been in multiple fairly intensive revision processes.

As I mentioned earlier to @giovp, I've had plans to provide an API for scanpy, but have continuously postponed this. It's great to see that you, @romain-lopez, have taken matters into your own hands and started this; if you are interested in making a PR, or already plan to do so, I'd gladly welcome it. My ambition is still to provide some form of API in the future, but I cannot give a specific time for when this will happen.

In addition, as you (@romain-lopez) seem to have already noticed, the model is extremely simple and in its purest form should not require more than a few lines of code. Pointers may be a bit late to give now, @giovp, but I would start by having a look at the models.py file, where most of the essential features of the model can be found, and potentially also glance at the datasets.py module.
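To illustrate how little code the core of such a model can need, here is a rough sketch. It assumes (this is not stated in the thread) that counts are modelled as negative binomial with a gene-specific success probability shared between single-cell and spatial data, and that a spot's rate is a weighted sum of per-cell-type rates. The function names and argument layout are hypothetical and are not taken from models.py.

```python
import math


def nb_log_pmf(x, r, p):
    """Log-probability of count x under a negative binomial.

    r > 0 is the rate (total count) and 0 < p < 1 the success probability,
    so that the mean is r * p / (1 - p).
    """
    return (
        math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
        + x * math.log(p) + r * math.log1p(-p)
    )


def spot_log_likelihood(counts, weights, rates, probs):
    """Log-likelihood of one spot (assumed model, see lead-in).

    counts[g]   observed count of gene g in the spot
    weights[z]  non-negative contribution of cell type z to the spot
    rates[g][z] rate of gene g in cell type z
    probs[g]    success probability of gene g
    """
    total = 0.0
    for g, x in enumerate(counts):
        # Independent NB counts with the same p add up to an NB whose
        # rate is the sum of the individual rates.
        r = sum(w * rates[g][z] for z, w in enumerate(weights))
        total += nb_log_pmf(x, r, probs[g])
    return total
```

In a real implementation the weights would be the parameters being fitted (for example by gradient descent in PyTorch), with rates and probabilities estimated beforehand from the single-cell data.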

Regarding the epochs, it's more than true that the default numbers are almost unnecessarily high. They were set to more or less guarantee near convergence, with the idea that the user could then either lower the number if comfortable with using fewer epochs, or alternatively terminate the fitting process prematurely (Ctrl+C).
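The two options described above (fewer epochs, or stopping by hand) can be sketched as a generic training loop. This is not stereoscope's actual training code: `fit`, `step`, `patience`, and `min_delta` are hypothetical names, and the plateau-based stopping rule is an added convenience that the thread does not mention.

```python
def fit(step, max_epochs=1000, patience=20, min_delta=1e-4):
    """Call step() once per epoch and return the list of losses.

    Training stops when max_epochs is reached, when the loss has not
    improved by more than min_delta for `patience` consecutive epochs,
    or when the user presses Ctrl+C.
    """
    best = float("inf")
    stale = 0
    history = []
    try:
        for _ in range(max_epochs):
            loss = step()  # one epoch of training; returns the epoch loss
            history.append(loss)
            if best - loss > min_delta:
                best = loss
                stale = 0
            else:
                stale += 1
                if stale >= patience:
                    break
    except KeyboardInterrupt:
        # Ctrl+C: stop early but keep whatever has been fitted so far.
        pass
    return history
```

Catching `KeyboardInterrupt` around the loop lets an interrupted run return normally, so the partially fitted parameters held by `step` can still be saved.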

Also, thanks to both of you for the input and pushing this!