Open emdann opened 2 years ago
@emdann
@xinyuejohn started with the integration of milopy into pertpy here: https://github.com/theislab/pertpy/pull/165. He could also tackle the feedback that I posted here, if that's fine with you? Then we don't need to duplicate the work, and John and I can figure this out directly.
What do you think?
Hi @Zethson. Apologies for the late reply on this, and thanks a lot for the feedback. I am a bit swamped with some urgent work at the moment, so I am more than happy for @xinyuejohn to take over and implement these changes during integration!
Great!
A few general comments:
- You should consider adding all doc-related files to the gitignore. They pollute pull requests. A CI job should build the docs, and with, for example, RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but I wanted to mention it nevertheless.
- A couple of pertpy tools have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps:

      milopy = pt.tl.Milopy()
      mdata = milopy.load(adata)  # adds compositional adata object and returns mdata object
      # alternative
      mdata = milopy.load(mdata_old, rna="rna")  # where rna refers to the existing RNA modality in the MuData object
- I didn't add this comment everywhere, but we should ensure that the naming is consistent with the unwritten muon/MuData usage rules: "rna" and "compositional" ("samples" might be too unspecific). We should probably even name it "milo_compositional", because tascCODA might add its own "compositional".
As for your comment 2, the current workflow works like this:
- `milo.make_nhoods(adata, ...)` to get neighbourhoods, which are saved in adata itself
- `milo_mdata = milo.count_nhoods(adata, sample_col="sample")` to get a MuData object.

Therefore, if I need to add a `mdata = milo.load()` function, it should substitute the original `milo.count_nhoods` function. What do you think? Do you think it follows the best practice? @Zethson
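To make the counting step in the workflow above concrete, here is a minimal sketch of the kind of table it is assumed to produce: one row per sample, one column per neighbourhood, each entry being the number of cells from that sample in that neighbourhood. The helper `count_cells` and the toy inputs are hypothetical and use plain Python lists; this is not the milopy implementation.

```python
def count_cells(cell_samples, nhood_members):
    """Tally cells per sample in each neighbourhood.

    cell_samples:  sample label of each cell (list index = cell index)
    nhood_members: one list of cell indices per neighbourhood
    Returns (samples, counts) with counts[i][j] = number of cells of
    samples[i] that belong to neighbourhood j.
    """
    samples = sorted(set(cell_samples))
    row = {sample: i for i, sample in enumerate(samples)}
    counts = [[0] * len(nhood_members) for _ in samples]
    for j, members in enumerate(nhood_members):
        for cell in members:
            counts[row[cell_samples[cell]]][j] += 1
    return samples, counts


# Toy data (hypothetical): five cells from two samples, two overlapping neighbourhoods.
cell_samples = ["s1", "s1", "s2", "s2", "s2"]
nhood_members = [[0, 1, 2], [2, 3, 4]]

samples, counts = count_cells(cell_samples, nhood_members)
print(samples)  # ['s1', 's2']
print(counts)   # [[2, 0], [1, 3]]
```

Neighbourhoods may overlap, so a cell (here cell 2) can be counted in more than one column.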
What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.
Does it make sense?
Makes sense to me, and it makes the whole pipeline clearer. This way, the MuData object that the load function generates includes only one modality, 'rna', and all the downstream functions take this MuData object as input.
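A rough sketch of the `load` behaviour agreed on here, under stated assumptions: `ModalityContainer` is a hypothetical stand-in for a MuData object (it only mimics a `.mod` mapping of named modalities), and the `rna` parameter name is taken from the example earlier in the thread. The function only sets up the container and performs no computation; a real version would build a MuData object instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ModalityContainer:
    """Hypothetical stand-in for a MuData object: a mapping of named modalities."""
    mod: Dict[str, Any] = field(default_factory=dict)


def load(data: Any, rna: str = "rna") -> ModalityContainer:
    """Set up the object used by downstream steps, without computing anything."""
    if isinstance(data, ModalityContainer):
        # Existing multi-modal object: reuse it, but make sure the RNA modality exists.
        if rna not in data.mod:
            raise KeyError(f"modality {rna!r} not found; available: {sorted(data.mod)}")
        return data
    # Single cell-level object: wrap it as the only modality.
    return ModalityContainer(mod={rna: data})


cells = {"n_obs": 3}  # placeholder for a cell-level AnnData object
mdata = load(cells)
print(list(mdata.mod))  # ['rna']
```

Passing an existing container returns it unchanged, so downstream functions can rely on always receiving the same kind of object with a known modality name.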
The original cell-level AnnData object and the sample x nhoods AnnData are stored in a common MuData object with `.mod` `cells` and `samples`. Samples are in `mdata["samples"].obs` and nhoods are in `mdata["samples"].var`.

Notes
- `make_nhoods` function is left as is, the MuData object gets created by `count_nhoods`. The rationale is that here samples are not taken into consideration yet.
- `axis` as defined in MuData documentation, since neither `.obs` nor `.var` are shared between cells and samples

To do