Closed allyhawkins closed 3 years ago
I just did a quick test with exporting and importing loom data.
Unfortunately, there were a number of downsides.
None of these are insurmountable (aside from export time, which is pretty bad!), but they do make me less excited to use the format.
Another option is the h5ad format that seems a bit more native to scanpy, but I fear it will have many of the same issues. From the docs for a package that writes this format (https://theislab.github.io/zellkonverter/articles/zellkonverter.html) it is unclear if altExperiments are part of the export or not.
Okay, what if we have an FAQ entry like "How do I use the RDS files with Python?" that includes several lines of code, possibly using R to split out CITE-seq and save loom objects? To me, this seems like a good "soft launch" goal, and we might consider offering multiple formats in the future.
10 minutes!! That doesn't sound like an ideal situation... Based on this, I also would say that maybe loom files aren't as appealing.
Based on the link you sent, it looks like we would be able to write the sce out to an h5ad file without changing anything about its contents, including an altExperiment. I did a quick check writing one of my sces out to an h5ad file, and it also took ~10 minutes for one sce rather than a matter of seconds.
I think if the only concern about rds files is compatibility with python, then I agree with Jackie about providing some code on how to do it yourself (and then providing multiple formats in the future). Although, I would argue that the h5ad format may be more straightforward than loom, since from my brief interaction with it, it looks like you wouldn't have to separate out CITE-seq.
@allyhawkins Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.
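One way to catch this kind of silent loss is to record what the object contains before export and compare it with what comes back after import. The following is a minimal sketch in plain Python, not tied to any particular file format or library: `find_round_trip_losses` is a hypothetical helper, and the component names in the example are placeholders, not output from a real file.

```python
def find_round_trip_losses(before, after):
    """Compare component names recorded before export and after import.

    Both arguments map a component kind (e.g. "assays", "alt_experiments")
    to the list of names found in the object. Returns a dict of the names
    that were present before export but are missing afterwards.
    """
    losses = {}
    for kind, names in before.items():
        missing = sorted(set(names) - set(after.get(kind, [])))
        if missing:
            losses[kind] = missing
    return losses


# Hypothetical summaries: what was in the object before writing the file,
# and what was found after reading the file back in.
before = {"assays": ["counts"], "alt_experiments": ["CITEseq"]}
after = {"assays": ["counts"], "alt_experiments": []}

losses = find_round_trip_losses(before, after)
if losses:
    print("Dropped during round trip:", losses)
```

Note that a component that was renamed rather than dropped would also be reported as missing under its old name, so any hit still needs a manual look.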
I also ran into trouble getting the h5ad support with zellkonverter to work: it uses conda in an... interesting... way and I suspect there was some incompatibility with my base conda install. Didn't want to spend time debugging the install. The installation happens on the first call to write an h5ad file, which might explain part of the time it took? If you repeat the export, is it faster?
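To separate one-time setup cost from the cost of the export itself, the same call can be timed more than once. This is a rough sketch using only the Python standard library; `time_repeated` and `make_fake_export` are hypothetical names, and the simulated delays are made up. The real export call would go where `fake_export` is used.

```python
import time


def time_repeated(func, repeats=2):
    """Call func several times and return the elapsed seconds for each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def make_fake_export():
    """Build a stand-in export whose first call does extra one-time setup."""
    state = {"setup_done": False}

    def fake_export():
        if not state["setup_done"]:
            time.sleep(0.2)  # simulated one-time setup (e.g. an environment install)
            state["setup_done"] = True
        time.sleep(0.01)  # simulated cost of the export itself

    return fake_export


first, second = time_repeated(make_fake_export(), repeats=2)
print(f"first call: {first:.2f}s, repeat call: {second:.2f}s")
```

If the repeat call is much faster than the first, most of the original time was setup rather than writing the file.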
But mostly, I am totally fine with RDS files + explanation of conversion in the docs. I'm actually not sure how Seurat handles AltExp, so we may want docs on that front as well.
Filed a docs ticket: https://github.com/AlexsLemonade/scpca-docs/issues/13
> Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.
When I read it back in, the contents were relatively similar, but the assay name was changed from counts to X. I didn't have an sce with CITE-seq loaded to test with, but there is an empty slot for altExpNames when I load it back in, and it looks like a typical sce.
I also had trouble at first with the conda issues, but then just updated conda and it seemed to solve the issues? But regardless, I would agree that it seems .rds right now is the way to go.
> When I read it back in the contents were relatively similar, but the assay name was changed from counts to X. I didn't have a sce loaded with CITE-seq that I tested with, but there is an empty slot for altExpNames
Any SCE should have that, so I don't think that really answers the question, unfortunately. We'll need to test more directly.
Discussed in Slack but to reiterate: Exploring the use of loom files for output has been accomplished. We want to stick with RDS for now and add an FAQ to help python users (https://github.com/AlexsLemonade/scpca-docs/issues/13). We have some questions about Seurat to explore that are now tracked separately (#129).
So I'm going to close this and label it with future thoughts to reflect that there are some ideas about testing h5ad in here that could be handy later.
There has been some discussion about the potential of using loom files instead of an RDS file containing a SingleCellExperiment object. See #107.
We should do some testing of generating a loom file to see whether this is something we would want to use, or whether we should continue to use an RDS file.
Including some more links for information on loom files here: http://loompy.org/ https://bioconductor.org/packages/release/bioc/vignettes/LoomExperiment/inst/doc/LoomExperiment.html