AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Explore use of loom files for output #122

Closed allyhawkins closed 3 years ago

allyhawkins commented 3 years ago

There has been some discussion about potentially using loom files instead of an RDS file containing a SingleCellExperiment object. See #107.

We should test generating a loom file to see whether we want to use that format or continue using an RDS file.

Some more links with information on loom files: http://loompy.org/ and https://bioconductor.org/packages/release/bioc/vignettes/LoomExperiment/inst/doc/LoomExperiment.html

jashapiro commented 3 years ago

I just did a quick test with exporting and importing loom data.

Unfortunately, there were a number of downsides.

None of these are insurmountable (aside from export time, which is pretty bad!), but they do make me less excited to use the format.

Another option is the h5ad format, which seems a bit more native to scanpy, but I fear it will have many of the same issues. From the docs for a package that writes this format (https://theislab.github.io/zellkonverter/articles/zellkonverter.html), it is unclear whether altExperiments are part of the export.

jaclyn-taroni commented 3 years ago

Okay, what if we have an FAQ entry like "How do I use the RDS files with Python?" that includes several lines of code, possibly using R to split out CITE-seq and save loom objects? To me, this seems like a good "soft launch" goal where we might consider offering multiple formats in the future.

allyhawkins commented 3 years ago

10 minutes!! That doesn't sound like an ideal situation... Based on this, I also would say that maybe loom files aren't as appealing.

Based on the link you sent, it looks like we would be able to write the SCE out to an h5ad file without changing anything about its contents, including an altExperiment. I did a quick check, writing one of my SCEs out to an h5ad file, and it also took ~10 minutes for one SCE rather than a matter of seconds.

I think if the only concern about RDS files is compatibility with Python, then I agree with Jackie about providing some code on how to do it yourself (and then providing multiple formats in the future). That said, based on my brief interaction with it, I would argue that the h5ad format may be more straightforward than loom, since it looks like you wouldn't have to separate out CITE-seq.

jashapiro commented 3 years ago

@allyhawkins Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.

I also ran into trouble getting the h5ad support with zellkonverter to work: it uses conda in an... interesting... way and I suspect there was some incompatibility with my base conda install. Didn't want to spend time debugging the install. The installation happens on the first call to write an h5ad file, which might explain part of the time it took? If you repeat the export, is it faster?

But mostly, I am totally fine with RDS files + explanation of conversion in the docs. I'm actually not sure how Seurat handles AltExp, so we may want docs on that front as well.

jaclyn-taroni commented 3 years ago

Filed a docs ticket: https://github.com/AlexsLemonade/scpca-docs/issues/13

allyhawkins commented 3 years ago

> Did you read the file back in and check the contents? With loom, the failure mode was silent (my favorite kind of failure): there was no warning or error, but the AltExp was simply not there when reading the file.

When I read it back in the contents were relatively similar, but the assay name was changed from counts to X. I didn't have a sce loaded with CITE-seq that I tested with, but there is an empty slot for altExpNames when I load it back in and it looks like a typical sce.

I also had trouble at first with the conda issues, but then just updated conda and it seemed to solve the issues? But regardless, I would agree that it seems .rds right now is the way to go.

jashapiro commented 3 years ago

> When I read it back in the contents were relatively similar, but the assay name was changed from counts to X. I didn't have a sce loaded with CITE-seq that I tested with, but there is an empty slot for altExpNames

Any SCE should have that, so I don't think that really answers the question, unfortunately. We'll need to test more directly.
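A more direct test could look something like the sketch below, which checks a round trip from the Python side: list the named components in the h5ad file after export and fail loudly if anything expected is missing, rather than relying on a warning that may never come. This is a minimal sketch, not something run in this thread. The helper names (`summarize_h5ad`, `find_missing`, `assert_round_trip`) are hypothetical, and the slot and key where an altExp would appear in the file are placeholders, since whether and where the export writes it is exactly the open question here. The only library calls assumed are `anndata.read_h5ad` and the standard AnnData slots.

```python
from typing import Dict, Iterable, List


def summarize_h5ad(path: str) -> Dict[str, List[str]]:
    """List the named components found in each AnnData slot of an h5ad file."""
    # Imported lazily so the comparison helpers below work without anndata installed.
    import anndata

    adata = anndata.read_h5ad(path)
    return {
        "layers": sorted(map(str, adata.layers.keys())),
        "obsm": sorted(map(str, adata.obsm.keys())),
        "varm": sorted(map(str, adata.varm.keys())),
        "uns": sorted(map(str, adata.uns.keys())),
        "obs": sorted(map(str, adata.obs.columns)),
        "var": sorted(map(str, adata.var.columns)),
    }


def find_missing(
    expected: Dict[str, Iterable[str]],
    found: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per slot, the expected names that are absent from `found`."""
    missing: Dict[str, List[str]] = {}
    for slot, names in expected.items():
        absent = sorted(set(names) - set(found.get(slot, [])))
        if absent:
            missing[slot] = absent
    return missing


def assert_round_trip(
    expected: Dict[str, Iterable[str]],
    found: Dict[str, Iterable[str]],
) -> None:
    """Raise instead of failing silently when components were lost."""
    missing = find_missing(expected, found)
    if missing:
        raise ValueError(f"components lost in round trip: {missing}")
```

To use it, write down what the SCE contained before export (for example, the names of any altExps) as the `expected` dictionary, using whichever slot you believe they should land in, then call `assert_round_trip(expected, summarize_h5ad("exported.h5ad"))`, where `exported.h5ad` is a placeholder path. An SCE that actually contains CITE-seq data is needed for the check to say anything about altExp handling.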

jaclyn-taroni commented 3 years ago

Discussed in Slack but to reiterate: Exploring the use of loom files for output has been accomplished. We want to stick with RDS for now and add an FAQ to help Python users (https://github.com/AlexsLemonade/scpca-docs/issues/13). We have some questions about Seurat to explore that are now tracked separately (#129).

So I'm going to close this and label it with future thoughts to reflect that there are some ideas about testing h5ad in here that could be handy later.