Closed PeteHaitch closed 5 years ago
Perhaps it can be smarter still. Say, you've loaded an object with loadHDF5SummarizedEzperiment()
, modified the coldData and now want to save the updated object. It seems desirable to only 'update' se.rds
and not have to re-save assays.h5
.
@PeteHaitch
Hi Pete,
I finally tackled this. My approach was to introduce a new function for this. See ?quickResaveHDF5SummarizedExperiment
in HDF5Array 1.11.8. Here is what it says:
quickResaveHDF5SummarizedExperiment
preserves the HDF5 file and datasets that the assays in x
are already pointing to (and which were created by an earlier call to saveHDF5SummarizedExperiment
). All it does is re-serialize x
on top of the .rds
file that is associated with this HDF5 file (and which was created by an earlier call to saveHDF5SummarizedExperiment
or quickResaveHDF5SummarizedExperiment
). Because the delayed operations possibly carried by the assays in x
are not realized, this is very fast.It covers many use cases e.g. it can be used after modifying the rowData, colData, or dimnames, or after adding a rowRanges component, or performing delayed operations on the assays. See the last example in the man page. Hopefully this covers the use cases you had in mind.
H.
@PeteHaitch I'm closing this. Feel free to reopen if you feel the use cases you had in mind are not covered by quickResaveHDF5SummarizedExperiment()
.
Question now that I look at it again. Why is this a new function and not
just something that works with saveXX. Perhaps with an argument
forceSave
to force it to overwrite?
On Wed, Jan 30, 2019 at 7:22 PM hpages notifications@github.com wrote:
@PeteHaitch https://github.com/PeteHaitch I'm closing this. Feel free to reopen if you feel the use cases you had in mind are not covered by quickResaveHDF5SummarizedExperiment().
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/HDF5Array/issues/10#issuecomment-459165087, or mute the thread https://github.com/notifications/unsubscribe-auth/AEuhnxn7cmSrGhjCaF85UMihqt2YFyP3ks5vIjdFgaJpZM4V_F6e .
I initially considered that but after realizing that the 2 functions do a quite different job, share very little code, and none of all the extra arguments supported by saveHDF5SummarizedExperiment()
was making sense for quickResaveHDF5SummarizedExperiment()
, I found it cleaner to separate the 2 things.
Following up on our chat at BioC2018 (ping @kasperdanielhansen)
It'd be great if
saveHDF5SummarizedExperiment()
didn't re-write the HDF5 file when it's not necessary.My use case is when writing a function, call it
makeHDF5SE()
, that creates an HDF5-backed SummarizedExperiment. IfmakeHDF5SE()
satisfies some minimal contract (e.g., all assay data are written to a file calledassays.h5
) then callingsaveHDF5SummarizedExperiment()
as the last step ofmakeHDF5SE()
should just create these.rds
file in the same directory. Then, the savedse.rds
andassays.h5
should then be compatible withloadHDF5SummarizedExperiment()
when run from a new session.A concrete example where I'd like to use this is to replace https://github.com/hansenlab/bsseq/blob/65e163c3dacce307a88ba21401a91b7ed1aba876/R/read.bismark.R#L543-L549 by
saveHDF5SummarizedExperiment(bsseq, dir = dir, replace = replace)
(note how I've mimicked the other arguments ofsaveHDF5SummarizedExperiment()
further up in the function https://github.com/hansenlab/bsseq/blob/65e163c3dacce307a88ba21401a91b7ed1aba876/R/read.bismark.R#L362-L365 and that these are used elsewhere insideread.bismark()
).