brisvag / blik

Python tool for visualising and interacting with cryo-ET and subtomogram averaging data.
https://brisvag.github.io/blik/
GNU General Public License v3.0

[WIP] IO updates and lazy loading #84

Closed brisvag closed 3 years ago

brisvag commented 3 years ago

This PR adds some simple readers (.box and .cbox) to our repertoire, while simplifying the io module a bit (it was too fragmented).

As discussed in #78 and #79, we wanted some lazy loading functionality. Here, I implemented a version of it (for images only) that has a very simple interface: if the reader passes a callable to the ImageBlock constructor, the data will be lazily loaded when required.

Finally, I added some small metaprogramming code to improve maintainability: pt.peep and other wrapper functions now simply inherit everything of importance (signature and docstrings) from the wrapped function. Try ?pt.peep or ?pt.Peeper.classify_radial_profile :D
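For illustration, here is a minimal sketch of how a wrapper can inherit the signature and docstring of the function it wraps, using functools.wraps from the standard library. This is not necessarily how pt.peep does it; classify_radial_profile and peep_classify below are hypothetical stand-ins, not the real functions.

```python
import functools
import inspect


def classify_radial_profile(data, n_classes=5, *, max_radius=None):
    """Classify particles by their radial profile (hypothetical stand-in)."""
    return data, n_classes, max_radius


# functools.wraps copies __doc__ and __name__ and sets __wrapped__;
# inspect.signature follows __wrapped__ to report the real signature.
@functools.wraps(classify_radial_profile)
def peep_classify(*args, **kwargs):
    return classify_radial_profile(*args, **kwargs)


print(inspect.signature(peep_classify))
print(peep_classify.__doc__)
```

Tools that rely on inspect.signature then show the wrapped function's arguments instead of a bare (*args, **kwargs).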

codecov[bot] commented 3 years ago

Codecov Report

Merging #84 (17ab4d0) into develop (7f167dc) will decrease coverage by 0.01%. The diff coverage is 88.79%.

@@             Coverage Diff             @@
##           develop      #84      +/-   ##
===========================================
- Coverage    82.16%   82.15%   -0.02%     
===========================================
  Files          103      103              
  Lines         2013     2163     +150     
===========================================
+ Hits          1654     1777     +123     
- Misses         359      386      +27     
Impacted Files Coverage Δ
peepingtom/datablocks/multiblocks/__init__.py 100.00% <ø> (ø)
...eepingtom/datablocks/multiblocks/transformblock.py 31.25% <ø> (ø)
peepingtom/datablocks/simpleblocks/__init__.py 100.00% <ø> (ø)
peepingtom/io_/utils/generic.py 100.00% <ø> (ø)
peepingtom/io_/writing/__init__.py 100.00% <ø> (ø)
peepingtom/io_/writing/em/__init__.py 100.00% <ø> (ø)
peepingtom/io_/writing/mrc/__init__.py 100.00% <ø> (ø)
peepingtom/io_/writing/star/__init__.py 100.00% <ø> (ø)
peepingtom/depictors/napari/particledepictor.py 72.97% <50.00%> (-12.75%) :arrow_down:
peepingtom/io_/reading/box.py 50.00% <50.00%> (ø)
... and 80 more

brisvag commented 3 years ago

Some new changes; the one that stands out in the user interface is that reading now works with globs instead of regexes!
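To show the difference in pattern style, here is a small sketch using only the standard library (fnmatch for shell-style patterns, re for regexes). The file names are made up, and this does not use the actual reading functions.

```python
import fnmatch
import re

# hypothetical file names, for illustration only
names = ['ts_01.mrc', 'ts_02.mrc', 'ts_01.star', 'notes.txt']

# shell-style glob pattern: `*` matches any run of characters
glob_matches = fnmatch.filter(names, 'ts_*.mrc')

# equivalent regex: `.` must be escaped and `.*` replaces `*`
regex_matches = [n for n in names if re.fullmatch(r'ts_.*\.mrc', n)]

print(glob_matches)
print(regex_matches)
```

Both select the same two .mrc files, but the glob form is what a shell user would already type on the command line.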

This should result in a consistent experience from command line or ipython, once the command line entry point is fixed to work nicely with the new changes!

brisvag commented 3 years ago

A few changes were added in the last few commits, and they might be hard to track. Here's a summary.

DataBlocks

SpatialBlock

Base classes are now moved to abstractblocks, and a new one is available: SpatialBlock. This class is just a collection of methods and properties shared by a lot of datablocks that have a spatial component (so excluding stuff like PropertyBlock).

Metaclasses

MetaBlock

New MetaBlock metaclass for datablocks. Its role is to automatically generate (and make explicit) the signature of datablocks. In practice, this simply means that when you do:

?PointBlock

you get something like this:

pt.datablocks.PointBlock(
    self,
    *,
    name=None,
    volume=None,
    peeper=None,
    parent=None,
    data=None,
    lazy_loader=None,
    pixel_size=1,
    dims_order='xyz',
    ndim=3,
    **kwargs,
)

rather than

pt.datablocks.PointBlock(
    self,
    *args,
    **kwargs,
)

This also means that completion in ipython works for all the arguments. Also, datablock args are now all keyword-only: things were getting messy and hard to read/maintain in some cases. Now you always have to provide data=..., which is more tedious, but more explicit. Anyway, we don't want people to be instantiating datablocks manually.
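A rough sketch of the idea, under the assumption that the explicit signature is built by collecting keyword-only parameters along the class hierarchy. The class bodies and the reduced argument list are simplified stand-ins, and the real MetaBlock may work differently.

```python
import inspect


class MetaBlock(type):
    """Sketch: expose the keyword-only args of every __init__ in the MRO."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        params = {}
        # walk from the base-most class to the most derived one
        for klass in reversed(cls.__mro__):
            init = klass.__dict__.get('__init__')
            if klass is object or init is None:
                continue
            for p in inspect.signature(init).parameters.values():
                if p.kind is inspect.Parameter.KEYWORD_ONLY:
                    params[p.name] = p
        ordered = list(params.values())
        ordered.append(inspect.Parameter('kwargs', inspect.Parameter.VAR_KEYWORD))
        # inspect.signature(cls) returns this attribute when it is set
        cls.__signature__ = inspect.Signature(ordered)


class DataBlock(metaclass=MetaBlock):
    def __init__(self, *, name=None, parent=None, **kwargs):
        self.name = name
        self.parent = parent


class SimpleBlock(DataBlock):
    def __init__(self, *, data=None, lazy_loader=None, **kwargs):
        super().__init__(**kwargs)
        self.data = data
        self.lazy_loader = lazy_loader


class PointBlock(SimpleBlock):
    def __init__(self, *, pixel_size=1, ndim=3, **kwargs):
        super().__init__(**kwargs)
        self.pixel_size = pixel_size
        self.ndim = ndim


print(inspect.signature(PointBlock))
```

Each class only declares the arguments it cares about and forwards the rest through **kwargs, while introspection still reports the full list.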

MetaMultiBlock

Another new metaclass is MetaMultiBlock. It extends MetaBlock for multiblocks; unlike MetaBlock, however, it generates the actual signature of the multiblock, while simplifying the class declaration. For example, here's all that's needed to declare OrientedPointBlock (apart from convenience methods/properties):

class OrientedPointBlock(SpatialBlock, MultiBlock):
    _block_types = {'positions': PointBlock, 'orientations': OrientationBlock}

This automatically generates the following signature from the provided _block_types dict:

pt.datablocks.OrientedPointBlock(
    self,
    *,
    name=None,
    volume=None,
    peeper=None,
    parent=None,
    pixel_size=1,
    dims_order='xyz',
    ndim=3,
    positions_data=None,
    positions_lazy_loader=None,
    orientations_data=None,
    orientations_lazy_loader=None,
    **kwargs,
)

Anything specific to each datablock is prefixed with that block's key in the dict (positions, orientations). Any inheritable arg (dimensionality, pixel size, etc.) is instead set only on the multiblock, and contained blocks read these values from it. Note that the parent property of a block now always refers to the "top level block", rather than just "what this sliced block is a view of": OrientedPointBlock.positions[:3].parent is still OrientedPointBlock, and not OrientedPointBlock.positions. Also, the contained blocks are added as attributes to the multiblock, just like before (e.g.: OrientedPointBlock.positions).
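The prefixing scheme can be sketched as follows. This is a simplified, hypothetical reimplementation with a reduced argument list and no SpatialBlock base or slicing; the actual metaclass in the PR may be structured differently.

```python
import inspect

BLOCK_ARGS = ('data', 'lazy_loader')  # per-block args that get prefixed


class PointBlock:
    """Stand-in simple block; `parent` points at the top-level block."""

    def __init__(self, *, data=None, lazy_loader=None, parent=None):
        self.data = data
        self.lazy_loader = lazy_loader
        self.parent = parent


class OrientationBlock(PointBlock):
    pass


class MetaMultiBlock(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        kw_only = inspect.Parameter.KEYWORD_ONLY
        shared = (('name', None), ('pixel_size', 1), ('ndim', 3))
        params = [inspect.Parameter(n, kw_only, default=d) for n, d in shared]
        # one prefixed parameter per contained block and per block arg
        for key in getattr(cls, '_block_types', {}):
            for arg in BLOCK_ARGS:
                params.append(inspect.Parameter(f'{key}_{arg}', kw_only, default=None))
        cls.__signature__ = inspect.Signature(params)


class MultiBlock(metaclass=MetaMultiBlock):
    _block_types = {}

    def __init__(self, *, name=None, pixel_size=1, ndim=3, **kwargs):
        self.name, self.pixel_size, self.ndim = name, pixel_size, ndim
        for key, block_type in self._block_types.items():
            block_kwargs = {arg: kwargs.pop(f'{key}_{arg}', None) for arg in BLOCK_ARGS}
            # contained blocks become attributes of the multiblock
            setattr(self, key, block_type(parent=self, **block_kwargs))
        if kwargs:
            raise TypeError(f'unexpected arguments: {sorted(kwargs)}')


class OrientedPointBlock(MultiBlock):
    _block_types = {'positions': PointBlock, 'orientations': OrientationBlock}


block = OrientedPointBlock(positions_data=[[0, 0, 0]], pixel_size=2)
print(inspect.signature(OrientedPointBlock))
print(block.positions.parent is block)
```

The subclass declaration stays a two-liner, and shared values such as pixel_size live only on the multiblock, reachable from each contained block through parent.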

Lazy Loading

This is now explicit. All SimpleBlocks accept either a data or a lazy_loader argument (XOR), and will be lazy if the latter is provided. load() and unload() methods are now available on datablocks, which makes it possible to implement some smart loading/unloading when a larger-than-memory dataset is opened. This might be better implemented with an established library like Dask, but that's for future work.
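As a minimal sketch of this behaviour (LazyBlock and read_image are hypothetical names, and the loaded property is an assumption not mentioned in the PR):

```python
class LazyBlock:
    """Hypothetical stand-in for a SimpleBlock with lazy loading."""

    def __init__(self, *, data=None, lazy_loader=None):
        # XOR: exactly one of the two must be provided
        if (data is None) == (lazy_loader is None):
            raise ValueError('provide exactly one of data or lazy_loader')
        self._data = data
        self._lazy_loader = lazy_loader

    @property
    def loaded(self):
        return self._data is not None

    @property
    def data(self):
        # load on first access
        if not self.loaded:
            self.load()
        return self._data

    def load(self):
        if not self.loaded:
            self._data = self._lazy_loader()

    def unload(self):
        # only lazy blocks are unloaded: eager data could not be recovered
        if self._lazy_loader is not None:
            self._data = None


calls = []


def read_image():
    """Pretend to read an image from disk, recording each call."""
    calls.append(1)
    return [[0, 1], [2, 3]]


block = LazyBlock(lazy_loader=read_image)
```

Nothing is read until block.data is first accessed; after block.unload(), the next access calls the loader again.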

Globs

Globs for directories are explicit. It was too hard to keep track of exceptions and things were getting messy. So input is more streamlined now: path/* will work, but path/ won't.
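The same distinction exists in Python's standard glob module, which is used here purely as an illustration (the PR does not state which globbing implementation the readers rely on). The temporary directory and file names are made up.

```python
import glob
import os
import tempfile
from pathlib import Path


def compare_patterns():
    """Return what 'path/' and 'path/*' match in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ('a.mrc', 'b.star'):
            (Path(tmp) / name).touch()
        bare_dir = glob.glob(os.path.join(tmp, ''))    # like 'path/'
        contents = glob.glob(os.path.join(tmp, '*'))   # like 'path/*'
        return len(bare_dir), sorted(os.path.basename(p) for p in contents)


n_bare, files = compare_patterns()
print(n_bare, files)
```

The bare directory pattern matches only the directory itself, while path/* expands to the files inside it.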