From Vesuvius Challenge, a Python library for accessing CT scans of ancient scrolls.
vesuvius allows direct access to scroll data without managing download scripts or storing terabytes of CT scans locally:
import vesuvius
import matplotlib.pyplot as plt
scroll = vesuvius.Volume("Scroll1")
img = scroll[1000,:,:]
plt.imshow(img)
Data is streamed in the background, and only the requested regions are served.
The library provides tools for accessing, managing, and manipulating high-resolution volumetric data related to Vesuvius Challenge. It supports both remote and local data, with options for caching and normalization.
For a similar library in C, see vesuvius-c.
Warning: vesuvius is in beta and the interface may change. Not all Vesuvius Challenge data is currently available; data will continue to be added to the library.
To get started, we recommend these notebooks that jump right in:
Scroll Data Access: an introduction to accessing scroll data using a few lines of Python!
Ink Detection: load and visualize segments with ink labels, and train models to detect ink in CT.
Volumetric instance segmentation cubes: how to access instance-annotated cubes with the Cube class, used for volumetric segmentation approaches.
vesuvius does:
vesuvius doesn't do:
vesuvius can be installed with pip.
Then, before using the library for the first time, accept the license terms:
$ pip install vesuvius
$ vesuvius.accept_terms --yes
The library can be imported in Python:
import vesuvius
To list the available files in the remote repository, use the following code:
from vesuvius import list_files
files = list_files()
The output of list_files is a dictionary that contains the paths to all the scroll volumes and segment surface volumes available in the data repository. The dictionary structure is as follows:
The top-level keys are scroll_id values.
Under each scroll_id, there are keys for different energy levels.
Under each energy, there are keys for different resolution levels.
Under each resolution, there are keys for either segments or volume.
segments can contain segment_ids.
Here is a visual representation of what the dictionary can look like:
{
    'scroll_id1': {
        'energy1': {
            'resolution1': {
                'segments': {
                    'segment_id1': 'path/to/segment_id1',
                    'segment_id2': 'path/to/segment_id2'
                },
                'volume': 'path/to/volumes'
            },
            'resolution2': {
                'segments': {
                    'segment_id1': 'path/to/segment_id1',
                    'segment_id2': 'path/to/segment_id2'
                },
                'volume': 'path/to/volumes'
            }
        },
        'energy2': {
            'resolution1': {
                'segments': {
                    'segment_id1': 'path/to/segment_id1',
                    'segment_id2': 'path/to/segment_id2'
                },
                'volume': 'path/to/volume'
            },
        }
    },
    'scroll_id2': {
        'energy1': {
            'resolution1': {
                'segments': {
                    'segment_id1': 'path/to/segment_id1',
                    'segment_id2': 'path/to/segment_id2'
                },
                'volume': 'path/to/volumes'
            },
        }
    }
}
This structure lets you look up a specific path from the scroll_id, energy, resolution, and segment_id of the data you are interested in. The function also runs automatically when the library is imported, so the list of available files stays up to date.
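The nested layout described above can be flattened into rows with a few loops. The sketch below is illustrative only: sample_files is a hypothetical stand-in shaped like the structure shown, and iter_paths is a helper name invented here, not part of vesuvius.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical sample shaped like the dictionary described above; the real
# list_files() output will contain different keys and paths.
sample_files = {
    "scroll_id1": {
        "energy1": {
            "resolution1": {
                "segments": {
                    "segment_id1": "path/to/segment_id1",
                    "segment_id2": "path/to/segment_id2",
                },
                "volume": "path/to/volume",
            },
        },
    },
}

Entry = Tuple[str, str, str, Optional[str], str]


def iter_paths(files: dict) -> Iterator[Entry]:
    """Yield (scroll_id, energy, resolution, segment_id, path) tuples.

    segment_id is None for the scroll volume entry.
    """
    for scroll_id, energies in files.items():
        for energy, resolutions in energies.items():
            for resolution, entry in resolutions.items():
                if "volume" in entry:
                    yield scroll_id, energy, resolution, None, entry["volume"]
                for segment_id, path in entry.get("segments", {}).items():
                    yield scroll_id, energy, resolution, segment_id, path


entries = list(iter_paths(sample_files))
for row in entries:
    print(row)
```

With a real dictionary, the same loop could be used to filter by scroll or resolution before opening anything.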
To list the available instance-annotated volumetric cubes:
from vesuvius import cubes
available_cubes = cubes()
Similarly to list_files, the output of cubes is a dictionary:
{
    'scroll_id1': {
        'energy1': {
            'resolution1': {
                'z1_y1_x1': 'path/to/z1_y1_x1',
                'z2_y2_x2': 'path/to/z2_y2_x2'
            }
        }
    }
}
z_y_x gives the coordinates, within the corresponding scroll volume, of the origin of the selected cube's reference frame.
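Since each cube key encodes its origin, the coordinates can be recovered by splitting the key. This is a minimal sketch that assumes keys are three integers joined by underscores; the exact key format (for example zero padding) and the sample dictionary are assumptions, and parse_cube_key and cube_origins are hypothetical helper names.

```python
from typing import List, Tuple

Origin = Tuple[int, int, int]


def parse_cube_key(key: str) -> Origin:
    """Split a 'z_y_x' key into integer (z, y, x) coordinates."""
    parts = key.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected a key of the form 'z_y_x', got {key!r}")
    z, y, x = (int(part) for part in parts)
    return z, y, x


def cube_origins(cubes_dict: dict) -> List[Tuple[str, str, str, Origin]]:
    """List (scroll_id, energy, resolution, (z, y, x)) for every cube."""
    rows = []
    for scroll_id, energies in cubes_dict.items():
        for energy, resolutions in energies.items():
            for resolution, cube_paths in resolutions.items():
                for key in cube_paths:
                    rows.append((scroll_id, energy, resolution, parse_cube_key(key)))
    return rows


# Hypothetical sample; the real cubes() output may use different key formats.
sample_cubes = {"1": {"54": {"7.91": {"02256_02512_04816": "path/to/cube"}}}}
origins = cube_origins(sample_cubes)
print(origins)
```

The resulting z, y, x values are what one would pass on when selecting a cube by its origin.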
Volume
The Volume class is used for accessing volumetric data, both for scrolls and for the surface volumes of segments.
from vesuvius import Volume
# Basic usage
scroll = Volume(type="Scroll1") # directly accesses the canonical Scroll 1 volume
# Basic usage specifying scan metadata
scroll = Volume(type="scroll", scroll_id=1, energy=54, resolution=7.91) # to access a non-canonical volume, specify the scan metadata
# With cache (works only with the remote repository)
scroll = Volume(type="scroll", scroll_id=1, energy=54, resolution=7.91, cache=True)
# Activate/deactivate caching (works only with the remote repository)
scroll.activate_caching() # not needed if the volume was loaded with cache=True
scroll.deactivate_caching()
# With normalization
scroll = Volume(type="scroll", scroll_id=1, energy=54, resolution=7.91, normalize=True)
# Show which subvolumes are available
scroll.meta()
# To print the metadata at initialization, use the argument verbose=True
scroll = Volume(type="Scroll1", verbose=True)
# To access the shapes of the multiresolution arrays
subvolume_index = 3 # fourth subvolume (indices start at 0)
shape = scroll.shape(subvolume_index)
# To access the dtype
dtype = scroll.dtype
# Access data using indexing
data = scroll[:, :, :, subvolume_index] # access the entire selected subvolume
# When three or fewer indices are specified, the main subvolume (subvolume_index = 0) is accessed automatically
data = scroll[15] # equivalent to scroll[15,:,:,0]
data = scroll[15,12] # equivalent to scroll[15,12,:,0]
# Slicing is also permitted for the first three indices
data = scroll[20:300,12:18,20:40,2]
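The indexing rules above (missing axes are taken in full, and a missing fourth index means the main subvolume) can be pictured as a small key-expansion step. The following is a standalone sketch of that rule, not the library's actual implementation; normalize_index is a hypothetical name.

```python
from typing import Any, Tuple


def normalize_index(key: Any) -> Tuple[Any, Any, Any, int]:
    """Expand an index into a full (z, y, x, subvolume_index) tuple."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > 4:
        raise IndexError("at most four indices are supported")
    if len(key) == 4:
        z, y, x, subvolume_index = key
        return z, y, x, subvolume_index
    # Three or fewer indices: pad missing axes with full slices and fall
    # back to the main subvolume (index 0).
    z, y, x = key + (slice(None),) * (3 - len(key))
    return z, y, x, 0


print(normalize_index(15))
print(normalize_index((15, 12)))
print(normalize_index((slice(20, 300), slice(12, 18), slice(20, 40), 2)))
```

Under this reading, scroll[15] and scroll[15,:,:,0] expand to the same four-part key.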
If you have fully downloaded a scroll volume or a segment, you can point directly to its local path on your device:
scroll = Volume(type="scroll", scroll_id=1, energy=54, resolution=7.91, domain="local", path="/path/to/54keV_7.91um.zarr")
You can access segments in a similar fashion:
from vesuvius import Volume
# Basic usage
segment = Volume("20230827161847") # access a segment by specifying its unique timestamp
# Basic usage specifying scan metadata
segment = Volume(type="segment", scroll_id=1, energy=54, resolution=7.91, segment_id=20230827161847)
Volume(
    type: Union[str, int],
    scroll_id: Optional[int] = None,
    energy: Optional[int] = None,
    resolution: Optional[float] = None,
    segment_id: Optional[int] = None,
    cache: bool = True,
    cache_pool: int = 1e10,
    normalize: bool = False,
    verbose: bool = True,
    domain: str = "dl.ash2txt",
    path: Optional[str] = None
)
Cube
The Cube class is used for accessing segmented cube data.
from vesuvius import Cube
# Basic usage
cube = Cube(scroll_id=1, energy=54, resolution=7.91, z=2256, y=2512, x=4816, cache=True, cache_dir='/path/to/cache') # with caching
# If cache=True but cache_dir is not given, the instances are saved automatically in $HOME/vesuvius/annotated-instances
cube = Cube(scroll_id=1, energy=54, resolution=7.91, z=2256, y=2512, x=4816, cache=False) # without caching
# With normalization
cube = Cube(scroll_id=1, energy=54, resolution=7.91, z=2256, y=2512, x=4816, normalize=True)
# Activate/deactivate caching
cube.activate_caching(cache_dir=None) # or define your own cache_dir
cube.deactivate_caching()
# To access the volume and the mask
volume, mask = cube[:, :, :] # also works with slicing
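Once a mask is in hand, a common first step is to count how many voxels belong to each annotated instance. The sketch below uses a small synthetic array in place of a real cube mask and assumes an integer label mask with 0 as background; that convention and the helper name instance_sizes are assumptions, not documented behaviour of Cube.

```python
import numpy as np


def instance_sizes(mask: np.ndarray, background: int = 0) -> dict:
    """Map each instance label in `mask` to its voxel count, skipping background."""
    labels, counts = np.unique(mask, return_counts=True)
    return {
        int(label): int(count)
        for label, count in zip(labels, counts)
        if label != background
    }


# Small synthetic stand-in for a cube mask; a real cube is much larger.
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[:2, :2, :2] = 1  # instance 1 occupies 2*2*2 = 8 voxels
mask[2:, 2:, :] = 2   # instance 2 occupies 2*2*4 = 16 voxels

sizes = instance_sizes(mask)
print(sizes)
```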
Cube(
    scroll_id: int,
    energy: int,
    resolution: float,
    z: int,
    y: int,
    x: int,
    cache: bool = True,
    cache_dir: Optional[os.PathLike] = None,
    normalize: bool = False
)
The normalize parameter normalizes the data to the maximum value of the dtype.
Local files are opened by passing domain="local" and path to the Volume constructor.
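Normalizing to the maximum value of the dtype can be illustrated with plain NumPy. This is a sketch of the idea rather than the library's code: it assumes integer input and a float32 result, and normalize_to_dtype_max is a hypothetical helper.

```python
import numpy as np


def normalize_to_dtype_max(data: np.ndarray) -> np.ndarray:
    """Scale integer data to [0, 1] by dividing by the dtype's maximum value."""
    if not np.issubdtype(data.dtype, np.integer):
        raise TypeError("this sketch only handles integer dtypes")
    max_value = np.iinfo(data.dtype).max  # e.g. 65535 for uint16
    return data.astype(np.float32) / max_value


raw = np.array([0, 32768, 65535], dtype=np.uint16)
scaled = normalize_to_dtype_max(raw)
print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing by the dtype maximum, rather than by the largest value present, keeps the scale identical across different regions of the same scan.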