Converting Fits to Particle Pick Locations

jgermain22 commented 3 months ago

Hello! I was wondering what the angel convention is when saving the 'protein locations'?

Also I was wondering if it would be possible to to add functionality to convert the membrane fits to oversampled particle pick locations. Basically making the fits into a grid with a defined sampling then getting the normal vectors for downstream STA. Right now I am using this package (https://github.com/EuanPyle/Membrane_Associated_Picking) but it would be amazing if this kind of functionality was available in colabseg! Thank you again for making a great tool!! Best Jason

maurerv commented 3 months ago

Hi @jgermain22,

Thanks for the feedback :) Could you please specify more what you mean by "saving the protein locations"?

The feature you requested is (sort of) implemented but not part of the GUI yet, partially because I am not sure which format would be best for integration with subsequent software. Personally, I needed STAR files for my workflow, and I used the following code to create surface oversampling inputs from the collapsed parameterizations.

Star file writer

``` def write_star( filename: str, translations, rotations, name_prefix: str = None, ctf_image: str = None, sampling_rate: float = 1.0, subtomogram_size: int = 0, ) -> None: """ Save orientations in RELION's STAR file format. Parameters ---------- filename : str The name of the file to save the orientations. name_prefix : str, optional A prefix to add to the image names in the STAR file. ctf_image : str, optional Path to CTF or wedge mask RELION. sampling_rate : float, optional Subtomogram sampling rate in angstrom per voxel subtomogram_size : int, optional Size of the square shaped subtomogram. Notes ----- The file is saved with a standard header used in RELION STAR files. Each row in the file corresponds to an orientation. """ optics_header = [ "# version 30001", "data_optics", "", "loop_", "_rlnOpticsGroup", "_rlnOpticsGroupName", "_rlnSphericalAberration", "_rlnVoltage", "_rlnImageSize", "_rlnImageDimensionality", "_rlnImagePixelSize", ] optics_data = [ "1", "opticsGroup1", "2.700000", "300.000000", str(int(subtomogram_size)), "3", str(float(sampling_rate)), ] optics_header = "\n".join(optics_header) optics_data = "\t".join(optics_data) header = [ "data_particles", "", "loop_", "_rlnCoordinateX", "_rlnCoordinateY", "_rlnCoordinateZ", "_rlnImageName", "_rlnAngleRot", "_rlnAngleTilt", "_rlnAnglePsi", "_rlnOpticsGroup", ] if ctf_image is not None: header.append("_rlnCtfImage") ctf_image = "" if ctf_image is None else f"\t{ctf_image}" header = "\n".join(header) name_prefix = "" if name_prefix is None else name_prefix with open(filename, mode="w", encoding="utf-8") as ofile: _ = ofile.write(f"{optics_header}\n") _ = ofile.write(f"{optics_data}\n") _ = ofile.write("\n# version 30001\n") _ = ofile.write(f"{header}\n") # pyTME uses a zyx data layout for index in range(translations.shape[0]): translation = translations[index] rotation = rotations[index] translation_string = "\t".join([str(x) for x in translation]) angle_string = "\t".join([str(x) for x in rotation]) name = f"{name_prefix}_{index}.mrc" _ = ofile.write( f"{translation_string}\t{name}\t{angle_string}\t1{ctf_image}\n" ) return None ```

Rotation handling code I copied from pyTME (https://github.com/KosinskiLab/pyTME/blob/296c1a58e958a48782edad2fdfbceae4cd323858/tme/matching_utils.py#L906)

Rotation handling

``` from typing import Tuple from numpy.typing import NDArray from scipy.spatial.transform import Rotation def rotation_aligning_vectors( initial_vector: NDArray, target_vector: NDArray = [1, 0, 0], convention: str = None ): """ Compute the rotation matrix or Euler angles required to align an initial vector with a target vector. Parameters ---------- initial_vector : NDArray The initial vector to be rotated. target_vector : NDArray, optional The target vector to align the initial vector with. Default is [1, 0, 0]. convention : str, optional The generate euler angles in degrees. If None returns a rotation matrix instead. Returns ------- rotation_matrix_or_angles : NDArray or tuple Rotation matrix if convention is None else tuple of euler angles. """ initial_vector = np.asarray(initial_vector, dtype=np.float32) target_vector = np.asarray(target_vector, dtype=np.float32) initial_vector /= np.linalg.norm(initial_vector) target_vector /= np.linalg.norm(target_vector) rotation_matrix = np.eye(len(initial_vector)) if not np.allclose(initial_vector, target_vector): rotation_axis = np.cross(initial_vector, target_vector) rotation_angle = np.arccos(np.dot(initial_vector, target_vector)) k = rotation_axis / np.linalg.norm(rotation_axis) K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]]) rotation_matrix = np.eye(3) rotation_matrix += np.sin(rotation_angle) * K rotation_matrix += (1 - np.cos(rotation_angle)) * np.dot(K, K) if convention is None: return rotation_matrix angles = euler_from_rotationmatrix(rotation_matrix, convention=convention) return angles def euler_from_rotationmatrix( rotation_matrix: NDArray, convention: str = "zyx" ) -> Tuple: """ Convert a rotation matrix to euler angles. Parameters ---------- rotation_matrix : NDArray A 2 x 2 or 3 x 3 rotation matrix in z y x form. convention : str, optional Euler angle convention. Returns ------- Tuple The generate euler angles in degrees """ if rotation_matrix.shape[0] == 2: temp_matrix = np.eye(3) temp_matrix[:2, :2] = rotation_matrix rotation_matrix = temp_matrix euler_angles = ( Rotation.from_matrix(rotation_matrix) .as_euler(convention, degrees=True) .astype(np.float32) ) return euler_angles ```

After you copied the helper functions above to your notebook, you can perform surface oversampling like so:

data = gui.data_structure
models = data.cluster_list_fits_objects

# Average distance of sampled points in units of sampling rate
sampling_rate = 15

# Pixel size in units Unit / Voxel (typically unit is angstrom)
pixel_size  = 6.72

# The parametrization is fit to the center of the membrane (minimal error)
# If your downstream tool requires having points on the outer leaflet, you
# can set this half the membrane thickness in unit (typically angstrom)
membrane_offset = 9.5

# Path to output star files
output_prefix = "/path/to/output"

for index, model in enumerate(data.cluster_list_fits_objects):
    old_radii = model.radii
    model.radii = np.add(model.radii, np.multiply(membrane_offset, data.pixel_size))

    # Compute number of required to achieve a given spatial sampling rate
    n_samples = model.points_per_sampling(sampling_rate * pixel_size / 2)

    # Sample points and corresponding normals from parametrization
    points = model.sample(n_samples, sample_mesh = True, mesh_init_factor=5)
    normals = model.compute_normal(points)

    # Convert sampled points into voxel space
    points = np.divide(points, pixel_size).astype(int)

    zyz_normals = np.array([
        rotation_aligning_vectors(normals[index], convention="zyz") 
        for index in range(normals.shape[0])
    ])
    write_star(
        f"{output_prefix}_{index}.star",
        translations=points,
        rotations=zyz_normals
    )
    model.radii = old_radii

Hope that gives you an idea where to start and feel free to reach out in case of issues :)

Best, Valentin

jgermain22 commented 3 months ago

Thank you so much! Is this Relion 4 or Relion 5 coordinates?

maurerv commented 3 months ago

I have only used this with Relion 4 so far. Going over the release notes, version 5 appears to be compatible with these star files though

jgermain22 commented 3 months ago

Relion 5 coordinate system is quite different it seems as they set 0,0 as the center of the tomogram rather than the IMOD convention. It looks like its an easy conversion but the coordinate system is in Angstrom rather than pix which is odd. I havent tried it myself but I will soon. https://github.com/3dem/relion/issues/1137

maurerv commented 3 months ago

Ah, yes, you will need to add the shift.

gui.data_structure.shape and gui.data_structure.pixel_size keep track of the tomogram box size and corresponding voxel size along each axis. Those can be used to compute the scaling factor mentioned in the issue you linked :)

Something like this should do the trick

center = np.multiply(
    np.divide(gui.data_structure.shape, 2) - 1, 
    gui.data_structure.shape.pixelsize
)
points = points - center
# Points are in angstrom already so omit
# points = np.divide(points, pixel_size).astype(int)

jgermain22 commented 2 months ago

Sure Ill give this a try! Are you planning on incorporating this into the GUI?

maurerv commented 2 months ago

Yes, this will be added to the GUI :)

However, the timeline isn't set yet, as we are also working on other features, like support for triangular meshes on top of the available parametrizations

KosinskiLab / colabseg

Converting Fits to Particle Pick Locations #7