chanzuckerberg / cryoet-data-portal

CryoET Data Portal
MIT License

Update S3 Ingestion workflow to support Alignment #898

Open manasaV3 opened 1 month ago

manasaV3 commented 1 month ago

Update the existing ingestion workflow to support capturing the new relevant metadata and updating the data structure as required.

The changes required for the entities are as follows:

Alignment

| Field name | Data type | Source | Default | Computation logic (if any) |
| --- | --- | --- | --- | --- |
| alignment_type | string | config metadata | GLOBAL | |
| volume_dimension | dict | | | |
| volume_dimension.(x,y,z) | int | related tomogram | - | If tomogram mrc, get nx,ny,nz from header. If tomogram zarr, get scale of 0 bin. The value should be multiplied with the voxel spacing to get the result in angstrom. |
| volume_offset | dict | | | |
| volume_offset.(x,y,z) | int | config metadata | 0 | |
| x_rotation_offset | float | config metadata | 0 | |
| tilt_offset | float | config metadata | 0 | |
| affine_transformation_matrix | 4X4 float matrix | config metadata | identity matrix | |
| is_canonical | bool | config metadata | false | |
| alignment_file_path | str | - | None | |
| tilt_file_path | str | - | None | |
| deposition_id | int | config | | |
| per_section_alignment_parameters | list | | | |
| psap.[].z_index | int | TBD | TBD | TBD |
| psap.[].x_offset | float | xf file | None | columns[n-1] |
| psap.[].y_offset | float | xf file | None | columns[n] |
| psap.[].in_plane_rotation | (float, float, float, float) | xf file | None | columns[1:4] |
| psap.[].beam_tilt | float | TBD | TBD | TBD |
| psap.[].tilt_angle | float | tlt file | None | |
| psap.[].volume_x_rotation | float | tltx file | 0 | |
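
As a rough illustration of the defaults and the volume_dimension rule in the table, here is a minimal Python sketch. The function names `volume_dimension_angstrom` and `default_alignment_metadata` are hypothetical and not part of the ingestion code. The voxel counts are assumed to have been read already (nx, ny, nz from the MRC header, or the shape of the scale 0 array for zarr). The table lists the type as int, but the sketch leaves the product unrounded, since the issue does not say how to round.

```python
from typing import Any, Dict, Tuple


def volume_dimension_angstrom(
    voxel_counts: Tuple[int, int, int], voxel_spacing: float
) -> Dict[str, float]:
    """Multiply voxel counts (nx, ny, nz) by the voxel spacing to get angstrom."""
    nx, ny, nz = voxel_counts
    return {
        "x": nx * voxel_spacing,
        "y": ny * voxel_spacing,
        "z": nz * voxel_spacing,
    }


def default_alignment_metadata() -> Dict[str, Any]:
    """Defaults taken from the table; config metadata would override them."""
    # 4x4 identity matrix as nested lists.
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    return {
        "alignment_type": "GLOBAL",
        "volume_offset": {"x": 0, "y": 0, "z": 0},
        "x_rotation_offset": 0,
        "tilt_offset": 0,
        "affine_transformation_matrix": identity,
        "is_canonical": False,
        "alignment_file_path": None,
        "tilt_file_path": None,
        # deposition_id has no default in the table, so it is left out here.
        "per_section_alignment_parameters": [],
    }
```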

If there are no alignment files available, the per_section_alignment_parameters will be an empty list with the default values for the other fields.
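
The per-section mapping could be sketched roughly as below. This assumes the xf file has one whitespace-separated row per section with the four rotation terms first and the x and y offsets in the last two columns (matching columns[1:4], columns[n-1] and columns[n] in the table), and that the tlt and tltx files hold one value per line. The helper names are hypothetical. z_index and beam_tilt are marked TBD in the table, so the sketch omits them. Driving the section count from the xf rows is also an assumption.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


def _read_rows(path: Optional[Path]) -> List[List[float]]:
    """Return one list of floats per non-empty line, or [] if there is no file."""
    if path is None or not Path(path).exists():
        return []
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append([float(value) for value in line.split()])
    return rows


def per_section_alignment_parameters(
    xf_path: Optional[Path] = None,
    tlt_path: Optional[Path] = None,
    tltx_path: Optional[Path] = None,
) -> List[Dict[str, Any]]:
    """Build the psap list; returns [] when no xf alignment file is available."""
    xf_rows = _read_rows(xf_path)
    tlt_rows = _read_rows(tlt_path)
    tltx_rows = _read_rows(tltx_path)

    psap = []
    for i, cols in enumerate(xf_rows):
        psap.append(
            {
                "x_offset": cols[-2],  # columns[n-1]
                "y_offset": cols[-1],  # columns[n]
                "in_plane_rotation": tuple(cols[0:4]),  # columns[1:4], 1-indexed
                # Default None when the tlt file has no entry for this section.
                "tilt_angle": tlt_rows[i][0] if i < len(tlt_rows) else None,
                # Default 0 when the tltx file has no entry for this section.
                "volume_x_rotation": tltx_rows[i][0] if i < len(tltx_rows) else 0.0,
            }
        )
    return psap
```

Calling `per_section_alignment_parameters()` with no paths returns an empty list, which matches the fallback described above.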

Frames

Gains

junxini commented 1 month ago

@manasaV3 to break down into 2 tickets and point both

manasaV3 commented 1 week ago

beam_tilt to be supported as a part of the aln files for alignment. Not needed for this feature.

manasaV3 commented 20 hours ago

> Update the names to .mdoc or .rawtlt respectively

We are not going to rename the mdoc and rawtlt files to the run name, as there are edge cases where we might have multiple of those files. So, we are going to continue retaining the source file name as a part of their names.

manasaV3 commented 20 hours ago

> Update the location of .rawtlt to /<dataset-id>/<run-name>/Frames

rawtlt files will continue to be stored in the Tiltseries folder, to establish the relationship between them and the tiltseries.