brisvag / blik

Python tool for visualising and interacting with cryo-ET and subtomogram averaging data.
https://brisvag.github.io/blik/
GNU General Public License v3.0

Autoscaling issues #71

Closed brisvag closed 3 years ago

brisvag commented 3 years ago

#69 introduces scaling based on pixel size. Sometimes autoscaling just won't work, and having an interface to override it may be useful. Otherwise, we may want to add some flags to the IO layer to specify what data is being read.

Use case: in CC volumes from Warp, particle positions are scaled to [0, 1] and need two transformations to match the "original" image. In short:

positions *= image.shape[::-1]  # undo the [0, 1] normalisation
positions *= image.pixel_size   # convert from pixels to physical units

what's the best approach to solve problems like this?
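For concreteness, here is a minimal runnable sketch of the two-step rescaling described above; the image shape, pixel size, and positions are made-up values for illustration:

```python
import numpy as np

# hypothetical stand-ins for the real objects
image_shape = (100, 200, 400)   # zyx order, as numpy reports it
pixel_size = 1.5                # e.g. Angstrom per pixel

# particle positions normalised to [0, 1], in xyz order
positions = np.array([
    [0.5, 0.25, 0.1],
    [1.0, 1.0, 1.0],
])

# step 1: undo the [0, 1] normalisation
# (shape is reversed so the zyx dims line up with xyz coordinates)
positions = positions * np.array(image_shape[::-1])

# step 2: convert from pixel units to physical units
positions = positions * pixel_size
```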

alisterburt commented 3 years ago

agree this is hard to handle elegantly - especially when some of the data required for the transforms lives on the image whilst the rest is on the particles.

My feeling is that our first priority should be to spit out data blocks containing valid data from each file read (possibly with lazy loading for things like images!).

Things like combining that data based on what other data is present should (I think) be left to specific tools which operate on the dataset/datablocks, not to flags in the IO. Otherwise we quickly run into a ridiculous number of cases to handle (Warp template matching particles without CC volumes? Warp template matching particles with CC volumes? etc.)

This specific case seems like a good opportunity for an alchemist, what do you reckon?

brisvag commented 3 years ago

I agree on everything, but I'm not sure an alchemist fits here... setting a value on the pixel_size attribute hardly seems worthy of an alchemist!

alisterburt commented 3 years ago

Yes, agreed: pixel size should be an attribute which can be set directly, but the transform which rescales from the image coordinates is what I was thinking could be delegated to an alchemist.

brisvag commented 3 years ago

I think we have a different idea of what an alchemist should be. I see it as a function:

[datablocks] => alchemist => [datablocks]

You seem to imply that you would want it to edit the pixel_size attribute in place. I think we should avoid that, or use something else for it.
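To make the picture concrete, a pure-function alchemist might look roughly like this; ParticleBlock and rescale_alchemist are illustrative names, not blik's actual API:

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)  # frozen to sketch datablock immutability
class ParticleBlock:
    """Toy stand-in for a blik datablock; not the real API."""
    positions: np.ndarray
    pixel_size: float = 1.0


def rescale_alchemist(block: ParticleBlock, pixel_size: float) -> ParticleBlock:
    """Pure function: datablocks in, new datablocks out; nothing edited in place."""
    return ParticleBlock(
        positions=block.positions * pixel_size,
        pixel_size=pixel_size,
    )


block = ParticleBlock(positions=np.array([[1.0, 2.0, 3.0]]))
scaled = rescale_alchemist(block, pixel_size=2.0)
# the original block is untouched; the alchemist returned a new one
```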

alisterburt commented 3 years ago

Hmm, my mental model of it maps onto yours: you would give it an imageblock and a particle block, it would then add the pixel size info and transform the coordinates, returning a copy which you could assign to the same variable name?


brisvag commented 3 years ago

Actually, the way I imagine alchemists acting is more of a "live conversion" approach, where they connect input to output and keep it updated. Using alchemists to do nothing to the data and only change the metadata of a block seems a bit wasteful (both conceptually and in terms of memory/computation). I think I have a few conflicting ideas about what an alchemist should be/do... because I would also like to enforce some level of immutability on datablocks...

alisterburt commented 3 years ago

Oh interesting, I hadn't understood the "live conversion" aspect. I like the idea of immutability on the datablock (at least the data attribute), but in that case we definitely need something which rescales by the length of each dim for Warp star files and returns a new block? Or are you suggesting we keep the data in the range [0, 1] and do all rescaling on the visualisation side?

If you're suggesting that, it's good to keep in mind that some things (like the transformblock) will only work if the positions are not "normalised".

brisvag commented 3 years ago

I think we should treat the pixel scale as part of the data, but not apply it to the data on load (for consistency, in case we want to write things out again). However, I also think that we should use pixel size for any operation that calls for it, not just depiction.

For example, transform block should use the data * pixel_size to do its calculations.

It was in this context that I liked the idea of handling all of this in IO. On the other hand, if implemented right, we should be able to change the pixel size at any time and have things just work!

alisterburt commented 3 years ago

Pixel spacing is slightly separate from rescaling by the dimensions. I do like the idea of being able to change the spacing, but it complicates things a bit.

If raw data comes in a weird format that we're unlikely to want again directly ([0, 1] template matching, for example), then I think this should be handled on import. Pixel spacing, I think, can be handled like you suggest, but I would prefer it be handled once on the side of the ImageBlock rather than in the transformblock and any other block that might come along. Probably just as a "scaled_coordinates" type attribute?
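As a sketch of "handle it once on the ImageBlock": scaled_coordinates is the name floated above, and everything else here is a made-up placeholder, not blik's real ImageBlock:

```python
import numpy as np


class ImageBlock:
    """Illustrative sketch only, not blik's real ImageBlock."""

    def __init__(self, data, pixel_size=1.0):
        self.data = np.asarray(data)
        self.pixel_size = pixel_size

    def scaled_coordinates(self, coordinates):
        # rescale into physical units once, here, so downstream blocks
        # (transformblock etc.) don't each have to redo it
        return np.asarray(coordinates) * self.pixel_size


image = ImageBlock(np.zeros((4, 4)), pixel_size=2.0)
scaled = image.scaled_coordinates([[1.0, 2.0], [3.0, 4.0]])
```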

brisvag commented 3 years ago

A few things:

alisterburt commented 3 years ago

I agree on everything :) The normalised coordinates definitely make it not clean, because that info isn't in the files. I don't like the idea of having multiple parameters to handle it, but I also don't like melding pixel size and that scale factor into one superparameter, so I'm not sure what the right way to go is...

I do think the need for rescaling when the data fall in the range [0, 1] on import from a star file is a safe assumption; it's not so safe when creating your own particle block, so we would have to be careful.

I like the raw_data and data idea; data could be a property which just returns raw_data if no scale params are set, to avoid unnecessary computation too!
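The raw_data/data split with a lazy property could look something like this; a sketch under the naming assumptions above, not the eventual implementation:

```python
import numpy as np


class ParticleBlock:
    """Toy sketch: immutable raw data plus a lazily scaled view."""

    def __init__(self, raw_data, pixel_size=None):
        self._raw_data = np.asarray(raw_data)
        self.pixel_size = pixel_size

    @property
    def raw_data(self):
        return self._raw_data

    @property
    def data(self):
        # no scale params set: return the raw data as-is, no computation
        if self.pixel_size is None:
            return self._raw_data
        return self._raw_data * self.pixel_size


block = ParticleBlock([[1.0, 2.0, 3.0]])
unscaled = block.data          # same object as raw_data, nothing computed
block.pixel_size = 2.0
rescaled = block.data          # now scaled on access
```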

brisvag commented 3 years ago

Autoscaling is now handled only on visualisation, depending on the presence of sister blocks. These issues are still relevant (can we always assume data between 0 and 1 wants to be rescaled?), but I think we can close this and reopen if we ever have issues.