NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io
Other
52 stars 16 forks source link

redundant requirements in ophys.ROI #60

Closed neuromusic closed 4 years ago

neuromusic commented 6 years ago

the following attributes are required to define an ROI

    pix_mask (Iterable): List of pixels (x,y) that compose the mask.
    pix_mask_weight (Iterable): Weight of each pixel listed in pix_mask.
    img_mask (Iterable): ROI mask, represented in 2D ([y][x]) intensity image.

img_mask is redundant with pix_mask+pix_mask_weight

it should be required to define img_mask OR pix_mask+pix_mask_weight

bendichter commented 6 years ago

I agree, and I would prefer to just have img_mask and do away with pixel_mask entirely. If we really care about space efficiency, we can use gzip and the size should be about the same without much performance overhead.

oruebel commented 6 years ago

From an API perspective, I think it is fine to allow both ways. I.e., we could store the data in NWB:N always as either an img_mask or pixel mask but on the API side allow the user to get the data either way. I.e., removing either img_mask or pixel_mask from the schema doesn't mean you couldn't have a convenience function to translate one into the other for users.

With regard to storage. If the masks are sparse (i.e., have less than say 30% of pixel selected per image) then having a pix_mask would likely be more efficient. However, if the masks are dense, than the img_mask approach will be better. From a storage perspective, it is correct that you could compress the img_masks and probably get reasonable data sizes. The main problem I think here in terms of performance is more on the read/write performance and usage of memory when you need to have the data in memory.

Long story short.

  1. I think it is probably a good idea to remove this redundancy in the schema
  2. If you remove either img_mask or pix_mask from the schema then it would probably good idea in the API to add a simple convenience function to allow users to use either approach.
  3. For the schema I would base the decision of using img_mask or pix_mask on performance markers (memory and storage size and read/write speed)
jonc125 commented 6 years ago

For our use cases each ROI will generally be a very small fraction of the overall reference image.

rly commented 4 years ago

This issue has been resolved.

neuromusic commented 4 years ago

🎉