ComputationalCryoEM / ASPIRE-Python

Algorithms for Single Particle Reconstruction
http://spr.math.princeton.edu
GNU General Public License v3.0
46 stars 21 forks source link

`ImageSource.save()` desired behavior #745

Open chris-langfield opened 1 year ago

chris-langfield commented 1 year ago

What should be the default and optional behaviors for the save() method?

save() calls two functions (which are currently public): save_metadata() and save_images().

Some cases we might want to cover:

A sketch of possible behavior (no overwrite of original starfile allowed unless the file is physically removed)


src = Source(starfile_path, max_rows = 1000)
src.save("mysource_n1000.star")  # just save truncated but identical metadata, referencing the same `.mrcs` files

src.save(starfile_path) # error: overwriting original metadata file

src.downsample(64)
src.save(starfile_path) # error: overwriting original metadata file

os.rm(starfile_path)
src.save(starfile_path) # saves the starfile, but does not write any new data (downsample is lost). references same `mrcs` files but only 1000 of them

src.save("my_source_DS64.star") # saves new images as `my_source_DS64_0_511.mrcs, my_source_DS64_512_1023.mrcs` etc

src.save("my_source_DS64.star", overwrite=True) # deletes all old `mrcs` saves new images as `my_source_DS64_0_511.mrcs, my_source_DS64_512_1023.mrcs` etc
garrettwrong commented 1 year ago

Thanks for writing down this discussion!

Confused about one. In this example, why would this case not write new mrcs?

src.save(starfile_path) # saves the starfile, but does not write any new data (downsample is lost). references same mrcs files but only 1000 of them

Is there any reason to think all the cases can't be accomplished by something like the following?

def save(..., overwrite=False):
   self.save_meta_data(..., overwrite)
   self.save_mrcs(..., overwrite)
chris-langfield commented 1 year ago

Confused about one. In this example, why would this case not write new mrcs?

src.save(starfile_path) # saves the starfile, but does not write any new data (downsample is lost). references same mrcs files but only 1000 of them

Good point.. but it probably shouldn't be with the same name as the old mrcs..

Is there any reason to think all the cases can't be accomplished by something like the following?

def save(..., overwrite=False):
   self.save_meta_data(..., overwrite)
   self.save_mrcs(..., overwrite)

Wouldn't this get rid of the possibility of saving a new starfile with the same data? I guess we can allow save_metadata() to be public for that. I don't think save_mrcs should be public because we need to be able to link the saved image data to a star file somewhere otherwise it's useless. If someone saves image data, but then changes the metadata somehow and saves the starfile separately, it might not load properly afterwards

garrettwrong commented 1 year ago

Good point.. but it probably shouldn't be with the same name as the old mrcs..

I guess overwrite would guard us there.

Regarding your second point. I guess that is probably fair in the context of saving a Source. If a user really wants to just save off mrcs they could do Source.images[0:512].save(fname) or something I think.

So

def save(..., overwrite=False):
   self.save_meta_data(..., overwrite)
   self._save_mrcs(..., overwrite)

?

And I think the filename stem should be user configurable. We might want to manage a save directory for the mrcs, the mate of whats called data_dir during init. Hrmm, but maybe that adds too much complexity.

@janden did you have any comments or concerns regarding behavior of Source.save? This discussion and issue came up after Chris found some strange vestigial meta save code in #732 . Thanks!