BodenmillerGroup / readimc

Python package for reading imaging mass cytometry (IMC) files
https://bodenmillergroup.github.io/readimc
MIT License

Support for reading directly from zip files #25

Open matt-sd-watson opened 10 months ago

matt-sd-watson commented 10 months ago

Does the readimc API for MCDFile allow for reading mcd files directly from a zipped state? Our group often compresses these files to save space, and this functionality would be great for faster data processing.

jwindhager commented 10 months ago

Hi @matt-sd-watson,

This is currently not supported, but I think it would be a great addition. Tagging @Milad4849 here, the current maintainer.

With that being said, MCDFile just operates with standard file handles, so you could try to extend the MCDFile class yourself:

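from zipfile import ZipFile

from readimc import MCDFile
from readimc.mcd_parser import MCDParser  # import path assumed; adjust to your readimc version

class MyMCDFile(MCDFile):
    def open(self) -> None:
        if self._path.suffix == ".zip":  # Path.suffix, not Path.name.suffix (name is a str)
            # ZipFile.open only accepts mode "r" or "w"; "r" already yields bytes
            self._fh = ZipFile(self._path).open("myfile.mcd", mode="r")
            self._schema_xml = self._read_schema_xml()
            self._slides = MCDParser(self._schema_xml).parse_slides()
        else:
            super().open()  # call original MCDFile.open function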

Note that this is untested. I'm not sure whether it will work as expected; in particular, it will likely cause problems with memory mapping (e.g. here and here), since a compressed zip member is not a regular file on disk that can be memory-mapped. Still, it may provide a starting point - I'm sure @Milad4849 would welcome pull requests! For example, one could think of modifying MCDFile.__init__ to also accept file handles, and only use memory mapping for uncompressed data.
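To make the "only use memory mapping for uncompressed data" idea a bit more concrete, here is a rough sketch using only the standard library. It is not readimc code: `read_block` is a hypothetical helper name, and it uses the stdlib `mmap` module as a stand-in for whatever memory mapping the library does internally. The assumption is that a handle with a real OS-level file descriptor can be memory-mapped, while a zip member cannot and has to be read with plain seek/read instead.

```python
import io
import mmap
import zipfile
from typing import BinaryIO


def read_block(fh: BinaryIO, offset: int, size: int) -> bytes:
    """Return `size` bytes starting at `offset` (hypothetical helper, not part of readimc)."""
    try:
        fileno = fh.fileno()  # regular files on disk expose an OS-level descriptor
    except (AttributeError, OSError):
        # Zip members and in-memory streams raise io.UnsupportedOperation
        # (a subclass of OSError), so fall back to plain seek/read.
        fh.seek(offset)
        return fh.read(size)
    # Uncompressed file on disk: map it read-only and slice out the block.
    with mmap.mmap(fileno, 0, access=mmap.ACCESS_READ) as mm:
        return mm[offset:offset + size]


# Small demo with an in-memory zip archive; the member name and the
# payload are made up for illustration and are not real MCD data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("myfile.mcd", b"0123456789")

with zipfile.ZipFile(buffer) as zf, zf.open("myfile.mcd") as member:
    chunk = read_block(member, 2, 4)  # takes the seek/read fallback path
```

Keep in mind that seeking inside a compressed zip member is far slower than indexing into a memory map, because the data has to be decompressed up to the requested offset (and re-read from the start when seeking backwards). For large acquisitions, this may cancel out part of the hoped-for speed-up.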

Milad4849 commented 9 months ago

Indeed as @jwindhager mentions, pull requests are very welcome :)