GeospatialPython / pyshp

This library reads and writes ESRI Shapefiles in pure Python.
MIT License

pyshp 2.3 closing bytesio objects with Writer #244

Closed akrherz closed 2 years ago

akrherz commented 2 years ago

PyShp Version

2.3.0

Python Version

3.10

Your code

import zipfile
from io import BytesIO

import shapefile

shpio = BytesIO()
shxio = BytesIO()
dbfio = BytesIO()
with shapefile.Writer(shx=shxio, dbf=dbfio, shp=shpio) as shp:
    ...
zio = BytesIO()
with zipfile.ZipFile(
    zio, mode="w", compression=zipfile.ZIP_DEFLATED
) as zf:
    zf.writestr("fn.shp", shpio.getvalue())
    zf.writestr("fn.shx", shxio.getvalue())
    zf.writestr("fn.dbf", dbfio.getvalue())

Full stacktrace

zf.writestr("fn.shp", shpio.getvalue())
ValueError: I/O operation on closed file.

Other notes

Commit af1dee9f7bf24687f0b appears to now close any file objects passed to the Writer. Perhaps my usage is not best practice for BytesIO objects, but even inside a context manager, the new pyshp code closes the BytesIO objects before their contents can be read back.
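For reference, the failure can be reproduced without pyshp at all: it is simply how io.BytesIO behaves once something calls close() on it (a minimal stdlib sketch, independent of the Writer):

```python
from io import BytesIO

# Write to an in-memory buffer, as the Writer does internally.
buf = BytesIO()
buf.write(b"shapefile bytes")

# While the buffer is open, its contents are retrievable.
assert buf.getvalue() == b"shapefile bytes"

# Once anything closes the buffer, getvalue() raises ValueError --
# the "I/O operation on closed file." seen in the traceback above.
buf.close()
try:
    buf.getvalue()
except ValueError as err:
    print(err)
```

So any code path in the Writer that closes a caller-supplied BytesIO makes the later zf.writestr(...) calls fail.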

mcflugen commented 2 years ago

@akrherz I came here to file this exact issue. I'm also not sure whether it's best practice, but we use BytesIO objects in the same way you do.

It seems that if shapefile.Writer is able to write to binary streams, those streams should remain accessible after writing. Otherwise, why write to them at all?

mcflugen commented 2 years ago

I'm not sure if this is the direction you would want to go, but something like the following fixes this issue for me.

My thought is that if the Writer opens a file, then it should be responsible for closing it, but if the Writer is given an already open stream, then it should be left to the caller to close that stream.

One way to accomplish this is to have __getFileObj keep track of all the files the Writer opens:

    def __getFileObj(self, f):
        """Safety handler to verify file-like objects"""
        if not f:
            raise ShapefileException("No file-like object available.")
        elif hasattr(f, "write"):
            return f
        else:
            pth = os.path.split(f)[0]
            if pth and not os.path.exists(pth):
                os.makedirs(pth)
            fp = open(f, "wb+")
            self._files_to_close.append(fp)
            return fp

and then, instead of the for attribute in (self.shp, self.shx, self.dbf): loop, the close method would end with:

        while self._files_to_close:
            self._files_to_close.pop().close()
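The ownership rule described above can be illustrated with a self-contained toy class (hypothetical names, not pyshp's actual API): record every file the object opens itself and close only those, leaving caller-supplied streams untouched.

```python
import os
import tempfile
from io import BytesIO


class OwnershipWriter:
    """Toy sketch: close only the files this object opened itself."""

    def __init__(self):
        self._files_to_close = []

    def get_file_obj(self, f):
        # Caller-supplied stream: use it, but do not take ownership.
        if hasattr(f, "write"):
            return f
        # Path: we open it ourselves, so we must close it later.
        fp = open(f, "wb+")
        self._files_to_close.append(fp)
        return fp

    def close(self):
        while self._files_to_close:
            self._files_to_close.pop().close()


writer = OwnershipWriter()
caller_stream = BytesIO()
path = os.path.join(tempfile.mkdtemp(), "owned.bin")

owned = writer.get_file_obj(path)            # opened by the writer
shared = writer.get_file_obj(caller_stream)  # supplied by the caller
assert shared is caller_stream

writer.close()
assert owned.closed              # writer-opened file was closed
assert not caller_stream.closed  # caller's stream is still usable
```

With this split, the caller's BytesIO survives Writer.close() and getvalue() keeps working.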

If there is ever an issue with streams not being flushed, perhaps the close method could call the file's flush method, if available?

        for attribute in (self.shp, self.shx, self.dbf):
            try:
                attribute.flush()
            except AttributeError:
                pass

Anyway, that's certainly not the most elegant solution but maybe there's something there?

karimbahgat commented 2 years ago

Thanks to both. I did think about this when I implemented the change, with the idea that calling close() should indeed close file objects, and that all code relating to a context manager should happen within the context's scope. Nevertheless, I agree with your point that closing matters most for files that pyshp opens itself. I've also encountered the same problem, and I agree that writing to file objects is pointless if you can't read them back afterwards; the problem was that closing was required to write and flush the final header info.
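For what it's worth, finalizing a header doesn't require close() on an in-memory buffer: seeking back, patching the header, and flushing is enough, and the stream stays readable afterwards. A stdlib-only sketch of that pattern (the header layout here is invented for illustration, not the shapefile format):

```python
import struct
from io import BytesIO

buf = BytesIO()
buf.write(b"\x00\x00\x00\x00")        # placeholder header: record count
buf.write(b"record-1" + b"record-2")  # body

# Finalize: seek back, patch the header, and flush -- no close() needed
# for the data to remain readable from an in-memory buffer.
end = buf.tell()
buf.seek(0)
buf.write(struct.pack(">i", 2))       # final record count
buf.seek(end)
buf.flush()

data = buf.getvalue()
assert struct.unpack(">i", data[:4])[0] == 2
assert not buf.closed
```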

I'm a bit short on time, so I just took your suggested code and implemented keeping track of opened files and only closing those, flushing all data regardless of source, and added several more tests for this.

Can any of you confirm that the latest code works for your use case?

akrherz commented 2 years ago

Thanks @karimbahgat, I reviewed your commit and the change makes sense to me. I tested it locally and it works as expected.

karimbahgat commented 2 years ago

Released now in v2.3.1.