KLayout / klayout

KLayout Main Sources
http://www.klayout.org
GNU General Public License v3.0

Can GDS files be read with a memory-mapped file? #1707

Closed: lmmir closed this issue 1 month ago

lmmir commented 1 month ago

Can GDS files be read with a memory-mapped file? Have you considered this before, and could it improve read speed?

klayoutmatthias commented 1 month ago

GDS is not a format that benefits from memory mapping. Not at all.

The process of reading GDS into a (real) layout database is much more complex than just file I/O. To benefit from memory mapping you need a format that is a ready-to-use database. There are very few formats that fulfil this criterion, like certain flavors of OASIS.

Matthias

lmmir commented 1 month ago

I tried switching to memory-mapped reading of GDS files, and the read time for a 175 GB file dropped by about 300 seconds.

stefanottili commented 1 month ago

It's 2024; there is no reason to create 175 GB GDS files. Just use OASIS.

If you memory-map 175 GB, you need 175 GB of memory, but in-memory layout databases will likely not only use less than that, but also provide access to the data.

If you ask questions like this, please provide some context of what you’re trying to do. That would make answering much easier.

Did you read your file with KLayout? How much memory did it use? How long did it take? Was it a P&R chip or fractured mask data? Are you trying to write your own tool? To do what?

lmmir commented 1 month ago

Did you read your file with KLayout? Yes. I only modified the InputStream and inputfile classes and used the mmap function to map the file.

How much memory did it use? 50 GB (the machine has 160 GB of physical memory).

How long did it take? Reading the same file took 813 seconds with memory mapping and 1119 seconds without.
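The patch itself was not posted. As a rough illustration only (Python rather than KLayout's C++; `iter_gds_records` and `count_records` are hypothetical names, not KLayout API), the sketch below walks the GDS record stream either from a memory-mapped file or from a plain `read()`. Each GDS record starts with a 2-byte big-endian length (header included), a record type byte and a data type byte. Note that in both cases every record is still visited in order; mapping only changes how the bytes reach the parser.

```python
import mmap
import struct
from typing import Iterator, Tuple, Union

# GDSII record header: 2-byte big-endian total length (header included),
# then a 1-byte record type and a 1-byte data type.
_HEADER = struct.Struct(">HBB")
ENDLIB = 0x04


def iter_gds_records(buf: Union[bytes, mmap.mmap]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (record_type, data_type, payload) for each record in buf.

    Works the same on a bytes object or an mmap; slicing either one
    returns a bytes copy of the payload.
    """
    pos = 0
    while pos + _HEADER.size <= len(buf):
        length, rec_type, data_type = _HEADER.unpack_from(buf, pos)
        if length < _HEADER.size:
            raise ValueError(f"corrupt record length {length} at offset {pos}")
        yield rec_type, data_type, buf[pos + _HEADER.size:pos + length]
        pos += length
        if rec_type == ENDLIB:
            # Anything after ENDLIB is padding (often zero bytes), not records.
            return


def count_records(path: str, use_mmap: bool) -> int:
    """Count the records of a GDS file, either via mmap or a plain read()."""
    with open(path, "rb") as f:
        if use_mmap:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return sum(1 for _ in iter_gds_records(mm))
        return sum(1 for _ in iter_gds_records(f.read()))
```

Any measured difference between the two paths comes from the I/O layer alone; the parsing work per record is identical.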

stefanottili commented 1 month ago

Well, write the files as OASIS and check what runtime you get … I'm pretty sure it's much faster.

klayoutmatthias commented 1 month ago

@lmmir,

I am missing a code patch that would allow me to confirm your numbers.

I know it is a common myth that memory mapping is somehow the better disk I/O method. But you're not doing memory mapping just for the sake of disk I/O performance. In my case, data sources are sometimes serial (pipes, sockets) and GDS almost always comes as .gz. Memory mapping will not help in these cases. Instead, I suspect it will create compatibility and maintenance issues, on Windows for example. I also think there is a penalty for unmapping memory blocks in case the file does not fit into the available physical memory, even more so if it competes with the memory allocated for the database. Memory mapping is closely related to disk swapping, and that is exactly what it is for: to extend the physical memory by adding virtual memory from the disk.
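To make the point about serial and compressed sources concrete, here is a minimal Python sketch (not KLayout code; `open_layout_bytes` is a hypothetical helper). It maps only plain, non-empty regular files and falls back to sequential reads for `.gz` files and non-regular sources such as pipes, where a mapping is either impossible or would only expose compressed bytes that still have to be inflated in order.

```python
import gzip
import mmap
import os
import stat


def open_layout_bytes(path: str):
    """Return a readable binary stream (file-like object) for path.

    Memory mapping is used only for plain, non-empty regular files.
    Compressed files and non-regular sources (pipes, FIFOs, sockets)
    fall back to ordinary sequential reads.
    """
    if path.endswith(".gz"):
        # Mapping would only expose compressed bytes, which must be
        # inflated serially anyway, so use a streaming decompressor.
        return gzip.open(path, "rb")
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
        # Pipes and FIFOs cannot be mapped, and mmap rejects empty files.
        return open(path, "rb")
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

All three return values support `read()` and the `with` statement, so a reader can consume them the same way; only the plain-file case gets a mapping at all.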

A real benefit of memory mapping comes from a file format that can readily be used as an in-memory image of a database. If that is the case, the database-building step can be skipped and opening a file is instantaneous. Strict-mode OASIS can, to some extent, be seen as such a format. You can read the tables and identify the location inside the file where to look. That is even more true for OASIS.Mask, which is made for semi-flat, tile-organized and non-overlapping geometry data. You can see that effect in specialized commercial tools or outside the VLSI domain, such as in image processors for very large image files, for example in the geospatial domain. This usually means that the file format and the database are built for each other. KLayout is not built that way, as I try to maintain a universal engine not bound to a specific format.
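The contrast can be sketched with a toy container format (hypothetical, and deliberately much simpler than OASIS): a table of (name, offset, length) entries at the front lets a reader jump straight to one cell's bytes in the mapped file without parsing anything else. `write_toy` and `read_cell` are illustrative names only, not part of any real library or file format.

```python
import mmap
import struct
from typing import Dict

# Hypothetical toy format (NOT OASIS): magic, entry count, then a table of
# (name[16], offset u64, length u64) entries, then the cell payloads.
MAGIC = b"TOY1"
_COUNT = struct.Struct("<I")
_ENTRY = struct.Struct("<16sQQ")  # names longer than 16 bytes get truncated


def write_toy(path: str, cells: Dict[str, bytes]) -> None:
    """Write cells so that each payload's offset is known up front."""
    names = list(cells)
    offset = len(MAGIC) + _COUNT.size + _ENTRY.size * len(names)
    table = []
    for name in names:
        table.append(_ENTRY.pack(name.encode("ascii"), offset, len(cells[name])))
        offset += len(cells[name])
    with open(path, "wb") as f:
        f.write(MAGIC + _COUNT.pack(len(names)) + b"".join(table))
        f.write(b"".join(cells[name] for name in names))


def read_cell(path: str, wanted: str) -> bytes:
    """Fetch one cell's payload: read the table, then jump straight to it."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if mm[:len(MAGIC)] != MAGIC:
            raise ValueError("not a toy file")
        (count,) = _COUNT.unpack_from(mm, len(MAGIC))
        base = len(MAGIC) + _COUNT.size
        for i in range(count):
            raw, offset, length = _ENTRY.unpack_from(mm, base + i * _ENTRY.size)
            if raw.rstrip(b"\x00").decode("ascii") == wanted:
                # Only the pages backing this slice are touched; the other
                # cells' bytes are never read.
                return mm[offset:offset + length]
    raise KeyError(wanted)
```

The table is scanned linearly here for brevity; the point is that the payloads of unrelated cells are never parsed, which is what makes the mapped file usable as a database image.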

And finally, about GDS: this is a serial format. You have to read it from the first to the last byte to build the database. The database has to add a lot of details such as cell-instance links, search trees, bounding box information etc. There is no real gain from memory mapping.
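For comparison, a minimal Python sketch of the serial GDS case (hypothetical helpers `gds_hierarchy` and `top_cells`; only the cell-instance links are collected, none of the search trees or bounding boxes a real database also builds). A cell may reference cells that are defined later in the file, so the hierarchy, and therefore the top cells, are only known after the last record has been read.

```python
import struct
from typing import Dict, List, Optional, Set

# GDSII record header: 2-byte big-endian length, record type, data type.
_HEADER = struct.Struct(">HBB")
ENDLIB, STRNAME, ENDSTR, SNAME = 0x04, 0x06, 0x07, 0x12


def gds_hierarchy(data: bytes) -> Dict[str, Set[str]]:
    """Map each cell name to the set of cell names it instantiates.

    SNAME records appear inside both SREF and AREF elements, so they
    are enough to collect the cell-instance links.
    """
    children: Dict[str, Set[str]] = {}
    current: Optional[str] = None
    pos = 0
    while pos + _HEADER.size <= len(data):
        length, rec_type, _ = _HEADER.unpack_from(data, pos)
        if length < _HEADER.size:
            raise ValueError(f"corrupt record at offset {pos}")
        payload = data[pos + _HEADER.size:pos + length]
        pos += length
        if rec_type == STRNAME:
            # GDS strings are padded with a NUL byte to an even length.
            current = payload.rstrip(b"\x00").decode("ascii")
            children.setdefault(current, set())
        elif rec_type == SNAME and current is not None:
            children[current].add(payload.rstrip(b"\x00").decode("ascii"))
        elif rec_type == ENDSTR:
            current = None
        elif rec_type == ENDLIB:
            break
    return children


def top_cells(hierarchy: Dict[str, Set[str]]) -> List[str]:
    """Cells never instantiated by another cell; needs the full scan first."""
    referenced = set().union(*hierarchy.values())
    return sorted(set(hierarchy) - referenced)
```

Even with a mapped file, this loop has to touch every record, so the mapping cannot skip any of the work.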

Matthias