KLayout / klayout

KLayout Main Sources
http://www.klayout.org
GNU General Public License v3.0

ERROR: src/db/db/dbPCellHeader.cc,137,v != m_variant_map.end () #1782

Closed lukasc-ubc closed 6 hours ago

lukasc-ubc commented 1 week ago

Hi @klayoutmatthias

I am receiving this error after running a long script that generates a large layout. I am running the script in an external Python environment, using "import pya". Fortunately, the error occurs only after I save my layout, upon completion of the script. I have seen it come and go for the past few hours. I'm sorry I don't have a minimal example to share that reproduces this error. I am hoping you may have some suggestions on how to debug it. The script calls PCells that I have created.

ERROR: src/db/db/dbPCellHeader.cc,137,v != m_variant_map.end ()

thank you

klayoutmatthias commented 1 week ago

Hi Lukas,

the message basically means that there is a second incarnation of the same PCell (same parameters) in the library. This should not happen as the PCellHeader manages the variants.

You may be able to spot this case by looking at the raw cell names in the layout and checking if there are two raw cells for the same parameter set.
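
A minimal sketch of such a check from Python. It assumes the pya methods Layout.each_cell, Cell.is_pcell_variant, Cell.basic_name and Cell.pcell_parameters_by_name behave as described in the API documentation; variant_records and find_duplicate_variants are hypothetical helper names, not part of KLayout:

```python
from collections import defaultdict

def variant_records(layout):
    """Yield (raw cell name, PCell name, parameter dict) for each PCell variant.

    Assumes `layout` is a pya.Layout, for example lib.layout() of the library.
    """
    for cell in layout.each_cell():
        if cell.is_pcell_variant():
            yield cell.name, cell.basic_name(), cell.pcell_parameters_by_name()

def find_duplicate_variants(records):
    """Group raw cell names by (PCell name, parameters) and keep groups with more than one cell."""
    groups = defaultdict(list)
    for cell_name, pcell_name, params in records:
        # repr() gives a hashable key even if a parameter value is not hashable
        key = (pcell_name, tuple(sorted((k, repr(v)) for k, v in params.items())))
        groups[key].append(cell_name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# Usage with a real library (not executed here):
#   dups = find_duplicate_variants(variant_records(lib.layout()))
#   for key, names in dups.items():
#       print(key[0], names)
```

Because the key is built from repr(), two parameter sets that both contain NaN end up in the same group, which is convenient when looking for this kind of duplicate.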

One possible scenario which may trigger the problem is re-registration of a PCell. So if a library for some reason re-registers a PCell while there are PCell instances already present for the original PCell, this error could happen upon cleanup of the layout.

There is no log one could easily enable to detect this problem.

The code responsible for the above scenario is in Layout::register_pcell (dbLayout.cc:2527). I'd throw an exception there and see if that triggers somewhere. If it does, there is a case where a PCell is registered again with the same name in the same library.
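
If rebuilding KLayout is not an option, a similar trap can be sketched on the script side. This is only an approximation of the idea: register_pcell_once is a hypothetical helper, and it assumes that Layout.pcell_declaration(name) returns None for a name that is not registered yet.

```python
def register_pcell_once(layout, name, declaration):
    """Register a PCell, but raise if the name is already taken in this layout.

    Hypothetical helper: `layout` is expected to be a pya.Layout, typically
    the one returned by Library.layout().
    """
    # Assumption: pcell_declaration() returns None for unknown PCell names
    if layout.pcell_declaration(name) is not None:
        raise RuntimeError("PCell '%s' is already registered in this layout" % name)
    return layout.register_pcell(name, declaration)

# Usage inside a library class (not executed here):
#   register_pcell_once(self.layout(), "Circle", Circle())
```

The traceback of the RuntimeError then points at the place in the script where the second registration happens.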

I have to investigate this scenario. Actually I think this may be responsible for some issues we find when developing PCells inside the application.

However, this is only a hypothesis. It may be something else as well.

Matthias

lukasc-ubc commented 1 week ago

Thanks Matthias.

My script only loads the library once, and I don't re-register it.

But indeed, I issue ly.create_cell many times, and there are repeats with the same parameters. The curious thing is that my first loop generates 17 cells, and those 17 are repeated in a second loop which generates 66 cells. Yet, I receive the error message 5 times when closing.

I'll keep an eye out for this, and see if it comes up again, ideally in a smaller project.

klayoutmatthias commented 1 week ago

Hi Lukas,

Thanks for the explanation. Could you elaborate a little on what you mean by "those 17 are repeated in a second loop"? Does that mean a (deep) copy or just instances of that cell? Or something else?

For now I was able to reproduce the problem with code that deliberately re-registers a PCell:

# test.py
import pya
import math

class Circle(pya.PCellDeclarationHelper):
  def __init__(self):
    super(Circle, self).__init__()
    self.param("l", self.TypeLayer, "Layer", default = pya.LayerInfo(1, 0))
    self.param("r", self.TypeDouble, "Radius", default = 1.0)
    self.param("n", self.TypeInt, "Number of points", default = 16)     

  def display_text_impl(self):
    return "Circle(L=" + str(self.l) + ",R=" + ('%.3f' % self.r) + ")"

  def produce_impl(self):
    da = math.pi * 2 / self.n
    pts = [ pya.DPoint(self.r * math.cos(i * da), self.r * math.sin(i * da)) for i in range(0, self.n) ]
    self.cell.shapes(self.l_layer).insert(pya.DPolygon(pts))

class CircleLib(pya.Library):
  def __init__(self, name):
    self.description = "Circle Library"
    self.layout().register_pcell("Circle", Circle())
    self.register(name)

  def reregister_pcell(self):
    self.layout().register_pcell("Circle", Circle())

lib = CircleLib("CircleLib")

ly = pya.Layout()

top = ly.create_cell("TOP")

c = ly.create_cell("Circle", "CircleLib", { "l": pya.LayerInfo(1, 0), "r": 2.0, "n": 32 })
top.insert(pya.DCellInstArray(c, pya.DTrans(0.0, 0.0)))

# This triggers
# ERROR: ../../../src/db/db/dbPCellHeader.cc,137,v != m_variant_map.end ()
lib.reregister_pcell()

c = ly.create_cell("Circle", "CircleLib", { "l": pya.LayerInfo(1, 0), "r": 2.0, "n": 32 })
top.insert(pya.DCellInstArray(c, pya.DTrans(0.0, 10.0)))

ly.write("out.gds")
print("Layout written.")

Which gives:

matthias@beast:~/klayout/testdata/issue-1782$ klayout -b -r test.py
Layout written.
ERROR: ../../../src/db/db/dbPCellHeader.cc,137,v != m_variant_map.end ()
terminate called after throwing an instance of 'tl::InternalException'
addr2line: 'klayout': No such file
ERROR: Signal number: 6
Address: 0x3e800110d27
Program Version: KLayout 0.29.1 (2024-05-04 rf95aef89d)

Backtrace:
/usr/lib/klayout/libklayout_lay.so.0 +0x2ef032 lay::enable_signal_handler_gui(bool) [??:?]
/lib/x86_64-linux-gnu/libc.so.6 +0x42520 __restore_rt [libc_sigaction.c:?]
/lib/x86_64-linux-gnu/libc.so.6 +0x969fc __pthread_kill_implementation [pthread_kill.c:44]
...

The crash is due to an uncaught exception - the error happens in the Python shutdown code which is outside exception handling.

But being able to reproduce it does not mean I know what is going on :(

I can basically make re-registration a valid operation and turn the assertion into a warning. That should prevent the crash. That might not be the real cure however.

Matthias

lukasc-ubc commented 1 week ago

Hi Matthias,

1) "repeated in a second loop": what I am doing is calling cell = ly.create_cell(…, …, {…}) multiple times, and sometimes the same parameters appear in the call. Then I instantiate the cell. I am not doing any deep copying.

2) Your crash is much more severe than mine. See screenshot — mine is only an error message.

[Screenshot of the error message]

lukasc-ubc commented 1 week ago

Hi Matthias,

I discovered the source of the error.

We discovered this when two PCell calls resulted in the same output, despite having different inputs. One PCell call had a parameter with a numerical value, while the other had the same parameter being a float('nan'). The PCell was expecting a self.TypeDouble, and Python is happy with 'nan' and 'inf'.

I was passing a PCell parameter: float('nan') when I wanted to skip a certain feature.

Inside the PCell, I would check for it. But I noticed that NaN values can't be compared for equality:

>>> float('nan') == float('nan')
False

So I think the PCell wrapper wasn't handling nan correctly. I replaced 'nan' with 1e6 and check for that instead, and now the errors are gone.

I also checked for infinity, float('inf'), and it also produces the same errors. So inf is also not handled properly.

Perhaps you could add a check in PCell declaration helper for self.TypeDouble, to disallow 'nan' and 'inf'? Unless there is a way to enable 'nan'?
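
Until something like that exists in the helper, the same check can be done on the calling side before the parameters reach create_cell. A minimal sketch; check_finite_params is a hypothetical name, and it only inspects Python float values:

```python
import math

def check_finite_params(params):
    """Raise ValueError if any float parameter is NaN or infinite; return params otherwise."""
    for key, value in params.items():
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError("PCell parameter '%s' must be finite, got %r" % (key, value))
    return params

# Usage (not executed here; `ly` is a pya.Layout):
#   c = ly.create_cell("Circle", "CircleLib", check_finite_params({"r": r, "n": n}))
```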

thank you

klayoutmatthias commented 1 week ago

Hi Lukas,

thanks for this analysis. I guess that pretty well explains the weird effects. I never looked into the compare behavior of NaN in C++, but at least strict weak ordering, as mandated by the STL sets and maps I use, is probably not a given.
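
The ordering problem can be seen with plain floats, independent of KLayout. A strict weak ordering requires that "neither a < b nor b < a" is a transitive equivalence, and NaN breaks that because every comparison with it is false. A small illustration in Python (how KLayout compares parameter values internally is not shown here):

```python
nan = float("nan")

def equivalent(a, b):
    """Equivalence induced by '<': neither value orders before the other."""
    return not (a < b) and not (b < a)

print(equivalent(nan, 1.0))  # True: every comparison with NaN is false
print(equivalent(nan, 2.0))  # True
print(equivalent(1.0, 2.0))  # False, so the equivalence is not transitive
```

An ordered container that relies on "<" alone can therefore treat a NaN key as equal to any other key, depending on where the lookup happens to land.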

I will try to debug the problem. It is probably easy to fix.

Best regards,

Matthias

klayoutmatthias commented 1 week ago

I tried to reproduce the problem with "nan", but without much success. I can confirm however, that different parameter sets give the same cell and maybe that is producing issues once the std::map structure is broken.

So I think that is enough for a tentative fix.

BTW: I noticed, that the GDS file written with the "nan" parameter is broken - reading it errors out with

ERROR: Expected a real number here: nan

So NaN isn't a good idea in general.

Basically, there is not much type checking between PCell client script and PCell code: if you pass parameters by script, you can essentially use "None" as a value of "TypeDouble" and receive "None" inside your code. However, when you edit such a layout, you will see a value of "0" instead of "None". All that is needed is some "optional" attribute for the PCell parameters and there could be an empty edit box for "None" in that case.
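
On the PCell side, a "None" value can then be treated as "not specified". A minimal sketch of such a check; effective_radius and its fallback value are made up for illustration, and the non-finite test is an extra guard rather than something KLayout does:

```python
import math

def effective_radius(r, fallback=2.0):
    """Return `fallback` when the radius is missing (None) or not a finite number."""
    if r is None or not math.isfinite(r):
        return fallback
    return r

# Usage inside produce_impl (not executed here):
#   r = effective_radius(self.r)
```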

Best regards,

Matthias

klayoutmatthias commented 1 week ago

Got it. Here is my code to reproduce it:


import pya
import math
import random

class Circle(pya.PCellDeclarationHelper):
  def __init__(self):
    super(Circle, self).__init__()
    self.param("l", self.TypeLayer, "Layer", default = pya.LayerInfo(1, 0))
    self.param("r", self.TypeDouble, "Radius", default = 1.0)
    self.param("n", self.TypeInt, "Number of points", default = 16)     

  def display_text_impl(self):
    r = self.r
    if r is None:
      r = "nil"
    else:
      r = '%.3f' % r
    return "Circle(L=" + str(self.l) + ",R=" + r + ")"

  def produce_impl(self):
    r = self.r
    if str(self.r) == 'nan':
      r = 2.0
    da = math.pi * 2 / self.n
    pts = [ pya.DPoint(r * math.cos(i * da), r * math.sin(i * da)) for i in range(0, self.n) ]
    self.cell.shapes(self.l_layer).insert(pya.DPolygon(pts))

class CircleLib(pya.Library):
  def __init__(self, name):
    self.description = "Circle Library"
    self.layout().register_pcell("Circle", Circle())
    self.register(name)

  def reregister_pcell(self):
    self.layout().register_pcell("Circle", Circle())

lib = CircleLib("CircleLib")

ly = pya.Layout()

top = ly.create_cell("TOP")

for i in range(0, 5):
  for j in range(0, 5):

    if random.random() > 0.5:
      r = float('nan')
    else:
      r = random.random() * 3
    n = int(random.random() * 25 + 8)

    c = ly.create_cell("Circle", "CircleLib", { "l": pya.LayerInfo(1, 0), "r": r, "n": n })
    top.insert(pya.DCellInstArray(c, pya.DTrans(i * 10.0, j * 10.0)))

ly.write("out.gds")
print("Layout written.")

ly._destroy

And again I get the Abort signal because of the uncaught exception.

Anyway, there is enough material for debugging. I think the NaN's and inf's need special care. And I will also address the re-registration issue as this is possibly responsible for program crashes during PCell development.

Thanks for your help and for reporting the issue.

Best regards,

Matthias