The-OpenROAD-Project / OpenROAD

OpenROAD's unified application implementing an RTL-to-GDS Flow. Documentation at https://openroad.readthedocs.io/en/latest/
https://theopenroadproject.org/
BSD 3-Clause "New" or "Revised" License
1.42k stars 496 forks source link

Accessing a destroyed database from python should result in an exception, not SEGFAULT #4855

Open oharboe opened 3 months ago

oharboe commented 3 months ago

Description

Ref. https://github.com/The-OpenROAD-Project/OpenROAD/pull/4802, producing a clear exception instead of a SEGFAULT when accessing a destroyed database from Python is the idiomatic way to handle things in Python.

Tasks:

Suggested Solution

Throw exception instead of SEGFAULT, which is congruent with an idiomatic Python interface.

Additional Context

No response

maliberty commented 3 months ago

Adding checking onto every single API call would be prohibitive in runtime and hard to do in practice. I don't see any practical solution to this. I think this is the nature of calling into c++ directly from Python - you have to play by c++ rules.

oharboe commented 3 months ago

Adding checking onto every single API call would be prohibitive in runtime and hard to do in practice. I don't see any practical solution to this. I think this is the nature of calling into c++ directly from Python - you have to play by c++ rules.

Runtime I dont believe will be affected materiallym Hard to do, I can believe, though maybe smartptr can fix that? It is idomatic Python interface not to SEGFAULT.

maliberty commented 3 months ago

I agree it is a good goal but I see no practical implementation. I'm open to suggestions.

oharboe commented 3 months ago

I agree it is a good goal but I see no practical implementation. I'm open to suggestions.

I tried to find the code, can you give some pointers to how this fits together?

maliberty commented 3 months ago

Its generated by swig from odb/src/swig/...

oharboe commented 3 months ago

I have never used SWIG, so I'm have trouble navigating it.

Maybe this is easier than it looks. I tried to create a simple test, it throws an exception, doesn't SEGFAULT, there is a check on None(null pointer) somewhere....

from openroad import Design
db = Design.createDetachedDb()
db.destroy(db)
db.getChip().getBlock().getNets()
$ openroad -python test.py
OpenROAD v2.0-12722-g25e88dd66 
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[WARNING ORD-0039] .openroad ignored with -python
Traceback (most recent call last):
  File "/home/oyvind/OpenROAD-flow-scripts/tools/OpenROAD/test.py", line 4, in <module>
    db.getChip().getBlock().getNets()
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'getBlock'
>>> 
maliberty commented 3 months ago

Try

from openroad import Design
import odb
db = Design.createDetachedDb()
odb.dbChip.create(db)
db.destroy(db)
db.getChip().getBlock().getNets()
oharboe commented 3 months ago

So I can reproduce the problem:

oyvind@corona:~/OpenROAD-flow-scripts/tools/OpenROAD$ cat test/no-crash.py 
from openroad import Design
import odb
db = Design.createDetachedDb()
odb.dbChip.create(db)
db.destroy(db)
db.getChip().getBlock().getNets()
oyvind@corona:~/OpenROAD-flow-scripts/tools/OpenROAD$ openroad -python test/no-crash.py 
OpenROAD v2.0-12733-g09ba2a36a 
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
[WARNING ORD-0039] .openroad ignored with -python
Signal 11 received
Stack trace:
 0# 0x00005E9E23989703 in openroad
 1# 0x00007B7098E42990 in /lib/x86_64-linux-gnu/libc.so.6
 2# odb::dbDatabase::getChip() in openroad
 3# 0x00005E9E255347E8 in openroad
 4# 0x00007B70997C7B46 in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
 5# PyObject_Vectorcall in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
 6# _PyEval_EvalFrameDefault in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
 7# 0x00007B70999CF0FC in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
 8# PyEval_EvalCode in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
 9# 0x00007B70998B4F99 in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
10# _PyRun_SimpleFileObject in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
11# _PyRun_AnyFileObject in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
12# Py_RunMain in /lib/x86_64-linux-gnu/libpython3.11.so.1.0
13# main in openroad
14# 0x00007B7098E28150 in /lib/x86_64-linux-gnu/libc.so.6
15# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
16# _start in openroad
Segmentation fault (core dumped)
macd commented 3 months ago

Just an info comment here... Exceptions raised in C++ get passed up to Python by SWIG. ( see src/Exception-py.i ) so the C++ code here is segfaulting which is hard to recover from. When you delete an object in C++ and the Python wrapper does not know about it, bad things will happen. Another way to do this that would be safer is to use del db in Python rather than db.destroy(db) The Python db object includes a method __swig_destroy__ which calls delete_dbDatabase, so maybe the better approach is to use Python to delete these Python objects and trust (and verify) that the correct cleanup code is triggered in C++

maliberty commented 3 months ago

@macd that would handle the db object itself but all the other object handles you might have would still remain invalid (eg a dbInst). Do you see a way to invalidate all references into the destroyed db?

macd commented 3 months ago

I don't immediately see how to do that (but let me ponder). It occurs to me that this must(?) also happen with the tcl interface. If so, what is done there? If not, maybe an approach we can copy?

maliberty commented 3 months ago

I expect it is the same in tcl

rovinski commented 3 months ago

Similar question https://stackoverflow.com/questions/10068576/delete-an-object-and-all-references-to-it-in-python

My two thoughts are either 1) have a callback on destroy() which will go into the SWIG interface and invalidate any related objects, or 2) remove destroy() from the user view and force using python mechanisms to delete the object.

Or I guess there is also 3) which is force destroy() to invalidate all python objects before calling the the underlying C++ (similar to 1).

maliberty commented 3 months ago

@rovinski do you know how to do (1)?

I'm not clear how (2) would help as it is the other objects that are affected in ways that python doesn't understand.

rovinski commented 3 months ago

After further reading, I think the best solution is to simply disable calling destroy() from Python. As @macd says, del db is the safer way to do it. In Python, all objects have a reference counter and when the reference counter is 0, the garbage collector will automatically destroy the object.

In this case, it will be up to the user to del, or otherwise allow to go out of scope, all references to the db or db objects.

So in an example (Assume there is debug code that says when the db is deleted):

>>> from openroad import Design
>>> import odb
>>>
>>> db = Design.createDetachedDb()
>>> odb.dbChip.create(db)
>>> del db
Deleted db object
>>> db.getChip().getBlock()
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
NameError: name 'db' is not defined

In another example:

>>> from openroad import Design
>>> import odb
>>>
>>> db1 = Design.createDetachedDb()
>>> # Second reference
>>> db2 = db1
>>> odb.dbChip.create(db)
>>> del db1
>>> db2.getChip().getBlock()
<odb.dbBlock object at 0x7fc9a9e36d60>
>>> # Remove last reference
>>> del db2
Deleted db object
>>> db2.getChip().getBlock()
Traceback (most recent call last):
  File "<stdin>", line 12, in <module>
NameError: name 'db2' is not defined

I think the only question is if a reference to a child object counts as a reference to the db, and if not, how to make it count as one.

rovinski commented 3 months ago

The weakref package also looks interesting: https://docs.python.org/3/library/weakref.html

maliberty commented 3 months ago

The solution assumes all calls come through db. You can still fail with

block = db.getChip().getBlock()
del db
block.getName()
rovinski commented 3 months ago

I believe that block = db.getChip().getBlock() is supposed to add a reference count to db. Therefore, del db will not actually delete the db. You would have to do del block in order for the db to actually be deleted.

rovinski commented 3 months ago

Alternatively, block could be a weakref, meaning that del db would in fact invalidate block.