Objects being overwritten from memory during tessellation

tdeyster commented 1 year ago

Problem:

When using the jupyter cadquery function jcq.base._tessellate_group(group_to_tessellate) sometimes objects are pulled from memory rather than the group_to_tessellate

I've tried numerous options including:

-make sure I'm deleting objects -try importing and deleting jcq functions inside each loop -try passing cq objects through jcq.cad_objects.to_assembly() first (rather than using jcq group) -running various IDE (cmd, spyder, jupyter)

My best guess if that somehow variables aren't being correctly disposed by jcq or python.
Though, if this is a system specific issue, I'd appreciate any suggestions.

Code to reproduce issue in jupyter notebooks

this code generates rows of boxes, if is find a box pulled from memory rather than the current object it breaks and displays. Note all block should be in a line, the block out of line is pulled from a previous loop iteration.

# imports
import cadquery as cq
import jupyter_cadquery as jcq
from cad_viewer_widget import show as viewer_show
import numpy as np
#versions
import sys
print("python: %s"%sys.version)
print("cadquery: %s (funny version number but was installed recently should be 2.?)"%cq.__version__)
print("jupyter_cadquery: %s"%jcq.jcq_version)

# generate boxes (may need to run this block a few times to see error)
ni,nj = 300,30 # loop through ni rows, each with nj boxes
for i in range(ni):
    boxes = [] #partlist
    for j in range(nj):
        x = np.random.rand() #box will be random size 0-1
        box = cq.Workplane("XY").transformed(offset=cq.Vector(2*i,j,0)).box(x,x,x)
        boxes.append(jcq.Part(box,name="box_%s"%j))
        del box

    result = jcq.PartGroup(boxes,name="boxes")
    del boxes
    # tessellate boxes
    shapes,states = jcq.base._tessellate_group(result)
    del result
    # get bb
    bb = jcq.base._combined_bb(shapes)
    shapes["bb"] = bb.to_dict()
    # check that bb is correct
    if bb.xmin<(2*i-.5) or bb.xmax>(2*i+.5):
        print(i-1)
        viewer_show(shapes,states) # shapes and states
        break

    del shapes,states

System info:

PC: Windows 10 64bit python: 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:41:22) [MSC v.1929 64 bit (AMD64)] cadquery: 0.0.0 (funny version number but was installed last month should be 2.?) jupyter_cadquery: 3.4.0

bernhard-42 commented 1 year ago

Thanks for reporting. I was able to reproduce it. Will look into it

bernhard-42 commented 1 year ago

Thank you for the great analysis and reproducing code! Really appreciate it.

You are right, it is the cache function. The cache key uses shape.HashCode(MAX_HASH_KEY) to create a hash for each object. unfortunately, this hash function has collisions (which I should have expected, given that HASH_MAX == 2147483647)

Now, ..., I know the reason, however, need to find a solution (that is fast, else caching doesn't make sense)

tdeyster commented 1 year ago

Thanks for the quick response! Hopefully you find a suitable solution. It's a great package you've built.

bernhard-42 commented 1 year ago

To document an idea:

When I read the pybind11 docs (the technology to bring C++ OCCT to Python), a "pybind11 based object" is a Python object that contains a reference to the C++ object. Hence, obj.HashCode(max_int) uses the C++ memory pointer for the hash and id(obj) uses the Python wrapper object memory address. Two different values in two different memory spaces.

id is guaranteed to be unique for objects that have an overlapping life span. We could still get the same id if an old object was deleted (or set to None), garbage collection has kicked in and a new Python wrapper was created at the same memory address for a different C++ OCCT object.

But if I combine both the Python wrapper object address id(obj) and the C++ memory pointer hash (obj.HashCode(max_key) in my cache key, then I don't think a collision can be created too easily: The newly created C++ object would need to be at the same memory address as the one before, and by chance the Python object would also need to be at the same Python memory address as the one before referencing the older C++ OCCT object.

The change would mean to change https://github.com/bernhard-42/jupyter-cadquery/blob/578b430af6e03ba05187e4fa22f64624b58a9cf5/jupyter_cadquery/tessellator.py#L46-L52 to

key = ( 
     tuple(( (s.HashCode(MAX_HASH_KEY), id(s) ) for s in shape)), 
    ...

Of course, a collision could still happen, however it should be very unlikely. In some initial tests with 10000s of objects I wasn't able to get a collision. And I still can add a switch to turn off caching, if an issue arises.

bernhard-42 / jupyter-cadquery