dolphinsmalltalk / Dolphin

Dolphin Smalltalk Core Image
MIT License
301 stars 58 forks source link

Deleted ExternalStructure subclasses reappear as zombies in external callbacks arguments #1191

Closed JBetz closed 10 months ago

JBetz commented 10 months ago

Describe the bug As far as I can tell, deleted definitions of external structure classes can get partially resurrected if a new class has been created with the same name. This results in weird behavior like different class definitions being used in different contexts. And in each case, the deleted class appears as a hollowed out version of the real class with an empty method dictionary.

I noticed this after uninstalling and reinstalling a package with many ExternalStructure definitions. New objects created with the reinstalled ExternalStructure classes were throwing #doesNotUnderstand: errors even though the method definitions were visible, both in the class browser and in the method dictionary of the class. Upon closer inspection, I noticed that #instanceSpec returned a different value for the ExternalStructure class when I ran it in a workspace vs. on an external structure passed as an argument to a MessageCallback.

To Reproduce Steps to reproduce the behavior:

  1. Open fresh image
  2. Create a subclass of ExternalStructure named ZombieStructure
  3. Add the following method to the ZombieStructure class:
    ZombieStructure class>>defineFields
    self defineField: #test type: LPVOIDField new
  4. Run ZombieStructure compileDefinition
  5. Delete the ZombieStructure class
  6. Run Metaclass allInstances select: [ :each | each instanceClass name = #ZombieStructure ]
  7. #(ZombieStructure class) is returned rather than an empty array

Expected behavior Deleted definitions of ExternalStructure classes should not be used for new instances.

Please complete the following information):

blairmcg commented 10 months ago

With regards to your repro, the class object is still referenced from the history list of the browser. If you close the browser, it will go away. Or at least it did when I tried it. If you you can run the following to see why it is still referenced:

ReferenceFinder findAllPathsTo: (Metaclass allInstances select: [ :each | each instanceClass name = #ZombieStructure ]) first 

It would be better if the history list stored a symbolic reference rather than a hard reference, but it isn't related to your issue.

However, I do remember encountering the problem, and your issue is almost certainly caused by one or more of the following:

  1. Pre-existing callback instances might still be in scope after you deleted the structure classes. These hold a reference to the class they were originally instantiated against. You can't expect that they will update to a new version of the class if you delete and recreate it (although they would if you changed the definition of the class). You'll need to make sure that uninstalling the package tears down any callback instances somehow, either from class uninitialize methods, or a package uninstall script as a last resort (since these are not visible to the tooling).
  2. You are "referencing" the structure classes from strings. This is quite likely because unfortunately the ExternalDescriptor class encourages this by providing a parser that takes a string as input. These references embedded in strings are not visible to the package system or other tooling, so it does not recognise them as a prerequisite dependency. As a result you can uninstall the package with the structures without the system requiring that you also uninstall the package with the methods that create or define the callbacks, leaving them in an invalid state (referring to obsolete classes). Also, these references will not be visible to the application deployment image stripper either, so there is a risk it will remove classes that are required for callbacks.
  3. Even more unfortunately, ExternalDescriptor in 7.1 (and earlier) has a cache of shared instances. This is a weak collection, but in the circumstance of (1) above, the old instances would prevent the cache entry referring to the old class going away. This would mean that when you ask for an ExternalDescriptor with the same string, the old definition that refers to the old class will get returned. It shouldn't happen if the code with the dependent callbacks is uninstalled with the structures, but as mentioned the design that encourages hidden class references in strings makes it easy to fall foul of this problem.

If you ensure that there are no callback instances existing that refer to the old class definition, and you re-execute whatever code parses the callback descriptor, then you should not see the problem. You can make it much less likely by ensuring that you construct any callback descriptor strings you parse using an actual reference to the class, as this will mean the referring code will have to be uninstalled at the same time you uninstall the structures, so the shared descriptor instances should go away. I would also recommend disabling the ExternalDescriptor shared instance cache. This can be done by setting the class variable ExternalDescriptor.Shared to nil.

An example of what not to do is in this DisplayMonitor class>>forCanvas:intersecting:do: method from Dolphin 7.1 https://github.com/dolphinsmalltalk/Dolphin/blob/48f8321cb370664717dc5f6657368f707df9c163/Core/Object%20Arts/Dolphin/MVP/Base/DisplayMonitor.cls#L194

In Dolphin 8 the hidden class reference is avoided like this https://github.com/dolphinsmalltalk/Dolphin/blob/ca05ce6fd96df3729f38e114caf4970204498f1a/Core/Object%20Arts/Dolphin/MVP/Base/UI.DisplayMonitor.cls#L197

This means that the RECTL class appears in the literal frame of the method, and so is visible to reference searches and package prerequisites tracing. The ##() expression makes it a constant to avoid the cost of rebuilding the string every time, but importantly the literal reference to RECTL is retained. Doing that (or making the descriptor itself a constant, or storing a shared instance in a class variable) with the class name in a string would just recreate the stale reference problem of ExternalDescriptor.Shared.

I'll send a a fix to remove the shared instance cache in 7.1. I removed it from D8 a couple of years ago now in 619d810404b136b83b327bc1e6064d6c8501cf30.

JBetz commented 10 months ago

If you you can run the following to see why it is still referenced:

ReferenceFinder findAllPathsTo: (Metaclass allInstances select: [ :each | each instanceClass name = #ZombieStructure ]) first

Very handy. I did try using #inspectReferences but got lost in the rabbit hole. This is much better.

However, I do remember encountering the problem, and your issue is almost certainly caused by one or more of the following:

I checked and it was indeed a combination of 1 and 3. Thanks!