JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.34k stars 5.46k forks source link

objectid takes arbitrarily long #43542

Open cooijmanstim opened 2 years ago

cooijmanstim commented 2 years ago

In the following code, I construct a large deeply nested DAG and then call objectid on the root node. I'm expecting this to be instant, but instead the call to objectid takes time proportional to the size of the data structure. With the size 100 used below it probably won't return ever.

struct Foo
  x::Any
  y::Any
end
function test()
  foo = Foo(nothing, nothing)
  for i in 1:100
    foo = Foo(foo, foo)
  end
  println(1)
  println(objectid(foo))
  println(2)
end
test()

Version info:

julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
JeffBezanson commented 2 years ago

This is definitely a pathological case --- conceptually there is a tree of size 2^N, but because of shared references there are only N objects, so it feels like it should be possible to traverse. I guess we would need to add some memoization to fix this, but I'm not sure that's worthwhile since it will add overhead.

cooijmanstim commented 2 years ago

Pardon me, I was under the impression that objectid was like Python's id and so the fact that it was traversing the data structure looked like a bug to me. I understand now that pointer_from_objref is what I was looking for. That requires making the struct mutable; interestingly (to me) if I make the struct mutable then objectid no longer seems to have to traverse the data structure either.

vtjnash commented 2 years ago

I think it would be worthwhile to add a simple memoization dict. We probably have all of the needed pieces (with the ptrhash htable) to implement this fairly easily in the objectid code in jl_object_id in src/builtins.c, if anyone wanted to tackle it.

SamarthMayya commented 2 years ago

@vtjnash If this issue is still open, I would like to work on it please

vtjnash commented 2 years ago

Go for it

SamarthMayya commented 2 years ago

@vtjnash if I'm right, the memoization dict would be used in this case to store pointers to each foo object, so that we won't need to compute the same object multiple times. But then how does this help in improving the performance of the objectid method? How does the objectid method actually work? Where in the source code would I be able to understand it's working?

vtjnash commented 2 years ago

The computation happens in jl_object_id. The htable would need to contain the object as the key and the hash of the object as the value. There are a variety of ptrhash_bp calls already that you may be able to use to see how that table works.

SamarthMayya commented 2 years ago

@vtjnash, if I'm not wrong, the jl_object_id function returns whatever is returned by the jl_object_id_ function. In the jl_object_id_ function, there are conditions to check the type of the data, and it returns a precomputed hash value of the object. How and where is this hash value calculated? Also, if I make a change to a C file in the src directory, such as in builtin.h, is there a way to check if it works using the ccall method? What would I have to do in order to compile the C file again, so that the next time I run Julia REPL, I am able to call that function using ccall?

vtjnash commented 2 years ago

Yes, those are the functions you would need to change. Then rebuild (make) the src files, and call objectid again from Julia (make sure it did not change)