cnuernber / dtype-next

A Clojure library designed to aid in the implementation of high performance algorithms and systems.

Tensors can not be initialized when running the overview's code #32

Closed: zhaoyul closed this issue 3 years ago

zhaoyul commented 3 years ago

I am using [cnuernber/dtype-next "8.022"]

When running the following code:

(require '[tech.v3.tensor :as dtt])

(dtt/->tensor (partition 3 (range 9)))

I got this error:


  1. Unhandled java.lang.ArrayIndexOutOfBoundsException Index 0 out of bounds for length 0

    global_to_local.clj: 33 tech.v3.tensor.dimensions.global-to-local/elem-idx->addr-fn
    global_to_local.clj: 23 tech.v3.tensor.dimensions.global-to-local/elem-idx->addr-fn
    global_to_local.clj: 121 tech.v3.tensor.dimensions.global-to-local/absent-sig-fn/reify
    ConcurrentHashMap.java: 1708 java.util.concurrent.ConcurrentHashMap/computeIfAbsent
    global_to_local.clj: 131 tech.v3.tensor.dimensions.global-to-local/make-indexing-obj
    global_to_local.clj: 124 tech.v3.tensor.dimensions.global-to-local/make-indexing-obj
    global_to_local.clj: 143 tech.v3.tensor.dimensions.global-to-local/get-or-create-reader
    global_to_local.clj: 138 tech.v3.tensor.dimensions.global-to-local/get-or-create-reader
    global_to_local.clj: 146 tech.v3.tensor.dimensions.global-to-local/get-or-create-reader
    global_to_local.clj: 138 tech.v3.tensor.dimensions.global-to-local/get-or-create-reader
    global_to_local.clj: 154 tech.v3.tensor.dimensions.global-to-local/dims->global->local-reader
    global_to_local.clj: 151 tech.v3.tensor.dimensions.global-to-local/dims->global->local-reader
    global_to_local.clj: 170 tech.v3.tensor.dimensions.global-to-local/dims->global->local
    global_to_local.clj: 162 tech.v3.tensor.dimensions.global-to-local/dims->global->local
    dimensions.clj: 306 tech.v3.tensor.dimensions/create-dimension-transforms/fn
    Delay.java: 42 clojure.lang.Delay/deref
    core.clj: 2320 clojure.core/deref
    core.clj: 2306 clojure.core/deref
    dimensions.clj: 271 tech.v3.tensor.dimensions/->global->local
    dimensions.clj: 271 tech.v3.tensor.dimensions/->global->local
    tensor.clj: 472 tech.v3.tensor/construct-tensor
    tensor.clj: 468 tech.v3.tensor/construct-tensor
    RestFn.java: 425 clojure.lang.RestFn/invoke
    tensor.clj: 556 tech.v3.tensor/->tensor
    tensor.clj: 536 tech.v3.tensor/->tensor
    RestFn.java: 410 clojure.lang.RestFn/invoke
    REPL: 193 user/eval16552
    REPL: 193 user/eval16552
    Compiler.java: 7177 clojure.lang.Compiler/eval
    Compiler.java: 7132 clojure.lang.Compiler/eval
    core.clj: 3214 clojure.core/eval
    core.clj: 3210 clojure.core/eval
    interruptible_eval.clj: 87 nrepl.middleware.interruptible-eval/evaluate/fn/fn
    AFn.java: 152 clojure.lang.AFn/applyToHelper
    AFn.java: 144 clojure.lang.AFn/applyTo
    core.clj: 665 clojure.core/apply
    core.clj: 1973 clojure.core/with-bindings
    core.clj: 1973 clojure.core/with-bindings
    RestFn.java: 425 clojure.lang.RestFn/invoke
    interruptible_eval.clj: 87 nrepl.middleware.interruptible-eval/evaluate/fn
    main.clj: 437 clojure.main/repl/read-eval-print/fn
    main.clj: 437 clojure.main/repl/read-eval-print
    main.clj: 458 clojure.main/repl/fn
    main.clj: 458 clojure.main/repl
    main.clj: 368 clojure.main/repl
    RestFn.java: 137 clojure.lang.RestFn/applyTo
    core.clj: 665 clojure.core/apply
    core.clj: 660 clojure.core/apply
    regrow.clj: 20 refactor-nrepl.ns.slam.hound.regrow/wrap-clojure-repl/fn
    RestFn.java: 1523 clojure.lang.RestFn/invoke
    interruptible_eval.clj: 84 nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj: 56 nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj: 152 nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
    AFn.java: 22 clojure.lang.AFn/run
    session.clj: 202 nrepl.middleware.session/session-exec/main-loop/fn
    session.clj: 201 nrepl.middleware.session/session-exec/main-loop
    AFn.java: 22 clojure.lang.AFn/run
    Thread.java: 832 java.lang.Thread/run

cnuernber commented 3 years ago

Thanks for the issue - I will look into this.

cnuernber commented 3 years ago

I tried this from within the dtype-next project itself, and from a testapp project whose deps file has a single dependency on dtype-next 8.022. The results are the same in all cases:

user> (require '[tech.v3.tensor :as dtt])
nil
user> (dtt/->tensor (partition 3 (range 9)))
#tech.v3.tensor<object>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]

Can you give me a bit more information about your setup? What Java version are you running, and what IDE are you using?
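
A small REPL helper can gather the setup details asked for above in one go. This is a sketch, not part of dtype-next: env-report is a hypothetical name, and it only reads standard JVM system properties plus the Clojure version. On a JDK running natively on Apple Silicon, os.arch reports aarch64, which makes the M1 question below easy to answer.

```clojure
(defn env-report
  "Hypothetical helper: collect the JVM and OS properties that matter for a
  platform-specific bug report, plus the running Clojure version."
  []
  (let [props ["java.vendor" "java.version" "java.vm.name" "os.name" "os.arch"]]
    (-> (zipmap props (map #(System/getProperty %) props))
        (assoc "clojure.version" (clojure-version)))))

;; Usage at the REPL:
;; (env-report)
```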

zhaoyul commented 3 years ago

Thanks for the quick reply^_^

It works on my old MacBook (Intel CPU), but not on my M1 machine. Is it hardware related?

hardware & os

Apple M1 MacBook Air, macOS 11.5 (20G71)

Java

➜ ~ java --version
openjdk 15.0.1 2020-10-20
OpenJDK Runtime Environment Zulu15.28+1013-CA (build 15.0.1+9)
OpenJDK 64-Bit Server VM Zulu15.28+1013-CA (build 15.0.1+9, mixed mode)

IDE

Emacs 27.2 + CIDER 1.1.0

cnuernber commented 3 years ago

I think it is JDK-related. I use Oracle, GraalVM, or plain OpenJDK JDKs. I have not tested on Zulu, so my guess is that it is one part of the issue.

cnuernber commented 3 years ago

Running the Azul JDK, I still get no error:

user> (require '[tech.v3.tensor :as dtt])
nil
user> (System/getProperty "java.vendor")
"Azul Systems, Inc."
user> (System/getProperty "java.version")
"15.0.4"
user> (dtt/->tensor (partition 3 (range 9)))
#tech.v3.tensor<object>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]

That being said, from your stack trace I see that the insn code generation pathway is failing, so it falls back to the more general elem-idx->addr-fn pathway. That in itself should be fine, but what is confusing is that for a simple dense tensor the element-index-to-address function is y * n-cols + x, which doesn't require any array lookups.
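
For a dense row-major tensor, the index-to-address mapping described above is just the dot product of the element index with the strides; a 2-D tensor with n-cols columns has strides [n-cols 1], which gives y * n-cols + x. A minimal sketch in plain Clojure, where strided-addr is a hypothetical name and not a dtype-next function:

```clojure
(defn strided-addr
  "Hypothetical sketch: buffer address of the element at index vector `idx`,
  given per-dimension `strides` (the dot product of the two)."
  [strides idx]
  (reduce + (map * strides idx)))

;; A dense 3x3 row-major tensor has strides [3 1], so [y x] maps to y * 3 + x.
(def dense-3x3-strides [3 1])

;; Row 1, column 2 of [[0 1 2] [3 4 5] [6 7 8]] lives at address 5.
(strided-addr dense-3x3-strides [1 2])
```

No array lookups are involved beyond reading the strides, which is why an out-of-bounds array access is surprising for this case.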

cnuernber commented 3 years ago

In 04c059224 I added a lot more logging for the failure case. When you have a moment, it would be great if you could install this version of dtype-next locally, run just that test case again, and paste everything you can find into this issue ;-).

I don't have access to an M1 Mac at the moment, so I can't run your exact test case. The issue lies in the fact that on my system shape/strides for that case each have length 1, while on your system one of the two is empty, which is itself nonsensical. If the dimensions themselves are valid, then their reduction (tech.v3.tensor.dimensions.analytics/reduce-dimensionality) is producing a nonsensical answer. The answer should be:

user> (require '[tech.v3.tensor.dimensions.analytics :as dims-analytics])
nil
user> (def tens (dtt/->tensor (partition 3 (range 9))))
#'user/tens
user> (dims-analytics/reduce-dimensionality (.dimensions tens))
{:shape [9],
 :strides [1],
 :offsets [0],
 :shape-ecounts [9],
 :shape-ecount-strides [1]}

Here is what I think is a very minimal repro:


user> (require '[tech.v3.tensor.dimensions :as dims])
10:33:58.744 [nREPL-session-84612425-8362-4de0-93ee-1c9fb6af4c71] DEBUG tech.v3.datatype.functional - JDK16 vector ops are not available: Syntax error compiling at (tech/v3/datatype/functional/vecopt.clj:1:1).
10:33:59.061 [nREPL-session-84612425-8362-4de0-93ee-1c9fb6af4c71] DEBUG t.v.t.dimensions.global-to-local - insn custom indexing enabled!
nil
user> (require '[tech.v3.tensor.dimensions.analytics :as dims-analytics])
nil
user> (-> (dims/dimensions [3 3])
          (dims-analytics/reduce-dimensionality))
{:shape [9],
 :strides [1],
 :offsets [0],
 :shape-ecounts [9],
 :shape-ecount-strides [1]}
user> (require '[tech.v3.tensor.dimensions.global-to-local :as gtol])
nil
user> (-> (dims/dimensions [3 3])
          (dims-analytics/reduce-dimensionality)
          (gtol/elem-idx->addr-fn))
[0 1 2 3 4 5 6 7 8]
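
To see how an empty shape or strides array would surface as the ArrayIndexOutOfBoundsException from the original report, here is a hypothetical index-to-address function over reduced dimensions. elem-idx->addr is a made-up name and is not the library's gtol/elem-idx->addr-fn. For each dimension it takes (quot idx ecount-stride), wraps it by that dimension's shape, multiplies by the stride, and sums. With :shape [9], :strides [1], :shape-ecount-strides [1] this is the identity, consistent with the [0 1 2 ... 8] result above. If the loop is sized by shape while strides is empty, the first read of strides fails.

```clojure
(defn elem-idx->addr
  "Hypothetical sketch: map a flat element index to a buffer address using
  reduced dimensions stored in primitive long arrays."
  [^longs shape ^longs strides ^longs ecount-strides idx]
  (let [idx (long idx)]
    (loop [dim 0 addr 0]
      (if (< dim (alength shape))
        ;; Digit of idx along this dimension, scaled by that dimension's stride.
        (recur (inc dim)
               (+ addr (* (rem (quot idx (aget ecount-strides dim))
                               (aget shape dim))
                          (aget strides dim))))
        addr))))

;; Reduced dims for a dense [3 3] tensor: identity mapping.
(mapv #(elem-idx->addr (long-array [9]) (long-array [1]) (long-array [1]) %)
      (range 9))

;; Shape has one entry but strides is empty: reading strides[0] throws
;; ArrayIndexOutOfBoundsException, with the same message as in the report
;; on recent JDKs.
(try
  (elem-idx->addr (long-array [9]) (long-array 0) (long-array [1]) 0)
  (catch ArrayIndexOutOfBoundsException e
    (.getMessage e)))
```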

zhaoyul commented 3 years ago

Failed to reproduce on 04c0592. Any ideas?

➜  dtype-next git:(04c0592) lein uberjar
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
Compiling 58 source files to /Users/kevinli/sandbox/tmp/dtype-next/target/classes
/Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/UnsafeUtil.java:6: warning: Unsafe is internal proprietary API and may be removed in a future release
import sun.misc.Unsafe;
               ^
/Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/UnsafeUtil.java:13: warning: Unsafe is internal proprietary API and may be removed in a future release
  public static Unsafe getUnsafe() {
                ^
/Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/UnsafeUtil.java:15: warning: Unsafe is internal proprietary API and may be removed in a future release
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
                ^
/Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/UnsafeUtil.java:17: warning: Unsafe is internal proprietary API and may be removed in a future release
      return Unsafe.class.cast(f.get(null));
             ^
/Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/UnsafeUtil.java:27: warning: Unsafe is internal proprietary API and may be removed in a future release
  public static Unsafe unsafe = getUnsafe();
                ^
Note: /Users/kevinli/sandbox/tmp/dtype-next/java/tech/v3/datatype/Convolve1D.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
5 warnings
Compiling tech.v3.datatype.main
Compiling tech.v3.datatype.expose-fn
Created /Users/kevinli/sandbox/tmp/dtype-next/target/dtype-next-8.023-SNAPSHOT.jar
Created /Users/kevinli/sandbox/tmp/dtype-next/target/dtype-next.jar

Use the Jar by adding the following line in project.clj

  :resource-paths  ["dtype-next.jar"]

and the test passed...

user> (require '[tech.v3.tensor :as dtt])
nil
user> (System/getProperty "java.vendor")
"Azul Systems, Inc."
user> (System/getProperty "java.version")
"15.0.1"
user> (dtt/->tensor (partition 3 (range 9)))
#tech.v3.tensor<object>[3 3]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
user> (require '[tech.v3.tensor.dimensions.analytics :as dims-analytics])
nil
user> (def tens (dtt/->tensor (partition 3 (range 9))))
#'user/tens
user> (dims-analytics/reduce-dimensionality (.dimensions tens))
{:shape [9],
 :strides [1],
 :offsets [0],
 :shape-ecounts [9],
 :shape-ecount-strides [1]}
user> (require '[tech.v3.tensor.dimensions :as dims])
nil
user> (require '[tech.v3.tensor.dimensions.analytics :as dims-analytics])
nil
user> (-> (dims/dimensions [3 3])
          (dims-analytics/reduce-dimensionality))
{:shape [9],
 :strides [1],
 :offsets [0],
 :shape-ecounts [9],
 :shape-ecount-strides [1]}
user> (require '[tech.v3.tensor.dimensions.global-to-local :as gtol])
nil
user> (-> (dims/dimensions [3 3])
          (dims-analytics/reduce-dimensionality)
          (gtol/elem-idx->addr-fn))
[0 1 2 3 4 5 6 7 8]
user>

cnuernber commented 3 years ago

I do have an idea - could you tell me more about your uberjar pathway?

I would have assumed you would use lein install rather than lein uberjar, but regardless, this is what may be happening.

My assumption is that somewhere in your pathway you are using AOT compilation. This leaves .class files on the classpath, so when Clojure loads some of the protocols and types created by dtype-next, there are multiple classes that correspond to one logical type. That makes the protocol lookup fail, and thus some of the protocol functions return incorrect information when various types are passed in.
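
The "multiple classes for one logical type" failure can be reproduced in plain Clojure, without AOT or dtype-next. Re-evaluating a defrecord creates a new class with the same name, and a protocol that was extended to the old class does not cover the new one. This is only an analog of the mechanism described above, not the dtype-next failure itself; Describe and Point are hypothetical names, and the explicit eval stands in for a reload.

```clojure
(defprotocol Describe
  (describe [x]))

(defrecord Point [x y])

;; Extend the protocol to the Point class that exists right now.
(extend-protocol Describe
  Point
  (describe [_] :point))

(def old-point (->Point 1 2))

;; Re-evaluate the same defrecord: Clojure defines a brand-new class that
;; shares the old one's name.
(eval '(defrecord Point [x y]))

(def new-point (->Point 1 2))

;; Same class name, different Class objects.
[(= (.getName (class old-point)) (.getName (class new-point)))
 (= (class old-point) (class new-point))]

;; Protocol dispatch is keyed on the Class object, so only the old one is covered.
;; Calling (describe new-point) throws IllegalArgumentException.
[(satisfies? Describe old-point) (satisfies? Describe new-point)]
```

The error message for the new instance names a class that looks identical to the extended one, which is what makes this kind of failure confusing in practice.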

I hit this myself when I AOT-compile a project to an uberjar and then try to run from the REPL. Sometimes the classloader loads the .clj files and sometimes it loads the classes from the uberjar build. lein clean nearly always resolves it, until the next time I create an uberjar.
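
One way to check whether stale compiled classes are on the classpath is to ask the classloader for every copy of a namespace's source file and of its compiled __init class (Clojure names the AOT class for namespace a.b.c as a/b/c__init.class). This is a hedged diagnostic sketch: classpath-copies is a hypothetical helper, and the resource paths in the usage comments are derived from the tech.v3.tensor namespace in the stack trace. An __init.class hit from a local build directory such as target/classes, or more than one hit for the same path, suggests leftover compiled output of the kind lein clean typically removes.

```clojure
(defn classpath-copies
  "Hypothetical helper: every classpath URL (as a string) that provides
  `resource-path`, as seen by the current thread's context classloader."
  [^String resource-path]
  (let [^ClassLoader loader (.getContextClassLoader (Thread/currentThread))]
    (->> (.getResources loader resource-path)
         enumeration-seq
         (mapv str))))

;; Usage sketch for the namespace in the stack trace:
;; (classpath-copies "tech/v3/tensor.clj")         ; source file(s)
;; (classpath-copies "tech/v3/tensor__init.class") ; AOT-compiled copies, if any
```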

zhaoyul commented 3 years ago

You nailed it, Chris!

Huge thanks for creating this amazing library for the Clojure community.

Watching your talk and reading the code helped me a lot.

I hope dtype-next can be documented more thoroughly. Any chance I can help?

cnuernber commented 3 years ago

Kevin, lots of chances for you to help :-)!

The people most actively working with and documenting the Clojure data science and numerics systems are scicloj. @daslu, what is the current best pathway for new people to get involved with scicloj?

As for dtype-next, what part of the architecture interests you the most? What would you like to walk through or work with first?

daslu commented 3 years ago

@zhaoyul @cnuernber Sounds wonderful!

@zhaoyul I'd love to see how I can help with an easy pathway to getting involved. Would it be good to set up a short call (or a text chat, if you prefer) to discuss it? You can reach me as daslu on Reddit, Clojureverse, and Slack, and as Daniel Slutsky on Zulip. The latter is probably the best for these topics. https://www.clojurians-zulip.org/

(I think that is our current "best pathway".)

cnuernber commented 3 years ago

This issue seems resolved in a great way :-)