Jutho / TensorOperations.jl

Julia package for tensor contractions and related operations
https://jutho.github.io/TensorOperations.jl/stable/
Other
453 stars 56 forks source link

cache also uses taskid #88

Closed maartenvd closed 4 years ago

maartenvd commented 4 years ago

The caching mechanism appeared to fail with multiple tasks running on the same thread.

If you used tensoroperations in conjunction with strided and the new multithreading capabilities, a race condition was possible. Strided had a @threads for, which causes the scheduler sometimes to yield execution to another coroutine on the same thread (and therefore initially not actually executing the matrix multiplication). This coroutine would then acquire the same caches. Then, when the matrix multiplication actually got done in place, it would overwrite the result for both coroutines.

A minimal example demonstrating this behaviour independent of tensoroperations/strided is attached. minimal.zip

Jutho commented 4 years ago

Hi @maartenvd , I am currently implementing this change locally, together with a change to also store the structure of the temporary (i.e. its size and eltype for Array) as part of the lookup key, to avoid the problems we discussed.

Do you know if objectid(t::Task) is the official way to uniquely identify a task. Also, a task can probably only run on a single thread, so we probably only need the task identifier and not the threadid as part of the key?

Jutho commented 4 years ago

The thing that is being printed when it shows a task in the REPL is convert(UInt, pointer_from_objref(t)) so I guess that's definitely a good quantity to use.

maartenvd commented 4 years ago

I don't know the official way to uniquely identify tasks, on discourse indexing on current_task itself was recommended (though I have a hard time finding the relevant thread). I have attached a convoluted but working example with only arrays. tom.zip

maartenvd commented 4 years ago

there is also Base.task_local_storage

Jutho commented 4 years ago

Yes, I don’t fully understand what this does or is for

On 29 May 2020, at 17:13, maartenvd notifications@github.com<mailto:notifications@github.com> wrote:

there is also Base.task_local_storage

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/Jutho/TensorOperations.jl/pull/88#issuecomment-636028096, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AA7YENTWHFK6L3F3XUYDOOTRT7GH5ANCNFSM4NM3TAMA.

maartenvd commented 4 years ago

My minimal example is wrong again, I have a hard time reproducing it

Jutho commented 4 years ago

Did you have a change to test if all problems you were experiencing are gone with TensorOperations v3? Then I think this can be closed.

maartenvd commented 4 years ago

Yes, it's solved :)

Op vr 12 jun. 2020 om 16:55 schreef Jutho notifications@github.com:

Did you have a change to test if all problems you were experiencing are gone with TensorOperations v3? Then I think this can be closed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Jutho/TensorOperations.jl/pull/88#issuecomment-643316184, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJKVCUGR4ZYTVSRTWYFMZ3RWI6VLANCNFSM4NM3TAMA .