Closed (lineagech closed this issue 1 year ago)
I think the case you mentioned should be OK as long as the thread with the smaller pos eventually makes its cq->head_mark value UNLOCKED. If that thread never gets there, then that's a different problem.
Hmm... From the trace I got, I suspect the threads with smaller pos never get there. Has there been a solution to this? One workaround I can think of is to make the thread that acquires cq->head_lock process the previous entries (those with smaller pos), but I am not sure whether this would slow down the whole system.
I am not sure that's a good idea. If the thread with the smaller pos never gets there, then that is the real problem. Can you share which CUDA toolkit version you are using, the program and parameters that cause the issue, and any trace that points to the exact problem?
I believe this is no longer an issue, based on the last few conversations we had. Closing this.
I ran into an issue: threads enqueued cmds, and then the threads with larger pos (the second parameter of cq_dequeue) entered cq_dequeue and called move_head_cq first. They could not make cq->head_mark UNLOCKED because the threads with smaller pos had not yet made cq->head_mark LOCKED (they were not being scheduled). The return value of move_head_cq (head_move_count) was always 0. Is this a known issue? Thank you!
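To make the reported behaviour concrete, here is a minimal single-threaded C++ sketch of the sweep as I understand it from the description above. It is not the library's code: `ToyCq`, `toy_move_head`, and `demo_stall` are hypothetical names, the real queue uses device-side atomics, and I am assuming that the head only advances over consecutive LOCKED marks starting at the current head.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the completion-queue head marks.
enum Mark : uint8_t { UNLOCKED = 0, LOCKED = 1 };

struct ToyCq {
    std::vector<Mark> head_mark;  // one mark per queue slot
    uint32_t head = 0;            // current head position
    explicit ToyCq(uint32_t size) : head_mark(size, UNLOCKED) {}
};

// Assumed behaviour: starting at head, reset each consecutive LOCKED mark
// to UNLOCKED and stop at the first slot that is not LOCKED.
// Returns how far the head moved (the analogue of head_move_count).
uint32_t toy_move_head(ToyCq& cq) {
    const uint32_t size = static_cast<uint32_t>(cq.head_mark.size());
    uint32_t count = 0;
    while (count < size && cq.head_mark[(cq.head + count) % size] == LOCKED) {
        cq.head_mark[(cq.head + count) % size] = UNLOCKED;
        ++count;
    }
    cq.head = (cq.head + count) % size;
    return count;
}

// Replays the scenario from the report: pos 1 arrives before pos 0.
void demo_stall() {
    ToyCq cq(8);
    cq.head_mark[1] = LOCKED;          // thread with the larger pos marks first
    assert(toy_move_head(cq) == 0);    // slot 0 is not LOCKED, so nothing moves
    assert(cq.head_mark[1] == LOCKED); // pos 1 keeps waiting
    cq.head_mark[0] = LOCKED;          // thread with the smaller pos finally runs
    assert(toy_move_head(cq) == 2);    // both slots are released in one sweep
}
```

In this model the thread with the larger pos cannot make progress on its own: the sweep returns 0 until the thread with the smaller pos sets its mark, which matches the observation that head_move_count stayed at 0 while those threads were not scheduled.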