NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

Use GRE for OpenGL #16

Closed sturfee-petrl closed 6 years ago

sturfee-petrl commented 7 years ago

TLDR

  1. Is it possible to use GRE for OpenGL+CUDA scaling?

  2. Should we use Vulkan instead of OpenGL if we want to use multiple GPUs in parallel and handle the result in the GPU memory of the same device with CUDA or a CNN framework such as Caffe?

Summary

We are looking for a vertical scaling solution for our OpenGL+CUDA service. Recently I used GRE for our Caffe deployment. It works great! Thanks!

Now my team wants me to bring GRE to our OpenGL+CUDA server, but I know that it will not work.

If you initialize an OpenGL context in one thread and then call it from a different thread, it will not work: an OpenGL context can only be current on one thread at a time.

I used OpenGL+CUDA with Go for a long time before I knew about GRE. I had a big problem with asynchronous calls, which I solved with this pattern.

type Engine struct {
    sync.Mutex
    inited  bool
    render  chan Task
    result  chan *image.Image
    destroy chan int
}

func (e *Engine) Init() {
    go func() {
        // lock the OpenGL context to a fixed OS thread
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()
        // ... initialize OpenGL here, on this thread
        var p Task
        for {
            select {
            case p = <-e.render:
                // ... render p, producing result *image.Image
                e.result <- result
            case <-e.destroy:
                return
            }
        }
    }()
}

func Render(e *Engine, task Task) (img *image.Image) {
    e.Lock()
    e.render <- task
    img = <-e.result
    e.Unlock()
    return
}

It worked fine from Go, but later we moved our OpenGL module to C++.


Engine::Engine(Provider provider) {
    // start the infinite background loop for OpenGL on a dedicated thread
    std::thread loop(&Engine::Launch, this, provider);
    loop.detach();
}

void Engine::Launch(Provider provider) {
    // OpenGL is initialized on this thread and only ever used from this thread
    OpenGLRender OpenGL_render;
    Item *i;
    // infinite loop
    while (true) {
        {
            // the condition_variable needs a unique_lock; it sleeps while the queue is empty
            std::unique_lock<std::mutex> lck(this->queue_mutex);
            this->cv.wait(lck, [this] { return !this->chanel.empty(); });
            // get the next request and pop it from the queue
            i = chanel.front();
            chanel.pop();
        }
        // the loop mustn't crash!
        try {
            // estimate the score (calc and task come from the rest of the Engine, not shown here)
            calc(i, &task);
        } catch (...) {
            // swallow all exceptions for now
        }
    }
}

void Engine::push(Item *i) {
    // lock the queue for putting the request
    std::lock_guard<std::mutex> lck(this->queue_mutex);
    // put the request in the queue
    chanel.push(i);
    // notify the condition_variable that the queue has a request
    this->cv.notify_one();
}
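
A minimal usage sketch of the class above (the provider value and the Item constructor arguments here are hypothetical): the constructor spawns the dedicated OpenGL thread, and push can then be called safely from any request-handler thread.

// Hypothetical usage; provider and the Item arguments are assumptions.
Engine engine(provider);               // spawns the dedicated OpenGL thread
engine.push(new Item(/* request */));  // safe to call from any handler thread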

I want to keep my implementation, which allows OpenGL to scale, but I can't prove to my team that it isn't possible to use the GRE approach for OpenGL+CUDA.

Thanks a lot! Don't hesitate to ask me for details.

flx42 commented 6 years ago

Sorry for the very long latency for the reply :) This code is given as a demo for inference, since there has been a gap on this aspect for a long time. You probably don't care about the answer now, but a separate C++ thread pool is probably the solution if anyone else ends up reading this issue :)

sturfee-petrl commented 6 years ago

@flx42 It's still very relevant for me :simple_smile: Thank you for your opinion!

flx42 commented 6 years ago

ah well :) In this case, I would either not use Go, to avoid having tons of threads being created (pure C++ or maybe Rust?).

Or a separate C++ thread pool that holds the OpenGL contexts.
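
For anyone else landing here, a minimal sketch of that thread-pool idea (this is not part of GRE; Item and OpenGLRender below are placeholders standing in for the types in the snippets above, and real context creation via EGL/GLX is omitted): each worker thread creates its own OpenGL context once and uses it only on that thread, so several contexts, e.g. one per GPU, can drain a shared request queue in parallel.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Placeholders standing in for the types used in the snippets above; real
// OpenGL context creation (EGL/GLX, device selection) is platform-specific
// and omitted here.
struct Item {};
struct OpenGLRender {
    explicit OpenGLRender(size_t /*gpu*/) { /* create the context on this thread */ }
    void render(Item *) { /* draw, read back, or hand off to CUDA */ }
};

// Each worker owns one OpenGL context for its whole lifetime.
class RenderPool {
public:
    explicit RenderPool(size_t workers) {
        for (size_t n = 0; n < workers; ++n) {
            threads_.emplace_back([this, n] {
                // the context is created and used on this thread only
                OpenGLRender context(n);
                for (;;) {
                    Item *item;
                    {
                        std::unique_lock<std::mutex> lock(mutex_);
                        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                        if (stop_ && queue_.empty())
                            return;
                        item = queue_.front();
                        queue_.pop();
                    }
                    try {
                        context.render(item);
                    } catch (...) {
                        // keep the worker alive on failure
                    }
                }
            });
        }
    }

    // safe to call from any request-handler thread
    void push(Item *item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(item);
        }
        cv_.notify_one();
    }

    ~RenderPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto &t : threads_) t.join();
    }

private:
    std::vector<std::thread> threads_;
    std::queue<Item *> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};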