BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Socket Data Layer #238

Closed sergeyk closed 7 years ago

sergeyk commented 10 years ago

Data preparation is orders of magnitude easier with high-level languages such as Python or R than with C++. HDF5DataLayer is a welcome interface between data preparation scripts and Caffe, but we can go even further. I propose a DataSource (see #148) that reads input on a socket. A separate process -- or processes -- which can be Python or R scripts, will be responsible for preparing data to send to the socket.

I haven't coded sockets in C++ before. The main questions are:

  • Is there a good library for our type of IPC (float/double blobs)?
  • If there isn't, then what's the best off-the-shelf solution? Protobuf? Cap'n Proto? YAML Binary?

Yangqing commented 10 years ago

Protobuf works nicely with Python, so it might be a good choice. Not sure about Matlab though.

If you want to use socket communications, boost::asio seems to be a good choice.

bhack commented 10 years ago

What do you think of ZeroMQ + Google Protobuf (https://github.com/joshrotenberg/zmqpbexample)? Or WebSocket (http://www.zaphoyd.com/websocketpp) if we want to interface with the web?

sguada commented 10 years ago

I would advocate for YAML binary: it is way easier to maintain and works with Python, Matlab, R, the web, ... Protobuf is slower in Python and painfully slow in Matlab.

shelhamer commented 10 years ago

Agreed, YAML seems like the first contender for a prototype socket layer.

bhack commented 10 years ago

Some serialization examples in Python with ZeroMQ.

bhack commented 10 years ago

Is YAML binary encoded as base64? In that case I don't think it will be very efficient if we also want to enable network support.
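For a rough sense of the cost raised here: YAML's `!!binary` type does carry its payload as base64 text, and base64 emits 4 output bytes for every 3 input bytes. The sketch below only measures that size inflation for a float32 blob using the Python standard library; the 3x224x224 image shape and the helper name `raw_and_base64_sizes` are assumptions for illustration, and it says nothing about parsing cost.

```python
import base64
import struct


def raw_and_base64_sizes(num_floats):
    """Return (raw_bytes, base64_bytes) for a blob of float32 values."""
    # Pack the values as little-endian 32-bit floats, 4 bytes each.
    raw = struct.pack("<%df" % num_floats, *([0.5] * num_floats))
    # Base64 maps every 3 input bytes to 4 ASCII characters.
    encoded = base64.b64encode(raw)
    return len(raw), len(encoded)


# Hypothetical single image blob: 3 channels x 224 x 224 pixels.
raw_size, b64_size = raw_and_base64_sizes(3 * 224 * 224)
overhead = b64_size / raw_size
print(raw_size, b64_size, round(overhead, 3))
```

So a base64-based encoding would send about a third more bytes than a raw binary framing of the same blob, before counting any YAML structure around it.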

bhack commented 10 years ago

Somebody has started taking Caffe onto the network (not the neural one :smile:) with https://github.com/BVLC/caffe/pull/1140. I don't know if raw sockets are the best solution, but we could probably coordinate feeding data over the network with that work.

bhack commented 10 years ago

I also think that transparent multi-transport support could be interesting if the performance overhead is low.

shelhamer commented 9 years ago

I believe @cypof has an in-progress layer that fits this purpose.

iconica commented 9 years ago

Yes. @cypof and I have a socket-layer and some additional data-adapters that are almost complete. We'll be doing further internal tests at Flickr this week, and are hoping to have things finished from our side and PR'ed soon. (If not this week, it'll likely get pushed until after GTC though.)

bhack commented 8 years ago

@cypof Do you have any news on this?

cypof commented 8 years ago

@bhack this was replaced by integrating with Spark instead.


shelhamer commented 7 years ago

Closing as this has been addressed by CaffeOnSpark among other projects, and most simply can be handled by a Python data layer + zeromq or the like.
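As a rough illustration of the "separate process feeds blobs over a socket" idea, here is a minimal sketch using only the Python standard library. It is not Caffe's Python layer API and it uses plain sockets rather than ZeroMQ; the wire format (four little-endian uint32 dimensions followed by float32 values) and the helper names `send_blob`, `recv_exact` and `recv_blob` are assumptions made up for this example. A `socketpair` stands in for the connection between the data preparation script and the consumer.

```python
import socket
import struct
import threading

# Assumed header: num, channels, height, width as little-endian uint32.
HEADER = struct.Struct("<4I")


def send_blob(sock, shape, values):
    """Send one blob: a shape header followed by float32 values."""
    payload = struct.pack("<%df" % len(values), *values)
    sock.sendall(HEADER.pack(*shape) + payload)


def recv_exact(sock, nbytes):
    """Read exactly nbytes, since recv may return fewer bytes than asked."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_blob(sock):
    """Receive one blob and return (shape, list of float values)."""
    shape = HEADER.unpack(recv_exact(sock, HEADER.size))
    count = shape[0] * shape[1] * shape[2] * shape[3]
    values = struct.unpack("<%df" % count, recv_exact(sock, 4 * count))
    return shape, list(values)


# The producer thread plays the role of the data preparation script.
producer, consumer = socket.socketpair()
shape = (1, 3, 2, 2)
values = [float(i) for i in range(12)]
sender = threading.Thread(target=send_blob, args=(producer, shape, values))
sender.start()
received_shape, received_values = recv_blob(consumer)
sender.join()
producer.close()
consumer.close()
print(received_shape, received_values)
```

In a real setup the receiving side would live inside the data layer and copy the values into its output blob, and a messaging library such as ZeroMQ would take over the framing and reconnection handling done by hand here.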