IDSIA / brainstorm

Fast, flexible and fun neural networks.

Loading Caffe models #52

Open flukeskywalker opened 8 years ago

flukeskywalker commented 8 years ago

Since some papers have made available pre-trained Caffe convnets, it'd be nice to be able to use them in Brainstorm.

flukeskywalker commented 8 years ago

This requires converting models from Caffe's NCHW layout to Brainstorm's NHWC layout, so it's not entirely straightforward, but it should still be possible.
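For the data blobs themselves the change is just an axis permutation; a minimal numpy sketch (the shapes here are only illustrative):

```python
import numpy as np

# a Caffe-style blob: (batch, channels, height, width) = NCHW
x_nchw = np.random.randn(32, 3, 224, 224)

# reorder the axes to (batch, height, width, channels) = NHWC
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
assert x_nhwc.shape == (32, 224, 224, 3)
```

The weight arrays need a similar re-layout, which is where it gets less trivial.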

pranv commented 8 years ago

I have some experience with this: I started with the same goal for Keras, which took many turns and resulted in things like the Graph model, but the conversion itself hasn't been merged yet due to an issue. I'll try this over the weekend along with the Keras part. As you've said, it's slightly tricky; I've now understood that you need to rotate the kernels 90 degrees TWICE.
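If that 180-degree flip is indeed what's needed, it just means reversing both spatial axes of each kernel; a rough sketch on Caffe-layout weights:

```python
import numpy as np

# Caffe convolution weights: (out_channels, in_channels, kH, kW)
w = np.random.randn(64, 3, 3, 3)

# rotating each kernel by 90 degrees twice == reversing both spatial axes
w_rot180 = w[:, :, ::-1, ::-1]

# same thing via np.rot90 applied to the spatial axes
assert np.allclose(w_rot180, np.rot90(w, 2, axes=(2, 3)))
```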

Meanwhile, if I could hijack this issue, is there any design document that explains some of the design choices you made? Just to get a better understanding of your goals.

flukeskywalker commented 8 years ago

Cool, looking forward to it! The NHWC layout makes things like this a bit trickier, but we think it's the better format in the long run. Plus, cuDNN v4 will fully support it soon :)

We will indeed provide details about the design choices in Brainstorm soon (beginning next week). If you get curious in the meantime, you can ask us questions on the mailing list.

pranv commented 8 years ago

I've completed the code for the Keras/Theano conversion [PR: #921 on the Keras repo]. Most of it can be reused here, though I'd like to know what you think would be best. The code I wrote there is really generic: it takes any Caffe network, converts it to an equivalent DAG, and then loads the weights. There are a lot of tiny things that have to be taken care of for this to work, which makes the process rather complicated and cumbersome to follow. Unlike my approach, the Chainer devs decided to only support the available BVLC models (it could work for OxfordNet too), which are simple sequential models. This cuts the size of the code in half and makes it a lot easier and quicker for someone to understand what is going on.

What would you guys prefer to have?
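To give an idea of the generic route, the weight-extraction half is roughly this (just a sketch using Caffe's Python interface; rebuilding the connectivity from the prototxt and re-laying-out each array is the part with all the tiny details):

```python
import caffe

def extract_caffe_params(prototxt_path, caffemodel_path):
    """Load a trained Caffe net and collect its learned parameters per layer."""
    net = caffe.Net(prototxt_path, caffemodel_path, caffe.TEST)

    params = {}
    for layer_name, blobs in net.params.items():
        # blobs[0] holds the weights; blobs[1], if present, the biases
        params[layer_name] = [blob.data.copy() for blob in blobs]
    return params
```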

flukeskywalker commented 8 years ago

Does Keras also use NHWC?

We'd prefer the more general approach (full DAG). It's fine to start by handling simpler cases, as long as extensibility is kept in mind.

Brainstorm also works with DAGs. The difference when connecting layers (compared to Caffe) is that every layer in Brainstorm uses inputs and outputs with fixed names (except for the Input layer).

Side note: We are working on explaining the design in the docs branch. See the section Internals.

pranv commented 8 years ago

Theano uses the NCHW (bc01) layout. np.swapaxes() should help with the conversion, I think :)
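Something like this, I guess (two swaps, or one transpose, since a single swap only gets to NWHC):

```python
import numpy as np

x_nchw = np.random.randn(8, 3, 32, 48)   # (N, C, H, W)

# NCHW -> NWHC -> NHWC via two swaps ...
x_nhwc = x_nchw.swapaxes(1, 3).swapaxes(1, 2)
assert x_nhwc.shape == (8, 32, 48, 3)

# ... which is the same as one transpose
assert np.allclose(x_nhwc, np.transpose(x_nchw, (0, 2, 3, 1)))
```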

Thanks for the docs, things are making more sense now.