Dobiasd / frugally-deep

A lightweight header-only library for using Keras (TensorFlow) models in C++.
MIT License

Suggestion for dense_layer.hpp #376

Closed mvparakhin closed 1 year ago

mvparakhin commented 1 year ago

Tobias, I know you prefer to keep it purely functional, but there are cases where the current implementation can be arbitrarily slow. In particular, if you have a dense layer following the convolution one WITHOUT flattening, each position in convolution will cause several memory allocations and multiple data copies. Would you be open to replacing apply_impl there with the following optimized version (should be just as flexible):

    tensor5s apply_impl(const tensor5s& inputs) const override
    {
        assertion(inputs.size() == 1, "invalid number of input tensors");
        auto input = inputs.front();
        // According to the Keras documentation
        // https://keras.io/layers/core/#dense
        // "if the input to the layer has a rank greater than 2,
        // then it is flattened prior to the initial dot product with kernel."
        // But this seems to not be the case.
        // Instead it does this: https://stackoverflow.com/a/43237727/1866775
        // Otherwise the following would need to be done:
        // if (input.shape().get_not_one_dimension_count() > 1)
        // {
        //     input = flatten_tensor5(input);
        // }

    const auto& feature_arr = *input.as_vector();
    const size_t size = feature_arr.size();
    const size_t depth = input.shape().depth_;
    assertion(depth == n_in_ && (size%depth)==0, "Invalid input value count.");
    std::vector<float> result_values((input.shape().volume()/depth)*n_out_);
    const size_t n_of_parts = size/depth;

    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> params(params_.data(), static_cast<EigenIndex>(params_.rows()-1), static_cast<EigenIndex>(params_.cols()));
    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> bias(params_.data()+(params_.rows()-1)*params_.cols(), static_cast<EigenIndex>(1), static_cast<EigenIndex>(params_.cols()));
    for (size_t part_id=0; part_id<n_of_parts; ++part_id) {
       Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> m(&feature_arr[part_id*depth], static_cast<EigenIndex>(1), static_cast<EigenIndex>(depth));
       Eigen::Map<RowMajorMatrixXf, Eigen::Unaligned> res_m(&result_values[part_id*n_out_], static_cast<EigenIndex>(1), static_cast<EigenIndex>(n_out_));
       res_m.noalias() = m*params + bias;
    }
    return {tensor5(shape5(
       input.shape().size_dim_5_,
       input.shape().size_dim_4_,
       input.shape().height_,
       input.shape().width_,
       n_out_),
       std::move(result_values))};

}
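
For readers without the frugally-deep/Eigen context: the Eigen-based code above maps slices of the existing input and output buffers, so each part is computed without any intermediate tensors or per-part allocations. Here is a minimal dependency-free sketch of the same single-allocation idea; the function name `dense_over_parts` and the row-major weight layout are assumptions for illustration, not library API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the optimization: allocate the result buffer once up front,
// then compute a vector-matrix product (weights are row-major, depth x n_out)
// plus bias for each depth-sized slice of the input, writing directly into
// the result buffer -- no per-part allocations or copies.
std::vector<float> dense_over_parts(const std::vector<float>& features,
                                    std::size_t depth,
                                    const std::vector<float>& weights,
                                    const std::vector<float>& bias)
{
    const std::size_t n_out = bias.size();
    assert(depth != 0 && features.size() % depth == 0);
    assert(weights.size() == depth * n_out);
    const std::size_t n_parts = features.size() / depth;
    std::vector<float> result(n_parts * n_out); // the only allocation
    for (std::size_t p = 0; p < n_parts; ++p) {
        const float* in = &features[p * depth]; // view into input, no copy
        float* out = &result[p * n_out];        // view into output, no copy
        for (std::size_t j = 0; j < n_out; ++j) {
            float acc = bias[j];
            for (std::size_t i = 0; i < depth; ++i) {
                acc += in[i] * weights[i * n_out + j];
            }
            out[j] = acc;
        }
    }
    return result;
}
```

For example, with `features = {1, 2, 3, 4}`, `depth = 2`, a 2x1 weight matrix `{1, 2}`, and bias `{0.5}`, the two parts yield `{5.5, 11.5}`. In the real implementation the inner loops are replaced by Eigen's mapped matrix product, which additionally vectorizes the arithmetic.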

Dobiasd commented 1 year ago

Hi, and thanks for the suggestion! :+1:

If your solution is faster on average and compatible regarding its output, I'm fine with it not looking so purely functional. :wink:

The code you posted shows that you're still using a quite old version of frugally-deep (tensor5/shape5). I've tried to modernize it, but when using this, the automated tests fail (wrong output shape compared to Keras). Would you like to open a pull request with your suggestion? That way, either you'll see the failing tests and can adjust the code accordingly, or I'll see what I did wrong when trying to integrate it.

mvparakhin commented 1 year ago

Unfortunately, we have to use the last version that worked well with exported TF 1.x JSON files :-(. I tried to quickly rebase separately, but still using our Windows build environment - no luck, too many changes. I thought I'd just do another clone and build on the side using CMake, but CMake throws its usual bunch of meaningless errors (inside FunctionalPlus and Eigen; it can't find FunctionalPlusTargets.cmake...). I can't even create a separate branch - you seem to have it locked.

FWIW, I cannot see how the code below can possibly be incorrect. The only issue could be if you were using a padded matrix representation, but you don't - and even then it would be a runtime error; all the shapes should be correct... This version works great in our environment, about an order of magnitude faster for our use cases.

    tensors apply_impl(const tensors& inputs) const override
    {
        const auto& input = single_tensor_from_tensors(inputs);
        // According to the Keras documentation
        // https://keras.io/layers/core/#dense
        // "if the input to the layer has a rank greater than 2,
        // then it is flattened prior to the initial dot product with kernel."
        // But this seems to not be the case.
        // Instead it does this: https://stackoverflow.com/a/43237727/1866775
        // Otherwise the following would need to be done:
        // if (input.shape().get_not_one_dimension_count() > 1)
        // {
        //     input = flatten_tensor(input);
        // }
        const size_t depth = input.shape().depth_;
        assertion(depth == n_in_, "Invalid input value count.");
        const auto& feature_arr = *input.as_vector();
        std::vector<float> result_values((input.shape().volume()/depth)*n_out_);
        const size_t size = feature_arr.size();
        const size_t n_of_parts = size/depth;

    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> params(params_.data(), static_cast<EigenIndex>(params_.rows()-1), static_cast<EigenIndex>(params_.cols()));
    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> bias(params_.data()+(params_.rows()-1)*params_.cols(), static_cast<EigenIndex>(1), static_cast<EigenIndex>(params_.cols()));
    for (size_t part_id=0; part_id<n_of_parts; ++part_id) {
       Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> m(&feature_arr[part_id*depth], static_cast<EigenIndex>(1), static_cast<EigenIndex>(depth));
       Eigen::Map<RowMajorMatrixXf, Eigen::Unaligned> res_m(&result_values[part_id*n_out_], static_cast<EigenIndex>(1), static_cast<EigenIndex>(n_out_));
       res_m.noalias() = m*params + bias;
    }
    return {tensor(change_tensor_shape_dimension_by_index(
            input.shape(), 4, n_out_),
            std::move(result_values))};
}
Dobiasd commented 1 year ago

Oh, being stuck with an old TF version certainly is not pleasant. :grimacing:

Regarding the branch, yeah, you can't create one in this GitHub repository, but you can create a fork and do whatever you like there. :slightly_smiling_face:

I've just looked deeper into why the tests were failing with this implementation, and it was just because the output shape was always 5-dimensional. I've fixed that, and things work now. :+1:

Regarding performance, I can also confirm that your implementation is faster. Maybe the following minimal example is not the ideal use case

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inputs = [Input(shape=(17, 19, 23, 29, 31))]
outputs = [Dense(71, use_bias=True)(inputs[0])]
model = Model(inputs=inputs, outputs=outputs, name='test_model_exhaustive')

but with your version, the duration of a forward pass (on my machine) improved from 0.25 s to 0.13 s. :rocket:

Dobiasd commented 1 year ago

Thanks a lot for this great suggestion. :heart:

Would you like to create a pull request from a fork of your own (to later have your contribution correctly attributed to you in the commit history of the main branch), or shall I just merge this pull request?

mvparakhin commented 1 year ago

Can you please just merge yours?
