Hi, and thanks for the suggestion! :+1:
If your solution is faster on average and compatible regarding its output, I'm fine with it not looking so purely functional. :wink:
The code you posted shows that you're still using quite an old version of frugally-deep (tensor5/shape5). I've tried to modernize it, but when using this, the automated tests fail (wrong output shape compared to Keras). Would you like to open a pull request with your suggestion? That way, either you'll see the failing tests and can adjust the code accordingly, or I'll see what I did wrong when trying to integrate it.
Unfortunately, we have to use the last version that worked well with exported TF 1.x JSON files :-(. I tried to quickly rebase separately while still using our Windows build environment - but no luck, too many changes. I thought I'd just do another clone and build on the side using CMake - but CMake throws its usual bunch of meaningless errors (inside FunctionalPlus and Eigen, can't find FunctionalPlusTargets.cmake...). I can't even create a separate branch - you seem to have it locked.
FWIW, I cannot see how the code below could possibly be incorrect. The only issue would be if you were using a padded matrix representation, but you don't - and even then it would be a runtime error; the shapes should all be correct... This version works great in our environment, about an order of magnitude faster for our use cases.
tensors apply_impl(const tensors& inputs) const override
{
    const auto& input = single_tensor_from_tensors(inputs);
    // According to the Keras documentation
    // https://keras.io/layers/core/#dense
    // "if the input to the layer has a rank greater than 2,
    // then it is flattened prior to the initial dot product with kernel."
    // But this seems to not be the case.
    // Instead it does this: https://stackoverflow.com/a/43237727/1866775
    // Otherwise the following would need to be done:
    // if (input.shape().get_not_one_dimension_count() > 1)
    // {
    //     input = flatten_tensor(input);
    // }
    const size_t depth = input.shape().depth_;
    assertion(depth == n_in_, "Invalid input value count.");
    const auto& feature_arr = input.as_vector();
    const size_t n_of_parts = feature_arr.size() / depth;
    std::vector<float_type> result_values(n_of_parts * n_out_);
    // The last row of params_ holds the bias; the rows above it hold the weights.
    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> params(
        params_.data(),
        static_cast<EigenIndex>(params_.rows() - 1),
        static_cast<EigenIndex>(params_.cols()));
    Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> bias(
        params_.data() + (params_.rows() - 1) * params_.cols(),
        static_cast<EigenIndex>(1),
        static_cast<EigenIndex>(params_.cols()));
    for (size_t part_id = 0; part_id < n_of_parts; ++part_id)
    {
        // Map the current depth-slice of the input and the corresponding
        // slice of the output buffer without copying.
        Eigen::Map<const RowMajorMatrixXf, Eigen::Unaligned> m(
            &feature_arr[part_id * depth],
            static_cast<EigenIndex>(1),
            static_cast<EigenIndex>(depth));
        Eigen::Map<RowMajorMatrixXf, Eigen::Unaligned> res_m(
            &result_values[part_id * n_out_],
            static_cast<EigenIndex>(1),
            static_cast<EigenIndex>(n_out_));
        res_m.noalias() = m * params + bias;
    }
    return {tensor(change_tensor_shape_dimension_by_index(
                       input.shape(), 4, n_out_),
                   std::move(result_values))};
}
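For readers skimming the thread: the core of the trick is that Eigen::Map wraps buffers that already exist, so the matrix product reads the input and writes the result in place, with no temporaries. A minimal self-contained sketch of that pattern (sizes and names are made up for illustration):

#include <vector>
#include <Eigen/Dense>

using RowMajorMatrixXf =
    Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

int main()
{
    const Eigen::Index depth = 4, n_out = 3, n_parts = 2;

    // Pretend these buffers come from elsewhere (e.g., a tensor's flat storage).
    std::vector<float> input(static_cast<std::size_t>(n_parts * depth), 1.0f);
    std::vector<float> output(static_cast<std::size_t>(n_parts * n_out), 0.0f);
    RowMajorMatrixXf weights = RowMajorMatrixXf::Random(depth, n_out);
    RowMajorMatrixXf bias = RowMajorMatrixXf::Random(1, n_out);

    for (Eigen::Index p = 0; p < n_parts; ++p)
    {
        // Map wraps the existing memory; no copy is made.
        Eigen::Map<const RowMajorMatrixXf> row(input.data() + p * depth, 1, depth);
        Eigen::Map<RowMajorMatrixXf> res(output.data() + p * n_out, 1, n_out);
        // noalias() tells Eigen the result does not overlap the operands,
        // so it can write straight into the mapped output buffer.
        res.noalias() = row * weights + bias;
    }
}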
Oh, being stuck with an old TF version certainly is not pleasant. :grimacing:
Regarding the branch, yeah, you can't create one in this GitHub repository, but you can create a fork and do whatever you like there. :slightly_smiling_face:
I've just looked deeper into why the tests were failing with this implementation, and it was just because the output shape was always 5-dimensional. I've fixed that, and things work now. :+1:
Regarding performance, I can also confirm that your implementation is faster. Maybe the following minimal example is not the ideal use case:
inputs = [Input(shape=(17, 19, 23, 29, 31))]
outputs = [Dense(71, use_bias=True)(inputs[0])]
model = Model(inputs=inputs, outputs=outputs, name='test_model_exhaustive')
but with your version, the duration of a forward pass (on my machine) improved from 0.25 s to 0.13 s. :rocket:
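For reference, here is a rough sketch of how such a forward-pass timing could be done with frugally-deep (the model file name is a placeholder; it would come from exporting the Keras model above with the convert_model.py script):

#include <chrono>
#include <iostream>
#include <fdeep/fdeep.hpp>

int main()
{
    // "test_model_exhaustive.json" is a placeholder file name; export your
    // own model with frugally-deep's keras_export/convert_model.py.
    const auto model = fdeep::load_model("test_model_exhaustive.json");

    // Input matching shape=(17, 19, 23, 29, 31) from the Keras example above.
    const fdeep::tensor input(fdeep::tensor_shape(17, 19, 23, 29, 31), 0.0f);

    const auto start = std::chrono::steady_clock::now();
    const auto result = model.predict({input});
    const auto stop = std::chrono::steady_clock::now();

    std::cout << "forward pass took "
              << std::chrono::duration<double>(stop - start).count()
              << " s" << std::endl;
}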
Thanks a lot for this great suggestion. :heart:
Would you like to create a pull request from a fork of your own (to later have your contribution correctly attributed to you in the commit history of the main branch), or shall I just merge this pull request?
Can you please just merge yours?
Tobias, I know you prefer to keep it purely functional, but there are cases where the current implementation can be arbitrarily slow. In particular, if you have a Dense layer following a convolution layer WITHOUT flattening, each position in the convolution output causes several memory allocations and multiple data copies. Would you be open to replacing apply_impl there with the following optimized version (it should be just as flexible):
tensor5s apply_impl(const tensor5s& inputs) const override
{
    assertion(inputs.size() == 1, "invalid number of input tensors");
    auto input = inputs.front();
    // According to the Keras documentation
    // https://keras.io/layers/core/#dense
    // "if the input to the layer has a rank greater than 2,
    // then it is flattened prior to the initial dot product with kernel."
    // But this seems to not be the case.
    // Instead it does this: https://stackoverflow.com/a/43237727/1866775
    // Otherwise the following would need to be done:
    // if (input.shape().get_not_one_dimension_count() > 1)
    // {
    //     input = flatten_tensor5(input);
    // }
    // ...
}
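To make the cost concrete, here is a minimal sketch (simplified, with made-up names) of the per-position pattern described above: each iteration pays for a copy of the input slice plus a freshly allocated result, which the Eigen::Map version earlier in the thread avoids.

#include <vector>
#include <Eigen/Dense>

using RowMajorMatrixXf =
    Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

// Naive pattern: applying a dense layer separately per spatial position.
std::vector<float> dense_per_position(const std::vector<float>& input,
                                      const RowMajorMatrixXf& weights, // depth x n_out
                                      Eigen::Index depth)
{
    const Eigen::Index n_out = weights.cols();
    const Eigen::Index n_positions =
        static_cast<Eigen::Index>(input.size()) / depth;
    std::vector<float> output;
    output.reserve(static_cast<std::size_t>(n_positions * n_out));
    for (Eigen::Index p = 0; p < n_positions; ++p)
    {
        // Copy of the input slice: one heap allocation per position.
        std::vector<float> slice(input.begin() + p * depth,
                                 input.begin() + (p + 1) * depth);
        Eigen::Map<const RowMajorMatrixXf> row(slice.data(), 1, depth);
        // Materializing the product: another allocation per position.
        const RowMajorMatrixXf res = row * weights;
        output.insert(output.end(), res.data(), res.data() + n_out);
    }
    return output;
}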