Closed Cydral closed 2 weeks ago
@Cydral This is all great. Really cool. Can I ask though, why do you use dlib instead of Pytorch for neural nets? Slowly adding transformer support to dlib is a lot of work due to the absence of autograd. Its recursive template API makes it very slow to compile. In torch, adding a SOTA layer can be just a few lines of code. For example, sinusoidal positional embeddings are about 15 lines. You can then train in pytorch, then export, or even compile to something you can run in C/C++. So I'm not sure what the benefit is of adding transformer support to dlib when there are arguably better solutions for DNNs in C++. Also, the author then has the burden of maintaining this code. I'm not trying to overly criticise, but this is requiring a lot of work for potentially little gain. I'm wondering if these additions belong in a "dlib_contrib" repository.
It's certainly an interesting question, and yes, I can confirm that adding such layers to Dlib is also a huge job. But the idea here isn't necessarily to build a neural network that's exactly the same as what you'd find in other libraries. For example, in parallel I'm studying the impact of using convolution layers to replace certain "linear"-type layers that you'd actually find in PyTorch. Of course, the absence of an autograd function also means that we have to design the derivatives for the gradient ourselves, but it's also a way of getting a better handle on what we're doing. So far I've managed to achieve some interesting results (we'd been trying for a while to introduce a mechanism for predicting the "next" sequence in Dlib using a feed-forward principle). In particular, I've managed to get the network to memorise sequences of text, which I think is already a good start, and even if I decide not to see this "exercise" through to the end, the new layers I've added could also be used for specific processing on images. In this respect, I think there are a lot of great things still to do with Dlib. This isn't the first time I've read about developers moving on to PyTorch, but if nobody tries to develop Dlib any more, then we might as well shut down development of the library, even though it's still very efficient for many tasks. Adding new functions does not, in principle, oblige the original author to maintain it all; it's also the role of the community around the library to help fix any functional problems it sees (as I sometimes do for parts contributed by other contributors).
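(As a toy illustration of what designing the derivative by hand can mean in the simplest case, and not dlib's actual layer interface: for a layer that just adds fixed positional encodings, y = x + pe, the hand-written backward pass is the identity, since the encodings are constants with no learnable parameters.)

```cpp
#include <vector>

// Toy sketch only (not dlib's layer API): y = x + pe, with pe a fixed table.
void forward(const std::vector<float>& x, const std::vector<float>& pe,
             std::vector<float>& y)
{
    y.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        y[i] = x[i] + pe[i];            // encodings are constants
}

// Since d(y)/d(x) = 1 and pe holds no learnable parameters,
// the gradient w.r.t. the input is just the incoming gradient.
void backward(const std::vector<float>& grad_y, std::vector<float>& grad_x)
{
    grad_x = grad_y;                    // gradient flows through unchanged
}
```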
Yes, I want to encourage you to keep up with this great work! I can't wait to try training transformer-based networks in dlib :)
And of course, I will try my best to help to maintain this stuff :D
I'm continuing and making progress... I've just finished reworking gemm() to take the matrix dimensions of 4D tensors into account.
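To make that concrete, here is a minimal sketch (my own illustration, not the actual dlib gemm() code, which presumably dispatches to optimized BLAS/cuBLAS routines) of what treating a 4D tensor as a batch of matrices means: the last two dimensions are the matrix, and the multiplication is repeated over the first two.

```cpp
// Illustrative sketch only: a 4D tensor of shape [n, k, nr, nc] viewed as
// n*k independent matrices, multiplied plane by plane.
#include <cstddef>
#include <vector>

struct tensor4d {
    size_t n, k, nr, nc;            // batch, channels, rows, cols
    std::vector<float> data;        // row-major, densely packed
    float& at(size_t s, size_t c, size_t r, size_t col) {
        return data[((s*k + c)*nr + r)*nc + col];
    }
    float at(size_t s, size_t c, size_t r, size_t col) const {
        return data[((s*k + c)*nr + r)*nc + col];
    }
};

// C = A * B applied to each (sample, channel) plane.
// A: [n, k, M, K], B: [n, k, K, N], C must be preallocated as [n, k, M, N].
void batched_gemm(const tensor4d& A, const tensor4d& B, tensor4d& C)
{
    for (size_t s = 0; s < A.n; ++s)
        for (size_t c = 0; c < A.k; ++c)
            for (size_t i = 0; i < A.nr; ++i)
                for (size_t j = 0; j < B.nc; ++j) {
                    float sum = 0;
                    for (size_t t = 0; t < A.nc; ++t)
                        sum += A.at(s, c, i, t) * B.at(s, c, t, j);
                    C.at(s, c, i, j) = sum;
                }
}
```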
Yeah it's all good. I am a huge fan of pytorch, but there are also things about it I would do differently. Frankly, if I wasn't extremely busy working at a pre-money startup and trying to ensure we are successful long term (and I feel pretty good about our future), I would be working way more on dlib and making more open source ML stuff in particular.
There is also way more going on in dlib than the deep learning stuff. The deep learning tooling is honestly the least interesting part of dlib for me. There are some pretty sweeping changes I would make to the deep learning stuff and will at some point. Anyway, that's all to say, it doesn't matter how many people use the deep learning parts of dlib. There are tons of other things in it that lots of people use on a huge number of projects too.
And the dnn tooling is very low dependency and easy to compile, which is really its selling point. And people use that still and that's fine.
That's all to say, this PR is cool. So knock yourself out :)
There are no more conflicts for this modification. However, the precompilation tests fail because the tril_ class (previously committed in the master branch) is not found in the dnn.cpp program... if you merge the modification, it should still pass, shouldn't it?
You should be able to merge the master branch into this one and get the tril_ layer here too, so the tests pass. Something seems off, where you have the tests of that layer, but not the implementation...
Something went wrong when I retrieved the latest versions after the integration of the tril layer... I just deleted the references to tril in the positional_encodings branch, in the dnn.cpp file. I hope that when the merge is done, nothing will disappear from the master branch.
@davis, There were indeed new conflicts following the integration of the last class that you dealt with. It looks good now, but could you please go through the integrations in chronological order and consider this new class first. If it's OK and integrated, I'll make the following changes to avoid going back and forth, and I'll keep you informed. Thank you in advance for your help and support.
Ah you have a merge error or something. Check the PR contents, it's missing all your code changes :(
Sorry for all these troubles, I've never had so many problems with merges on GitHub... I've just merged with the main branch and in my session I can see all the classes added recently (transpose, embeddings, tril, ..., and positional_encodings). Could you please have another look from your side? I can't see what code is missing now.
No worries. Check it again though. Like look at https://github.com/davisking/dlib/pull/3019/files, it's still missing the changes :shrug:
It's OK now; the implementation is now included in the PR.
This pull request introduces a new layer, positional_encodings, to the Dlib library. The positional_encodings layer adds positional encodings to the input tensor, which is particularly useful for models processing sequential data, such as transformers. The positional encodings are computed using sine and cosine functions of different frequencies, as described in the paper "Attention Is All You Need" by Vaswani et al. This enhancement provides positional information to the model, improving its ability to understand the order and position of elements in a sequence.
The implementation includes methods for setup, forward propagation, and backward propagation, along with serialization support.
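For reference, here is a small standalone sketch of the sinusoidal encoding formula from the paper. It is only an illustration of the math; the helper name make_positional_encodings and the vector-of-vectors layout are mine, not the layer's actual code, which works on dlib tensors.

```cpp
#include <cmath>
#include <vector>

// Sinusoidal positional encodings from "Attention Is All You Need":
// PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
// PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
std::vector<std::vector<float>> make_positional_encodings(size_t seq_len, size_t d_model)
{
    std::vector<std::vector<float>> pe(seq_len, std::vector<float>(d_model, 0.0f));
    for (size_t pos = 0; pos < seq_len; ++pos) {
        for (size_t i = 0; i < d_model; i += 2) {
            const double freq = std::pow(10000.0, static_cast<double>(i) / d_model);
            pe[pos][i] = static_cast<float>(std::sin(pos / freq));          // even indices: sine
            if (i + 1 < d_model)
                pe[pos][i + 1] = static_cast<float>(std::cos(pos / freq));  // odd indices: cosine
        }
    }
    return pe;
}
```

These values are then added element-wise to the layer's input, so the backward pass simply propagates the incoming gradient unchanged.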