dlibml / darknet

Darknet related stuff: ports of YOLO models to dlib
Boost Software License 1.0

Feature suggestions #3

Open pfeatherstone opened 3 years ago

pfeatherstone commented 3 years ago

Just putting some suggestions out there. Maybe we could organize these into projects.

pfeatherstone commented 2 years ago

Wow, those are HUGE names. I guess that's just C++ and the current design. I think this is another reason why not to rely heavily on a templated design, though not a very good one I have to admit.

pfeatherstone commented 2 years ago

@arrufat what happens if you compile with -fvisibility=hidden ?

arrufat commented 2 years ago

I just added add_compile_options(-fvisibility=hidden) to the CMakeLists.txt building that program, but nothing changed :/ (I can still find the mangled names in the objdump and the file size is the same...)

pfeatherstone commented 2 years ago

Cheers. Oh well. We have to live with it.

pfeatherstone commented 2 years ago

@arrufat I've updated the suggestions list in the original post for reference. Let me know if some of it isn't appropriate or if you have stuff to add

arrufat commented 2 years ago

Well, obviously I'm not going to do all of that, but it's good to have them listed in case someone wants to jump in. CutMix is quite straightforward, right?

  1. use extract_image_chip to crop a chip from an image.
  2. get the rectangle transform from the chip details: rectangle_transform tform = get_mapping_to_chip(details), and use it to transform the bounding boxes.
  3. paste that chip into the destination image and move the bounding boxes accordingly (just a shift, so translate_rect).
  4. check whether any boxes in the destination image ended up occluded and remove them.

In a way, it's similar to mosaic (copying, resizing and pasting); see the rough sketch below.
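Something like this, as a rough sketch of those four steps using dlib types (the helper name, the offset handling and the 70% occlusion threshold are just made up for illustration, not a settled API):

```cpp
// Rough sketch of the 4 CutMix steps above, using dlib types. The helper name,
// the paste offset handling, and the 70% occlusion threshold are illustrative
// choices only.
#include <algorithm>
#include <vector>
#include <dlib/matrix.h>
#include <dlib/pixel.h>
#include <dlib/image_transforms.h>

using namespace dlib;

void cutmix_paste(
    const matrix<rgb_pixel>& src, const std::vector<rectangle>& src_boxes,
    matrix<rgb_pixel>& dst, std::vector<rectangle>& dst_boxes,
    const rectangle& src_roi,  // region to cut from the source image
    const point& offset        // where to paste it in the destination image
)
{
    // 1. crop a chip from the source image
    const chip_details details(src_roi, chip_dims(src_roi.height(), src_roi.width()));
    matrix<rgb_pixel> chip;
    extract_image_chip(src, details, chip);

    // 2. map the source boxes into chip coordinates, then shift them into
    //    destination coordinates (just a translation)
    const rectangle_transform tform(get_mapping_to_chip(details));
    std::vector<rectangle> new_boxes;
    for (const auto& box : src_boxes)
        if (src_roi.intersect(box).area() > 0)
            new_boxes.push_back(translate_rect(tform(box), offset));

    // 3. paste the chip into the destination image
    const rectangle paste_area = translate_rect(get_rect(chip), offset);
    for (long r = 0; r < chip.nr(); ++r)
        for (long c = 0; c < chip.nc(); ++c)
            if (get_rect(dst).contains(offset.x() + c, offset.y() + r))
                dst(offset.y() + r, offset.x() + c) = chip(r, c);

    // 4. drop destination boxes that the pasted chip mostly occludes, then add
    //    the boxes that came along with the chip
    const auto mostly_occluded = [&](const rectangle& box)
    { return paste_area.intersect(box).area() > 0.7 * box.area(); };
    dst_boxes.erase(std::remove_if(dst_boxes.begin(), dst_boxes.end(), mostly_occluded),
                    dst_boxes.end());
    dst_boxes.insert(dst_boxes.end(), new_boxes.begin(), new_boxes.end());
}
```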

arrufat commented 2 years ago

Also, wouldn't oneDNN be a better choice than NNPack?

And why would you want other IOU losses if you had CIOU?

pfeatherstone commented 2 years ago

Well, obviously I'm not going to do all of that

Of course!! I wasn't suggesting that. It's just to keep track of suggestions.

pfeatherstone commented 2 years ago

Also, wouldn't oneDNN be a better choice than NNPack?

And why would you want other IOU losses if you had CIOU?

oneDNN is Intel-specific, whereas NNPACK supports multiple CPU architectures. For example, it has good Android support.

You can choose between GIOU, DIOU and CIOU. You might not necessarily always want CIOU. For example, YOLOv5 only uses GIOU.
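For what it's worth, the GIoU term itself is tiny to write with dlib rectangles. A rough standalone sketch, not wired into loss_yolo:

```cpp
// Sketch of the GIoU score for two axis-aligned boxes using dlib's drectangle.
// GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest enclosing box.
#include <dlib/geometry.h>

double giou(const dlib::drectangle& a, const dlib::drectangle& b)
{
    const double inter = a.intersect(b).area();
    const double uni   = a.area() + b.area() - inter;
    const double iou   = uni > 0 ? inter / uni : 0;
    const dlib::drectangle c = a + b;  // smallest box enclosing both a and b
    return c.area() > 0 ? iou - (c.area() - uni) / c.area() : iou;
}
```

DIoU and CIoU then add a normalised centre-distance penalty and an aspect-ratio consistency term on top of this.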

pfeatherstone commented 2 years ago

For CutMix, I was thinking more for classifiers/backbones. You would also need a label-smoothing version of the cross-entropy loss though, which I don't think dlib has.

arrufat commented 2 years ago

oneDNN is Intel-specific, whereas NNPACK supports multiple CPU architectures.

Are you sure? On their GitHub page, it seems that they support other architectures.

pfeatherstone commented 2 years ago

I could be wrong. I'm sure they do work but they are definitely not optimised for other architectures.

arrufat commented 2 years ago

You can do label smoothing using the loss_multibinary_log, as I noted in https://github.com/davisking/dlib/pull/2141. Actually, in ResNet strikes back, they do exactly that.

Quoting from the paper:

Our procedure include recent advances from the literature as well as new proposals. Noticeably, we depart from the usual cross-entropy loss. Instead, our training solves a multi-classification problem when using Mixup and CutMix: we minimize the binary cross entropy for each concept selected by these augmentations, assuming that all the mixed concepts are present in the synthetized image.

pfeatherstone commented 2 years ago

I'm sorry I missed that. Presumably the labels can be any real number in the range [-1, +1]?

arrufat commented 2 years ago

They can be any real value: positive for one class and negative for the other, and the absolute value acts as the weight of the label. So if you have a dataset with a 1:2 imbalance, you can set the labels to 0.67 and -1.34, or something like that. The values themselves don't matter; what matters is the relation between them. If you use large values, you might have to reduce the learning rate.
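A tiny toy sketch of that label format (made-up values and a placeholder network, just to show the shape of things):

```cpp
// Toy example of weighted labels for loss_multibinary_log: the sign picks the
// class and the magnitude acts as a per-label weight (here, a 1:2 imbalance).
#include <dlib/dnn.h>
using namespace dlib;

int main()
{
    using net_type = loss_multibinary_log<fc<2, input<matrix<float>>>>;
    net_type net;

    matrix<float> img(8, 8);
    img = 0;
    std::vector<matrix<float>> samples = {img, img};

    // One std::vector<float> per sample, one entry per output of the fc layer.
    std::vector<std::vector<float>> labels = {
        { 1.34f, -0.67f},  // sample from the rarer class, weighted up
        {-1.34f,  0.67f},  // sample from the more common class
    };

    dnn_trainer<net_type> trainer(net);
    trainer.train_one_step(samples, labels);
}
```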

pfeatherstone commented 2 years ago

I thought the smoothed version of cross entropy still assumed a softmax layer during inference, like what is done in torch, whereas the loss in dlib uses a sigmoid, right? I don't know which is best, but there is definitely functionality there already in dlib, you're right. I'm not trying to start another religious war, don't worry. Just throwing ideas out. If you don't find them appropriate I will remove them.

arrufat commented 2 years ago

From what I've seen, if you want to output a single label per image, training using softmax or sigmoid leads to almost identical results.

I prefer the latter, since it's a more generic approach: it can also deal with the cases where an image has several labels, or none at all.

In fact, the classification part of YOLO uses a sigmoid, and generally, we train it with bounding boxes that only have one label.

If you check that ResNet paper, you'll see the experiments comparing what they call Cross Entropy (CE) and Binary Cross Entropy (BCE), which correspond to loss_multiclass_log and loss_multibinary_log in dlib, respectively.

pfeatherstone commented 2 years ago

Interesting, yeah I see your point. I haven't read that paper. I read https://arxiv.org/pdf/1905.04899.pdf

arrufat commented 2 years ago

I was checking the YOLOv5 code, and it uses the CIOU loss, as I thought (it's even hard-coded). I think the other ones don't make much sense if you have the CIOU loss at your disposal.

pfeatherstone commented 2 years ago

Oh, ok. They must have changed it recently. For a long time I was following the code, and even though they supported all 3 IOU losses, they used GIOU. Maybe that was in their YOLOv3 repo. The author even said in an issue somewhere that CIOU provided no benefits over GIOU. But that might not be the case anymore.

https://github.com/ultralytics/yolov3/issues/996#issuecomment-607473569

pfeatherstone commented 2 years ago

In any case, I'm not instigating a debate here, I just think it would be a nice addition to support all 3. Again, I'm not suggesting you do them. It's just an idea. It could very well be the case that for some models GIOU gives you equal performance with less compute.

davisking commented 2 years ago

@davisking talking about type erasure. I was thinking of adding a general purpose type erasure module similar to boost-ext TE or Dyno, but targeting C++11. What do you think? Hopefully it won't require a concept map like what the Dyno library does and will be succinct like the TE library. I've got it working with c++14 but think it should be possible to have a c++11 version. So you could have polymorphic objects on the stack without inheritance. This is really useful in API design without having loads of smart pointers using inheritance, and you can use good old c++ objects.

I haven't used any of those libraries so I'm not 100% sure what you mean. I think a lot of the tools in boost get kind of carried away with "hey this is neat" vs "this is really the best way to deliver value to humans" so I don't end up using boost very much :|

davisking commented 2 years ago

@davisking jumping back to DL, have you tried https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html with your torch models? I haven't. If so, what is your experience with it? There is also https://github.com/hjmshi/PyTorch-LBFGS

I haven't. I assume it's a proper LBFGS implementation though.

davisking commented 2 years ago

@davisking jumping back to DL, have you tried https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html with your torch models? I haven't. If so, what is your experience with it? There is also https://github.com/hjmshi/PyTorch-LBFGS

Oh, that's an SGD tool that uses the LBFGS equation to compute the step direction. That's not really "LBFGS" though. LBFGS is supposed to be used with a line search and is not an SGD-type algorithm. Like, it does not operate on "batches".
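For contrast, a proper LBFGS run in dlib looks like this: a batch optimiser with a line search over the full objective (toy objective just for illustration):

```cpp
// "Proper" LBFGS in dlib: a batch optimizer with a line search over the full
// objective, not an SGD-style step rule. Toy objective for illustration.
#include <dlib/optimization.h>
using namespace dlib;

int main()
{
    typedef matrix<double, 0, 1> column_vector;

    // f(x) = ||x||^2, with its analytic gradient 2x.
    auto f    = [](const column_vector& x) { return dot(x, x); };
    auto grad = [](const column_vector& x) { return column_vector(2 * x); };

    column_vector x(3);
    x = 3.0, -2.0, 1.0;

    find_min(lbfgs_search_strategy(10),              // remember 10 curvature pairs
             objective_delta_stop_strategy(1e-7),
             f, grad, x,
             -1);                                    // lower bound on f, so it runs to convergence

    // x is now (approximately) the minimizer, i.e. the zero vector.
}
```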

pfeatherstone commented 2 years ago

@davisking talking about type erasure. I was thinking of adding a general purpose type erasure module similar to boost-ext TE or Dyno, but targeting C++11. What do you think? Hopefully it won't require a concept map like what the Dyno library does and will be succinct like the TE library. I've got it working with c++14 but think it should be possible to have a c++11 version. So you could have polymorphic objects on the stack without inheritance. This is really useful in API design without having loads of smart pointers using inheritance, and you can use good old c++ objects.

I haven't used any of those libraries so I'm not 100% sure what you mean. I think a lot of the tools in boost get kind of carried away with "hey this is neat" vs "this is really the best way to deliver value to humans" so I don't end up using boost very much :|

Neither of them are boost libraries, actually. TE is 300 lines of code and that's it. I would check out TE. It's really cool. It's dynamic polymorphism without inheritance. So non-intrusive. I think you would like it a lot. But it requires c++17. You can make it c++14 with a couple of mods, but it requires a bit more work to make it c++11. But with it, any kind of type erasure is trivially done.

davisking commented 2 years ago

@davisking talking about type erasure. I was thinking of adding a general purpose type erasure module similar to boost-ext TE or Dyno, but targeting C++11. What do you think? Hopefully it won't require a concept map like what the Dyno library does and will be succinct like the TE library. I've got it working with c++14 but think it should be possible to have a c++11 version. So you could have polymorphic objects on the stack without inheritance. This is really useful in API design without having loads of smart pointers using inheritance, and you can use good old c++ objects.

I haven't used any of those libraries so I'm not 100% sure what you mean. I think a lot of the tools in boost get kind of carried away with "hey this is neat" vs "this is really the best way to deliver value to humans" so I don't end up using boost very much :|

Neither of them are boost libraries, actually. TE is 300 lines of code and that's it. I would check out TE. It's really cool. It's dynamic polymorphism without inheritance. So non-intrusive. I think you would like it a lot. But it requires c++17. You can make it c++14 with a couple of mods, but it requires a bit more work to make it c++11. But with it, any kind of type erasure is trivially done.

Yeah it's clever. It seems like a hack around not having template concepts (which are in C++20 now) though. Like, use template concepts. They ought to be great. Like, update your compiler you know :D Dlib has to wait a long time to update, but end users, which includes us when not adding code to the dlib repo, can use all the newest stuff. Like I use C++17 at work and it's great.

pfeatherstone commented 2 years ago

It's not the same as concepts though. It's closer to inheritance than it is to templates. It's a generalisation of std::function but for any kind of objects. I think that's the best way I can explain it. Like std::function isn't a template hack. It's achieving something else. But yeah I see your point, I could just use that library and accept the c++17 requirement. I just like to keep stuff as portable as possible. I still have to support old compilers for some of my stuff at work. That's why I've submitted a few PRs in the past that add c++14 and c++17 backports.
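If it helps to picture it, this is roughly what those libraries generate for you. A hand-rolled C++11 sketch of the Sean Parent pattern; the inheritance lives inside the wrapper, never in the user's types:

```cpp
// Dynamic polymorphism without user-visible inheritance: any type that can be
// streamed fits into `printable`, without deriving from anything.
#include <iostream>
#include <memory>
#include <string>

class printable
{
public:
    template <typename T>
    printable(T obj) : self(std::make_shared<model<T>>(std::move(obj))) {}

    void print() const { self->print_(); }

private:
    struct concept_t
    {
        virtual ~concept_t() = default;
        virtual void print_() const = 0;
    };

    template <typename T>
    struct model : concept_t
    {
        model(T obj) : data(std::move(obj)) {}
        void print_() const override { std::cout << data << '\n'; }
        T data;
    };

    std::shared_ptr<const concept_t> self;  // value semantics, heap-allocated model
};

int main()
{
    // int and std::string never inherit from anything, yet both fit.
    printable a = 42;
    printable b = std::string("type erasure");
    a.print();
    b.print();
}
```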

pfeatherstone commented 2 years ago

And I don't quite understand the argument for updating my compiler. I've done that before: I build it, then run the binary on another Linux box, and it complains it can't find the right version of glibc or something. And aggressively statically linking everything doesn't always work. So I always just use the default compiler for a system.

pfeatherstone commented 2 years ago

It would be great if gcc and clang supported a feature like "clang update" or "gcc update" and everything just worked and I had all the latest features. That is where modern languages shine: they've embraced the bleeding edge a bit more. I would really like it if c++ no longer supported dynamic linking. Disk space is no longer an issue. Unless you're dynamically loading modules at runtime, I see no benefit in shared libraries. For me they cause problems and mean I can't really do stuff like update my compiler and expect everything to work.

pfeatherstone commented 2 years ago

Anyway, silly rant over. I proposed a type erasure utility since you mentioned that IF you were to re-write the DNN stuff, you would use a more dynamic model, likely using type erasure (instead of inheritance) to record forward (and backward?) functions. So it seemed appropriate to mention it.

davisking commented 2 years ago

It's not the same as concepts though. It's closer to inheritance than it is to templates. It's a generalisation of std::function but for any kind of objects.

Sure, but why not use templates? Like aside from being able to hide some implementation details outside a header I would prefer template concepts to this most of the time. I mean, I use std::function sometimes too. But IDK. This feels excessively clever, but maybe it isn't. What do you use this for that isn't nicely accomplished via a shared_ptr and inheritance of a template?

I think that's the best way I can explain it. Like std::function isn't a template hack. It's achieving something else. But yeah I see your point, I could just use that library and accept the c++17 requirement. I just like to keep stuff as portable as possible. I still have to support old compilers for some of my stuff at work. That's why I've submitted a few PRs in the past that add c++14 and c++17 backports.

Yeah, I know this pain well :|

davisking commented 2 years ago

And I don't quite understand the argument for updating my compiler. I've done that before: I build it, then run the binary on another Linux box, and it complains it can't find the right version of glibc or something. And aggressively statically linking everything doesn't always work. So I always just use the default compiler for a system.

I mean yeah, but getting a new compiler is great when you can get it :D

davisking commented 2 years ago

It would be great if gcc and clang supported a feature like "clang update" or "gcc update" and everything just worked and I had all the latest features. That is where modern languages shine: they've embraced the bleeding edge a bit more. I would really like it if c++ no longer supported dynamic linking. Disk space is no longer an issue. Unless you're dynamically loading modules at runtime, I see no benefit in shared libraries. For me they cause problems and mean I can't really do stuff like update my compiler and expect everything to work.

Depends on your domain. For instance, I work on a large software system that really needs dynamic linking. It's a bunch of separate processes. Linking it all into one big monster process would be less good since it is a safety-critical system: we don't want one thing blowing up to take down the whole thing. So that process isolation is a big deal. And at the same time, if everything was statically linked then we would quite literally run out of RAM.

But yeah, for most applications static linking is the way to go. Or mostly static linking anyway.

davisking commented 2 years ago

Anyway, silly rant over. I proposed a type erasure utility since you mentioned that IF you were to re-write the DNN stuff, you would use a more dynamic model, likely using type erasure (instead of inheritance) to record forward (and backward?) functions. So it seemed appropriate to mention it.

Yeah, would definitely use type erasure if rewriting the DNN stuff.

And I deliberately didn't use type erasure the first time, since type erasure makes serialization more complicated. Not a lot more complicated, but still more complicated. It makes a lot of stuff a little bit more complicated. But still, that was not optimizing the right thing. I didn't expect DNNs to end up being as large and crazy as they got, and at those sizes and complexities type erasure is definitely the better way to go.

pfeatherstone commented 2 years ago

What do you use this for that isn't nicely accomplished via a shared_ptr and inheritance of a template?

I would watch Sean Parent's talk "Inheritance is the base class of evil" or Louis Dionne's CppCon talk on Dyno. They would explain it better than I can.

arrufat commented 2 years ago

@pfeatherstone I just noticed (by chance) that you can load a network that differs in the definition only in the number of repeated layers. This means that, due to the way that I defined the YOLOv5 models, you can pick and build just one variant, and load any of the other variants. That's kind of cool :) It would be even cooler to be able to change it at runtime programmatically, though, similar to the number of outputs in the fc layer, or to the number of filters in the convolutional layers.

I will check if I can add an option to change the number of repetitions in the repeat layer at runtime. @davisking, do you think that's feasible?
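For reference, a small sketch of the runtime knobs that already exist (fc outputs and conv filters) on a toy network; whether the same can be done for repeat is the open question here:

```cpp
// Sketch of the runtime knobs dlib already has: fc::set_num_outputs and
// con::set_num_filters. The toy network and the layer indices are specific to
// this illustrative definition.
#include <dlib/dnn.h>
using namespace dlib;

using toy_net = loss_multiclass_log<fc<10, relu<con<16, 3, 3, 1, 1, input<matrix<rgb_pixel>>>>>>;

int main()
{
    toy_net net;
    // In this definition, layer<0> is the fc layer and layer<2> the conv layer.
    layer<0>(net).layer_details().set_num_outputs(42);
    layer<2>(net).layer_details().set_num_filters(32);
}
```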

pfeatherstone commented 2 years ago

That would be awesome!

pfeatherstone commented 2 years ago

It would also be cool if the batch-normalization layers had a runtime option to set them to training mode or evaluation mode (affine layer behaviour) similar to other NN frameworks. So you would only ever need 1 network definition, not 2 (one for training and one for evaluation).

Then you could do something like model.train() to set it back to batch-normalization behaviour and keep track of statistics, or do model.eval() and set it to affine behaviour. That would probably save compilation time massively too (halve it, in theory).
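For context, this is roughly the two-definition pattern it would replace: bn_con while training, affine for inference, converted by assignment (toy network, assuming the usual bn_con to affine conversion in dlib):

```cpp
// The two-definition pattern today: bn_con for training, affine for inference,
// copied across by assignment. A runtime train()/eval() switch would collapse
// these into one definition. Toy network for illustration.
#include <dlib/dnn.h>
using namespace dlib;

template <typename SUBNET> using block_train = relu<bn_con<con<16, 3, 3, 1, 1, SUBNET>>>;
template <typename SUBNET> using block_infer = relu<affine<con<16, 3, 3, 1, 1, SUBNET>>>;

using train_net = loss_multiclass_log<fc<10, block_train<input<matrix<rgb_pixel>>>>>;
using infer_net = loss_multiclass_log<fc<10, block_infer<input<matrix<rgb_pixel>>>>>;

int main()
{
    train_net tnet;
    // ... train tnet here ...
    infer_net inet = tnet;  // bn_con layers convert into affine layers
}
```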

pfeatherstone commented 2 years ago

Very quickly, going back to the enhanced loss_yolo layer supporting the IOU variants: currently, it assumes the inputs to the YOLO loss function have already had a sigmoid applied to them. That wouldn't work with GIOU, DIOU or CIOU. Is there any reason why the loss_yolo layer doesn't do the sigmoid itself? If it did, then you could add CIOU, for example, and not worry about having to undo the sigmoid.

arrufat commented 2 years ago

Very quickly, going back to the enhanced loss_yolo layer supporting the IOU variants: currently, it assumes the inputs to the YOLO loss function have already had a sigmoid applied to them. That wouldn't work with GIOU, DIOU or CIOU. Is there any reason why the loss_yolo layer doesn't do the sigmoid itself? If it did, then you could add CIOU, for example, and not worry about having to undo the sigmoid.

Yes, there's a technical reason. First, I wanted to perform the sigmoid operation inside the loss function, but the get_output() method returns a const reference, so it meant that I had to copy the output tensor in the loss layer to apply the sigmoid. That's why I decided to do it outside. Moreover, the Darknet code also does it in the network when new_coords==1, and it has the option to do the CIOU loss.

davisking commented 2 years ago

@pfeatherstone I just noticed (by chance) that you can load a network that differs in the definition only in the number of repeated layers. This means that, due to the way that I defined the YOLOv5 models, you can pick and build just one variant, and load any of the other variants. That's kind of cool :) It would be even cooler to be able to change it at runtime programmatically, though, similar to the number of outputs in the fc layer, or to the number of filters in the convolutional layers.

I will check if I can add an option to change the number of repetitions in the repeat layer at runtime. @davisking, do you think that's feasible?

Yeah, probably not a problem.