While reading the paper, I noticed that you used the 13 convolutional layers of the VGG16 network to extract deep features. Since VGG nets have been around for quite a long time, why didn't you choose a more efficient and accurate network? Or is there something I'm missing? Thank you.