Code release - GitHub Issues

And I have a question.

In the paper, you mentioned that "Multiple surgical workflow analysis models like OperA [5], SAHC [7], and Trans-SVNet [11] incorporated Transformer layers to TCNs in order to efficiently combine the spatial and temporal features. Nonetheless, their dependence on TCN modeling leads to a loss of finer-grained information, and using temporal-agnostic backbones limits frame embeddings to capture only spatial information."

Why does TCN modeling lose fine-grained information? I'm a little confused about this.

BCV-Uniandes / MuST

Code release #1