Closed: lukaemon closed this issue 1 month ago
You’re totally right — this is a vestige of much earlier experiments where we thought that wrapping ViT blocks independently might shave more off the training memory footprint.
I’ll push a fix soon removing this!
Thx for explanation. 👍
In `base_vision.py`, since `VisionTransformer` is a superset of `Block`, why construct an `_or_policy` on top of them? Is wrapping the whole ViT enough?
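For context, here is a minimal sketch contrasting the two options being discussed. It assumes the `VisionTransformer`/`Block` classes come from timm and that the policies are built from PyTorch's FSDP wrap helpers (`_or_policy` and `_module_wrap_policy` are private APIs); the actual code in `base_vision.py` may differ.

```python
from functools import partial

from timm.models.vision_transformer import Block, VisionTransformer
from torch.distributed.fsdp.wrap import (
    _module_wrap_policy,
    _or_policy,
    transformer_auto_wrap_policy,
)


def combined_wrapping_policy():
    """Wrap the whole ViT *and* each transformer Block as separate FSDP units."""
    vit_wrap_policy = partial(_module_wrap_policy, module_classes={VisionTransformer})
    block_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
    return partial(_or_policy, policies=[vit_wrap_policy, block_wrap_policy])


def whole_vit_wrapping_policy():
    """Wrap only the ViT itself as a single FSDP unit (the question's suggestion)."""
    return partial(_module_wrap_policy, module_classes={VisionTransformer})
```

Per the reply above, the per-`Block` policy was a leftover from earlier memory-footprint experiments, so the simpler whole-ViT policy should suffice.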