0-5788719150923125 / praxis

as above, so below
https://src.eco
MIT License

Implement basic LayerShuffle #5

Closed: Vectorrent closed this issue 1 month ago

Vectorrent commented 1 month ago

Predicting the actual layer sequence is difficult to make differentiable, so for now we're just going to use naive shuffling. There is research suggesting that randomly shuffling layers during training and inference can lead to more robust distributed architectures, which is something we're going to need here. We will improve it later, if necessary.
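For illustration, naive shuffling could look roughly like the sketch below. This is not the code in this repo: `ShuffledStack`, its `forward` signature, and the toy scalar "layers" are all hypothetical, and plain Python callables stand in for real network layers. The only idea it shows is drawing a fresh random layer order on every forward pass.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a "layer" is any callable mapping a value to a value.
Layer = Callable[[float], float]


class ShuffledStack:
    """Hypothetical helper that applies its layers in a random order per call."""

    def __init__(self, layers: Sequence[Layer], seed: Optional[int] = None) -> None:
        self.layers: List[Layer] = list(layers)
        # A private RNG keeps the shuffling reproducible when a seed is given.
        self.rng = random.Random(seed)

    def forward(self, x: float, shuffle: bool = True) -> Tuple[float, List[int]]:
        # Start from the natural order, then permute it uniformly at random.
        order = list(range(len(self.layers)))
        if shuffle:
            self.rng.shuffle(order)
        for i in order:
            x = self.layers[i](x)
        # Return the order too, so callers can log or inspect it.
        return x, order


if __name__ == "__main__":
    stack = ShuffledStack(
        [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3], seed=0
    )
    for _ in range(3):
        y, order = stack.forward(1.0)
        print(order, y)
```

Nothing here is learned, so nothing needs to be differentiable: the permutation is sampled outside the computation and each layer simply has to tolerate seeing inputs from any other layer.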