gpengzhi closed this 4 years ago
I think this is too much code bloat. Can't we have a single *BERT module for all the (slightly) different variants? Even if we need separate modules, they could share most of the duplicated code. And if that's not possible, we could at least unify the interfaces, so users can just write BERTEncoder("span_bert") and get a potentially different class under the hood.
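To make the unified-interface suggestion concrete, here is a minimal sketch of how a single entry point could dispatch on a variant name. Everything in it is hypothetical and for illustration only: `BaseBERTEncoder`, `register_variant`, `make_bert_encoder`, and the `uses_segment_embedding` flag are assumed names, not part of the existing library.

```python
from typing import Callable, Dict, Type


class BaseBERTEncoder:
    """Hypothetical base class holding the code shared by all variants."""

    # Variants override this flag instead of duplicating the whole module.
    uses_segment_embedding = True

    def __init__(self, pretrained_model_name: str = ""):
        self.pretrained_model_name = pretrained_model_name


# Maps a variant name such as "span_bert" to the class that implements it.
_VARIANTS: Dict[str, Type[BaseBERTEncoder]] = {}


def register_variant(name: str) -> Callable[[Type[BaseBERTEncoder]], Type[BaseBERTEncoder]]:
    """Class decorator that records a variant under the given name."""

    def decorator(cls: Type[BaseBERTEncoder]) -> Type[BaseBERTEncoder]:
        _VARIANTS[name] = cls
        return cls

    return decorator


@register_variant("bert")
class StandardBERTEncoder(BaseBERTEncoder):
    pass


@register_variant("span_bert")
class SpanBERTEncoder(BaseBERTEncoder):
    # Per the PR description, SpanBERT drops the segment embedding.
    uses_segment_embedding = False


def make_bert_encoder(variant: str, pretrained_model_name: str = "") -> BaseBERTEncoder:
    """Single entry point: returns a different class depending on `variant`."""
    try:
        cls = _VARIANTS[variant]
    except KeyError:
        known = ", ".join(sorted(_VARIANTS))
        raise ValueError(f"Unknown variant {variant!r}; expected one of: {known}") from None
    return cls(pretrained_model_name)
```

With something like this, `make_bert_encoder("span_bert")` would return a `SpanBERTEncoder` while callers only ever see one function, and each variant overrides just the parts that differ.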
Resolves #230.
SpanBERT extends BERT by (1) masking contiguous random spans rather than random individual tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. Unlike standard BERT, SpanBERT does not use segment embeddings.
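As a rough illustration of point (1), the sketch below picks contiguous spans to mask until a budget is reached. It is not the code in this PR, which only loads the pretrained model. The function names are hypothetical, and the defaults are assumptions based on the SpanBERT paper as I recall it: a 15% masking budget, with span lengths drawn from a geometric distribution (p = 0.2) clipped at 10. For simplicity it works on tokens, not whole words.

```python
import random
from typing import List, Optional, Set


def sample_span_mask(
    num_tokens: int,
    mask_ratio: float = 0.15,   # assumed masking budget
    p: float = 0.2,             # assumed geometric parameter
    max_span_len: int = 10,     # assumed clipping length
    rng: Optional[random.Random] = None,
) -> Set[int]:
    """Return the set of token positions covered by randomly placed spans."""
    if num_tokens <= 0:
        return set()
    rng = rng or random.Random()
    budget = min(num_tokens, max(1, round(num_tokens * mask_ratio)))
    masked: Set[int] = set()
    while len(masked) < budget:
        # Draw a span length from a geometric distribution, clipped at max_span_len.
        length = 1
        while length < max_span_len and rng.random() > p:
            length += 1
        # Never exceed the remaining budget.
        length = min(length, budget - len(masked))
        start = rng.randrange(0, num_tokens - length + 1)
        masked.update(range(start, start + length))
    return masked


def apply_mask(tokens: List[str], positions: Set[int], mask_token: str = "[MASK]") -> List[str]:
    """Replace every selected position with the mask token."""
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]
```

For example, `apply_mask(tokens, sample_span_mask(len(tokens)))` yields a copy of `tokens` in which runs of adjacent tokens, not isolated ones, are replaced by `[MASK]`.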