Various small notes

[x] Point out that the CLEVR ablations indicate that using both feature-wise scaling and shifting is better than using either one alone?
[x] Shorten the “Speech recognition” section text? I.e., drop the introduction of the method name “dynamic layer normalization” and condense the explanation to one sentence. The condensing might be easier if this section appears near the “conditional normalization” sections.
[x] Convert passive “is used by” constructions to active voice (the passive voice might be why Chris Olah thinks things are said in a roundabout way)

distillpub / post--feature-wise-transformations