also what about spoken language corpora? e.g. work by Oliver Adams & colleagues inducing linguistic information from spoken data, skipping the usual transcription set. I'm not saying you need to also deal with spoken language data, but it does need to be acknowledged