[FR] Enhance Text-Corpora Analysis

Address several problems/needs at once:

- [x] [DONE in Analyzer v0.13.0] The data in `$text_corpus_stats.*` is rather large, so it needs to be split per language. The analyzer only works on one language version of the dataset at a time anyway.
- [x] [DONE in Analyzer v0.13.0] Although newer text-corpus data is not exported to the git repo, older versions are still there and can be reached with git tools. So we can retrieve the data at a specific commit and analyze it, which lets us see how it changed over time.
- [x] [DONE in Analyzer v0.13.0] Add more analyses:
  - [x] Grapheme distribution
  - [x] Phoneme distribution (if supported)
- [x] Analyze text-corpus usage in the buckets/splits with respect to the text corpus extracted above
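For the item about older text-corpus data still being reachable through git, here is a minimal sketch of reading a file as it existed at a given commit. It assumes the data is a UTF-8 text file tracked in the repo and only shells out to the standard `git show <commit>:<path>` and `git log --format=%H -- <path>` commands. The helper names `git_show_args`, `read_at_commit` and `commits_touching` are hypothetical, not part of cv-tbox-dataset-compiler.

```python
import subprocess


def git_show_args(commit: str, path: str) -> list[str]:
    """Build `git show <commit>:<path>`; `path` is relative to the repo root."""
    return ["git", "show", f"{commit}:{path}"]


def read_at_commit(repo: str, commit: str, path: str) -> str:
    """Return the content of `path` as it was at `commit` (hypothetical helper)."""
    result = subprocess.run(
        git_show_args(commit, path),
        cwd=repo,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the file did not exist at that commit
    )
    return result.stdout


def commits_touching(repo: str, path: str) -> list[str]:
    """List hashes of commits that changed `path`, newest first (hypothetical helper)."""
    result = subprocess.run(
        ["git", "log", "--format=%H", "--", path],
        cwd=repo,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.split()
```

Looping `for c in commits_touching(repo, path): text = read_at_commit(repo, c, path)` would then hand each historical version to the analyzer; `repo` and `path` are placeholders for the local clone and the tracked text-corpus file.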
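For the grapheme-distribution item, this is a rough sketch that counts graphemes over corpus sentences given as plain Python strings. It approximates grapheme clusters by NFC-normalizing each sentence and attaching combining marks (non-zero canonical combining class) to the preceding character. This is not full Unicode UAX #29 segmentation: spacing vowel signs in Indic scripts and emoji ZWJ sequences, for example, are still split. The third-party `regex` module's `\X` pattern matches full extended grapheme clusters if that precision is needed. The function names are hypothetical.

```python
import unicodedata
from collections import Counter
from collections.abc import Iterable


def approx_graphemes(text: str) -> list[str]:
    """Split NFC text into base characters, each with its following combining marks.
    An approximation of grapheme clusters, not a UAX #29 implementation."""
    clusters: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the combining mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


def grapheme_distribution(sentences: Iterable[str], skip_space: bool = True) -> Counter[str]:
    """Count approximate graphemes over all sentences, optionally ignoring whitespace."""
    counts: Counter[str] = Counter()
    for sentence in sentences:
        counts.update(
            g for g in approx_graphemes(sentence) if not (skip_space and g.isspace())
        )
    return counts
```

The same counting would apply to a phoneme distribution by feeding it phoneme sequences from a phonemizer instead of raw text, where one exists for the language.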
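For the bucket/split usage item, here is a minimal sketch that assumes each split is a list of recorded sentence texts (one entry per recording) and the text corpus is a list of sentences. For each split it reports the number of recordings, the number of unique sentences, and how many of those are found in the corpus. It also reports the fraction of corpus sentences used by any split. Sentences are compared after NFC normalization and whitespace trimming, which is an assumption; the actual matching rules may differ. `corpus_usage` is a hypothetical name.

```python
import unicodedata
from collections.abc import Iterable, Mapping


def _norm(sentence: str) -> str:
    # Assumed matching rule: NFC normalization plus trimming surrounding whitespace.
    return unicodedata.normalize("NFC", sentence).strip()


def corpus_usage(
    corpus: Iterable[str], splits: Mapping[str, list[str]]
) -> tuple[dict[str, dict[str, int]], float]:
    """Per-split usage stats and the fraction of corpus sentences used by any split."""
    corpus_set = {_norm(s) for s in corpus}
    used: set[str] = set()
    per_split: dict[str, dict[str, int]] = {}
    for name, sentences in splits.items():
        unique = {_norm(s) for s in sentences}
        found = unique & corpus_set
        used |= found
        per_split[name] = {
            "recordings": len(sentences),
            "unique_sentences": len(unique),
            "in_corpus": len(found),
        }
    coverage = len(used) / len(corpus_set) if corpus_set else 0.0
    return per_split, coverage
```

Keeping recordings and unique sentences separate shows how often sentences are repeated within a split, while `in_corpus` flags split sentences that no longer appear in the extracted corpus.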

HarikalarKutusu / cv-tbox-dataset-compiler