Graph displaying topics vs. time

Solution

Implement a Python script that splits the corpus into monthly sub-spaces based on the article metadata. :heavy_check_mark:
Implement a Python script that mines topics sequentially (one month after another) and writes the results to a JavaScript file used as the graph data. :heavy_check_mark:
Count and display missing, skipped (by version), and copied files. Write the missing files to a CSV file together with their article ID and version.
Print the estimated time to process all months.
Implement a Python script that saves article identifiers as a set of JSON files for parallel mining. :bulb:
Implement a Python script that mines topics for a single month, passed as a script parameter, to enable parallel mining. :bulb:
Print the estimated time to process the metadata. :x: Not possible with ijson, because it streams the file and does not know the number of articles in advance.
Implement a Python script for mining topics in parallel on the partitioned corpus. :bulb:
Mine topics for the last year. :bulb:
Group articles by metadata topics (abbreviations). :bulb:
Compute the number of articles per metadata topic over time and use it as the graph data. :bulb:
The number of topics and the number of items per topic should be adjustable.
Use a stream graph.
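For splitting the corpus into monthly sub-spaces from the metadata, a minimal sketch could look like this. It assumes each metadata record is a dict with an `id` and a plain ISO date under a hypothetical `update_date` key; the real metadata schema may use other field names or full timestamps.

```python
from collections import defaultdict
from datetime import date


def split_by_month(records):
    """Group article IDs into monthly buckets keyed by 'YYYY-MM'.

    Assumes each record has an 'id' and an ISO date string under the
    hypothetical key 'update_date' (e.g. '2021-03-15').
    """
    months = defaultdict(list)
    for record in records:
        day = date.fromisoformat(record["update_date"])
        months[f"{day.year:04d}-{day.month:02d}"].append(record["id"])
    # Sort by key so months come out in chronological order for sequential mining.
    return dict(sorted(months.items()))


if __name__ == "__main__":
    sample = [
        {"id": "a1", "update_date": "2021-03-15"},
        {"id": "a2", "update_date": "2021-04-02"},
        {"id": "a3", "update_date": "2021-03-30"},
    ]
    print(split_by_month(sample))
```

Keying by a zero-padded `YYYY-MM` string means plain string sorting is also chronological sorting.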
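Writing the mining results to a JavaScript file that the graph page loads might be done roughly as follows. The global name `graphData` and the `{month: {topic: weight}}` layout are assumptions for illustration, not the project's actual format.

```python
import json


def render_graph_data(topics_by_month, var_name="graphData"):
    """Serialise mined topics as a JavaScript statement the page can load.

    'var_name' is a hypothetical global name; 'topics_by_month' is assumed
    to map 'YYYY-MM' to {topic: weight}.
    """
    payload = json.dumps(topics_by_month, indent=2, sort_keys=True)
    return f"const {var_name} = {payload};\n"


def write_graph_data(topics_by_month, path, var_name="graphData"):
    """Write the rendered statement to 'path', replacing any previous content."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_graph_data(topics_by_month, var_name))


if __name__ == "__main__":
    print(render_graph_data({"2021-03": {"cs.CL": 0.4}}))
```

Rewriting the whole file after each mined month, instead of appending fragments, keeps it a valid script even if a sequential run is interrupted.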
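Counting missing, skipped, and copied files and logging the missing ones to CSV could be sketched like this. Everything about the file layout is hypothetical: entries are `(article_id, version)` pairs, files are named `<article_id><version>.pdf`, and "skipped by version" is taken to mean that only the highest version of each article is copied.

```python
import csv
import os
import shutil
import tempfile
from collections import Counter


def copy_latest_versions(entries, src_dir, dst_dir, missing_csv_path):
    """Copy one file per article and count missing, skipped and copied files.

    Hypothetical assumptions: 'entries' holds (article_id, version) pairs
    with versions like 'v2'; files are named '<article_id><version>.pdf';
    older versions of an article are skipped.
    """
    latest = {}
    for article_id, version in entries:
        number = int(version.lstrip("v"))
        latest[article_id] = max(latest.get(article_id, 0), number)

    counts = Counter()
    with open(missing_csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["article_id", "version"])
        for article_id, version in entries:
            if int(version.lstrip("v")) != latest[article_id]:
                counts["skipped"] += 1
                continue
            name = f"{article_id}{version}.pdf"
            src = os.path.join(src_dir, name)
            if not os.path.exists(src):
                # Record the missing file together with its ID and version.
                counts["missing"] += 1
                writer.writerow([article_id, version])
                continue
            shutil.copy2(src, os.path.join(dst_dir, name))
            counts["copied"] += 1

    print(f"missing: {counts['missing']}, skipped: {counts['skipped']}, "
          f"copied: {counts['copied']}")
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        open(os.path.join(src, "0001v2.pdf"), "wb").close()
        entries = [("0001", "v1"), ("0001", "v2"), ("0002", "v1")]
        copy_latest_versions(entries, src, dst, os.path.join(dst, "missing.csv"))
```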
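Because the number of months is known once the corpus is split, the estimated time for processing all months can be extrapolated from the average time per finished month. A minimal sketch, where the mining call inside the loop is a hypothetical placeholder:

```python
import time
from datetime import timedelta


def estimate_remaining(done, total, started_at, now=None):
    """Estimate seconds left by extrapolating the mean time per finished month.

    Returns None until at least one month is done; 'started_at' and 'now'
    are time.monotonic() readings.
    """
    if done <= 0:
        return None
    if now is None:
        now = time.monotonic()
    per_month = (now - started_at) / done
    return per_month * (total - done)


def format_eta(seconds):
    """Render an estimate as H:MM:SS, or 'unknown' before the first month finishes."""
    if seconds is None:
        return "unknown"
    return str(timedelta(seconds=round(seconds)))


if __name__ == "__main__":
    start = time.monotonic()
    months = ["2021-01", "2021-02", "2021-03"]
    for index, month in enumerate(months, start=1):
        # mine_month(month) would run here (hypothetical mining step).
        eta = estimate_remaining(index, len(months), start)
        print(f"{month} done, remaining about {format_eta(eta)}")
```

For example, 3 of 12 months finished after 90 seconds gives 30 seconds per month and an estimate of `0:04:30` for the remaining 9.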
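Saving article identifiers as per-month JSON files and mining the months in parallel could be combined along these lines. The file naming `ids_<month>.json` is hypothetical, and `mine_month` only loads the IDs and counts them as a stand-in for the real topic model.

```python
import json
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def save_month_ids(ids_by_month, out_dir):
    """Write one JSON list of article IDs per month (hypothetical 'ids_<month>.json')."""
    paths = {}
    for month, ids in ids_by_month.items():
        path = os.path.join(out_dir, f"ids_{month}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(ids, handle)
        paths[month] = path
    return paths


def mine_month(month, ids_dir):
    """Mine topics for one month; placeholder that only counts the loaded IDs."""
    with open(os.path.join(ids_dir, f"ids_{month}.json"), encoding="utf-8") as handle:
        ids = json.load(handle)
    return month, len(ids)


def mine_all(months, ids_dir, workers=None):
    """Run mine_month for every month in separate worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(mine_month, months, [ids_dir] * len(months)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_month_ids({"2021-03": ["a1", "a3"], "2021-04": ["a2"]}, tmp)
        print(mine_all(["2021-03", "2021-04"], tmp, workers=2))
```

Processes rather than threads are used because topic mining is CPU-bound; the `__main__` guard is required so worker processes can import the module safely.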
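A single-month mining script that takes the month as a parameter, with an adjustable number of topics and items per topic, could expose its options with `argparse`. The option names and the defaults (10 topics, 5 items) are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Command-line options for mining one month (option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Mine topics for one month of the corpus.")
    parser.add_argument("month", help="month to mine, formatted YYYY-MM")
    parser.add_argument("--num-topics", type=int, default=10,
                        help="number of topics to mine")
    parser.add_argument("--num-items", type=int, default=5,
                        help="number of items shown per topic")
    args = parser.parse_args(argv)
    if args.num_topics < 1 or args.num_items < 1:
        parser.error("--num-topics and --num-items must be positive")
    return args


if __name__ == "__main__":
    # Example arguments; a real run would call parse_args() to read sys.argv.
    args = parse_args(["2021-03", "--num-topics", "20"])
    print(args.month, args.num_topics, args.num_items)
```

A parallel driver can then launch one such process per month, each with its own month argument.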
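Counting articles per metadata topic over time, and keeping only an adjustable number of the most frequent topics for the graph, might look like this. The field names `month` and `categories` (a space-separated string of topic abbreviations) are assumptions about the metadata.

```python
from collections import Counter, defaultdict


def count_by_topic(records):
    """Count articles per metadata topic for each month.

    Assumes each record has a 'month' ('YYYY-MM') and a space-separated
    string of topic abbreviations under 'categories' (hypothetical fields).
    An article listed under several topics is counted once per topic.
    """
    counts = defaultdict(Counter)
    for record in records:
        for topic in set(record["categories"].split()):
            counts[record["month"]][topic] += 1
    return {month: dict(counter) for month, counter in sorted(counts.items())}


def top_topics(counts, n):
    """Return the n topics with the most articles over the whole period."""
    total = Counter()
    for per_month in counts.values():
        total.update(per_month)
    return [topic for topic, _ in total.most_common(n)]


if __name__ == "__main__":
    records = [
        {"month": "2021-03", "categories": "cs.CL cs.LG"},
        {"month": "2021-03", "categories": "cs.CL"},
        {"month": "2021-04", "categories": "cs.CL cs.LG"},
    ]
    counts = count_by_topic(records)
    print(counts, top_topics(counts, 1))
```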
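A stream graph stacks the per-topic series on a baseline that moves with the data instead of sitting at zero. The sketch below computes layer boundaries with the symmetric "silhouette" baseline (the stack is centred around zero at each month); it is one common choice, not necessarily the one the project uses. If the front end draws the graph with d3, `d3.stackOffsetSilhouette` gives the same centred layout and `d3.stackOffsetWiggle` the wiggle-minimising variant.

```python
def stream_layers(series):
    """Compute stream-graph layer boundaries with a silhouette baseline.

    'series' maps topic -> list of counts per month (equal lengths).
    Returns topic -> list of (lower, upper) pairs, with each month's stack
    starting at -total / 2 so it is centred around zero.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) > 1:
        raise ValueError("all topic series must cover the same months")
    length = lengths.pop() if lengths else 0

    layers = {topic: [] for topic in series}
    for t in range(length):
        total = sum(values[t] for values in series.values())
        lower = -total / 2
        for topic, values in series.items():
            upper = lower + values[t]
            layers[topic].append((lower, upper))
            lower = upper
    return layers


if __name__ == "__main__":
    print(stream_layers({"cs.CL": [3, 5, 4], "cs.LG": [1, 2, 6]}))
```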