dvmorozov / arxiv

ArxivExpress: an arxiv.org client for Android and iOS. ArxivNavigator: an interactive visualization of arxiv.org metadata. I would appreciate any form of contribution: a GitHub issue, an email, or a pull request.
https://dvmorozov.github.io/arxiv/

Graph displaying topics vs. time #111

Open dvmorozov opened 1 year ago

dvmorozov commented 1 year ago

Solution

  1. Implement a Python script that splits the corpus into a set of monthly sub-spaces, using the article metadata. :heavy_check_mark:
  2. Implement a Python script that mines topics sequentially and writes the mining results to a JavaScript file, which serves as the graph data. :heavy_check_mark:
  3. Count and display missing, skipped (by version), and copied files. Write the missing files to a CSV file together with the article ID and version.
  4. Print the estimated time to process all months.
  5. Implement a Python script that saves article identifiers as a set of JSON files for parallel mining. :bulb:
  6. Implement a Python script that mines topics for a month given as a script parameter, to enable parallel mining. :bulb:
  7. Print the estimated time to process the metadata. :x: Not possible with ijson, because a streaming parser does not know the total number of articles in advance.
  8. Implement a Python script that mines topics in parallel on the partitioned corpus. :bulb:
  9. Mine topics for the last year. :bulb:
  10. Group articles by their metadata topics (abbreviations). :bulb:
  11. Compute the number of articles per metadata topic over time and use it as the graph data. :bulb:
  12. The number of topics and the number of items per topic should be adjustable.
  13. Use a stream graph.
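Step 1 needs a month for every article. A minimal sketch, assuming the month is taken from the arXiv identifier itself rather than from a date field: new-style IDs (`0704.0001`, `1501.00001v2`) and old-style IDs (`hep-th/9901001`) both encode the submission year and month as `YYMM`. The function names `month_key` and `split_by_month`, and the record layout (dicts with an `id` key), are hypothetical.

```python
import re
from collections import defaultdict

# New-style identifiers (from April 2007): YYMM.NNNN or YYMM.NNNNN, optional version suffix.
NEW_STYLE = re.compile(r"^(\d{2})(\d{2})\.\d{4,5}(?:v\d+)?$")
# Old-style identifiers: archive[.XX]/YYMMNNN, e.g. hep-th/9901001 or math.CO/0101001.
OLD_STYLE = re.compile(r"^[a-z-]+(?:\.[A-Z]{2})?/(\d{2})(\d{2})\d{3}(?:v\d+)?$")


def month_key(arxiv_id: str) -> str:
    """Return 'YYYY-MM' for the submission month encoded in an arXiv identifier."""
    match = NEW_STYLE.match(arxiv_id) or OLD_STYLE.match(arxiv_id)
    if match is None:
        raise ValueError(f"unrecognized arXiv identifier: {arxiv_id!r}")
    yy, mm = int(match.group(1)), match.group(2)
    # arXiv started in 1991, so two-digit years 91..99 belong to the 1900s.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return f"{year}-{mm}"


def split_by_month(records):
    """Group metadata records (dicts with an 'id' key) into monthly lists of IDs."""
    buckets = defaultdict(list)
    for record in records:
        buckets[month_key(record["id"])].append(record["id"])
    return dict(buckets)
```

Because the month comes from the ID, this works on a streamed record iterator and never needs the whole metadata file in memory.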
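Step 2 writes the mining results to a JavaScript file that the graph page loads. One way to do that, sketched under the assumption that the page reads a single global variable: serialize the per-month results as JSON and wrap them in a `var` statement. The variable name `graphData` and the per-month value layout are hypothetical placeholders.

```python
import json
from pathlib import Path


def render_graph_data(topics_by_month, var_name="graphData"):
    """Serialize mining results as a JavaScript statement defining one global variable.

    topics_by_month maps 'YYYY-MM' to whatever the miner produced for that month.
    The variable name 'graphData' is a hypothetical placeholder.
    """
    # Sort months so the graph's time axis comes out in chronological order.
    ordered = {month: topics_by_month[month] for month in sorted(topics_by_month)}
    # JSON text is also a valid JavaScript expression (ES2019 and later).
    payload = json.dumps(ordered, indent=2, ensure_ascii=False)
    return f"var {var_name} = {payload};\n"


def write_graph_data(path, topics_by_month, var_name="graphData"):
    """Write the rendered statement to a .js file that the graph page can include."""
    Path(path).write_text(render_graph_data(topics_by_month, var_name), encoding="utf-8")
```

Keeping rendering separate from writing makes it easy to rewrite the whole file after each month is mined in the sequential run.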
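Step 3 asks for counts of missing, skipped, and copied files, plus a CSV of the missing ones. A minimal sketch, assuming a hypothetical file naming scheme `<id>v<version>.txt` (with `/` in old-style IDs replaced by `_`) and assuming that "skipped by version" means every version of an article other than the highest one listed is skipped. The function name `copy_month` is illustrative only.

```python
import csv
import shutil
from collections import Counter
from pathlib import Path


def copy_month(entries, source_dir, target_dir, missing_csv):
    """Copy one month's article files and report missing/skipped/copied counts.

    entries: iterable of (article_id, version) pairs, e.g. ("0704.0001", 2).
    Hypothetical naming: '<id>v<version>.txt', with '/' in old-style IDs replaced by '_'.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    counts = Counter(copied=0, skipped=0, missing=0)

    # Keep only the highest version of each article; every other version is skipped.
    latest = {}
    for article_id, version in entries:
        if article_id in latest:
            counts["skipped"] += 1
            latest[article_id] = max(latest[article_id], version)
        else:
            latest[article_id] = version

    missing = []
    for article_id, version in latest.items():
        name = f"{article_id.replace('/', '_')}v{version}.txt"
        source = source_dir / name
        if not source.exists():
            missing.append((article_id, version))
            continue
        shutil.copy2(source, target_dir / name)
        counts["copied"] += 1
    counts["missing"] = len(missing)

    # Record missing files together with article ID and version.
    with open(missing_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["article_id", "version"])
        writer.writerows(missing)

    print(f"copied: {counts['copied']}, skipped: {counts['skipped']}, missing: {counts['missing']}")
    return counts
```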
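Step 4 is feasible, unlike step 7, because the list of months is known before processing starts. A simple linear estimate, sketched here with hypothetical helper names, assumes each remaining month takes the average time of the months finished so far: remaining = elapsed / done × (total − done).

```python
import time


def estimate_remaining(elapsed_seconds, done, total):
    """Linear estimate: each remaining month takes the average time per month so far."""
    if done <= 0:
        return None
    return elapsed_seconds / done * (total - done)


def format_duration(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def process_months(months, mine_month):
    """Call mine_month(month) for each month and print progress with an ETA."""
    start = time.monotonic()
    for done, month in enumerate(months, start=1):
        mine_month(month)
        remaining = estimate_remaining(time.monotonic() - start, done, len(months))
        print(f"{month}: {done}/{len(months)} done, about {format_duration(remaining)} left")
```

For example, after 4 of 10 months in 120 seconds, the estimate is 120 / 4 × 6 = 180 seconds. Months differ in size, so a per-article average would be more accurate if article counts per month are available.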
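Steps 5 and 6 split the work so that each month can be mined independently. A minimal sketch, assuming one JSON file per month named `<YYYY-MM>.json` and a per-month entry point that takes the month as a positional argument; the file layout, the `--ids-dir` option, and all function names are hypothetical, and the topic mining itself is left out.

```python
import argparse
import json
import re
from pathlib import Path


def save_id_sets(ids_by_month, out_dir):
    """Write one JSON file of article identifiers per month, e.g. out_dir/2007-04.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for month, ids in ids_by_month.items():
        (out_dir / f"{month}.json").write_text(json.dumps(sorted(ids)), encoding="utf-8")


def load_id_set(ids_dir, month):
    """Read the identifiers saved for one month."""
    return json.loads((Path(ids_dir) / f"{month}.json").read_text(encoding="utf-8"))


def month_arg(value):
    """argparse type that accepts only YYYY-MM strings."""
    if not re.fullmatch(r"\d{4}-\d{2}", value):
        raise argparse.ArgumentTypeError(f"expected YYYY-MM, got {value!r}")
    return value


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Mine topics for a single month.")
    parser.add_argument("month", type=month_arg, help="month to mine, formatted YYYY-MM")
    parser.add_argument("--ids-dir", default="ids", help="directory with per-month JSON files")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    ids = load_id_set(args.ids_dir, args.month)
    # Topic mining for args.month would run here; this sketch only loads its input.
    print(f"{args.month}: {len(ids)} articles to mine")
    return len(ids)
```

With a per-month entry point like `main(["2007-04", "--ids-dir", "ids"])`, parallel mining (step 8) can be as simple as starting one process per month, since the months share no state.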
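Steps 10 to 12 turn the metadata topics into time series with an adjustable number of topics. A sketch, assuming each record carries a space-separated `categories` string (for example `"hep-th math.MP"`) and that a caller-supplied `month_of` function maps an article ID to `YYYY-MM`. The optional `archive_only` flag, a hypothetical addition, collapses abbreviations such as `cs.LG` and `cs.AI` into their archive `cs`.

```python
from collections import Counter, defaultdict


def topic_counts(records, month_of, top_n=5, archive_only=False):
    """Count articles per metadata topic per month, keeping the top_n topics overall.

    records: dicts with 'id' and a space-separated 'categories' string (assumed layout).
    month_of: function mapping an article ID to 'YYYY-MM'.
    Returns (months, {topic: [article count for each month]}).
    """
    per_month = defaultdict(Counter)
    totals = Counter()
    for record in records:
        month = month_of(record["id"])
        # A set, so an article listed under cs.LG and cs.AI counts once for 'cs'.
        topics = {
            t.split(".")[0] if archive_only else t
            for t in record["categories"].split()
        }
        for topic in topics:
            per_month[month][topic] += 1
            totals[topic] += 1
    months = sorted(per_month)
    top = [topic for topic, _ in totals.most_common(top_n)]
    # Counter returns 0 for topics absent in a month, so every series has full length.
    series = {topic: [per_month[m][topic] for m in months] for topic in top}
    return months, series
```

Because an article can carry several categories, the per-month totals may exceed the number of articles; that fits a "topics vs. time" view but should be kept in mind when reading the graph.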
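Step 13 calls for a stream graph: stacked layers whose baseline moves so the stream stays centred instead of sitting on a flat axis. A minimal sketch of the layout arithmetic, using the "silhouette" offset (baseline at each time step = −total / 2, so the shape is symmetric around zero). D3's `d3.stackOffsetSilhouette` applies the same offset; whether the graph page uses D3 is an assumption, and this Python version only illustrates the computation.

```python
def silhouette_layers(series):
    """Stack series into stream-graph layers with a silhouette (centred) baseline.

    series: {topic: [value per month]} with equal-length lists.
    Returns {topic: [(y0, y1) per month]}, layers stacked in dict order.
    """
    topics = list(series)
    n = len(series[topics[0]]) if topics else 0
    layers = {t: [] for t in topics}
    for i in range(n):
        total = sum(series[t][i] for t in topics)
        # Start half the column height below zero so the stream is centred.
        y = -total / 2
        for t in topics:
            layers[t].append((y, y + series[t][i]))
            y += series[t][i]
    return layers
```

For two topics with counts `[2, 4]` and `[2, 0]`, both months have total 4, so each column spans −2 to 2. Other offsets (for example the "wiggle" offset from the original streamgraph work) reduce layer slope further, at the cost of a slightly more involved computation.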

Related

  1. #109
  2. #83
  3. #74