dvmorozov/arxiv
ArxivExpress - arxiv.org client for Android and iOS; ArxivNavigator - interactive arxiv.org metadata visualization. I would appreciate any form of contribution: a GitHub issue, an email or a pull request.
https://dvmorozov.github.io/arxiv/
Graph displaying topics vs. time #111
Open. Opened by dvmorozov 1 year ago.
Solution
Implement a Python script to split the corpus into a set of monthly subsets, using the metadata to do so. :heavy_check_mark:
Implement a Python script to mine topics sequentially and write the mining results to a JavaScript file (which is used as the graph data). :heavy_check_mark:
Count and display missing, skipped (by version) and copied files. Write the missing files to a CSV file together with their article ID and version.
Print the estimated time for processing the whole set of months.
Implement a Python script that saves article identifiers as a set of JSON files for parallel mining. :bulb:
Implement a Python script that mines topics for a month given as a script parameter, for parallel mining. :bulb:
Print the estimated time for processing the metadata. :x: Impossible to implement with ijson, because it does not provide the number of articles in advance.
Implement a Python script for mining topics in parallel on the partitioned corpus. :bulb:
Mine topics for the last year. :bulb:
Group articles by metadata topics (abbreviations). :bulb:
Compute the number of articles per topic over time from the metadata. Use this as the graph data. :bulb:
The number of topics and of items per topic should be adjustable.
Use a stream graph.
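The monthly split from the first item above could be sketched as follows. It assumes, hypothetically, that the metadata is a JSON-lines file where each record has an `id` and an `update_date` field in `YYYY-MM-DD` form (as in the public Kaggle arXiv metadata snapshot); the issue does not say which metadata format or date field the repository actually uses.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_month(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group article IDs by month ("YYYY-MM") using each record's date field.

    Assumes one JSON object per line with hypothetical fields
    "id" and "update_date" ("YYYY-MM-DD").
    """
    months: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        month = record["update_date"][:7]  # "2008-11-13" -> "2008-11"
        months[month].append(record["id"])
    return dict(months)
```

Passing an open file object (`with open(path, encoding="utf-8") as f: split_by_month(f)`) keeps memory use proportional to the number of IDs rather than to the size of the metadata file.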
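Writing the mining results into a JavaScript file that a page can load as graph data might look like the minimal sketch below. The variable name `graphData` and the result shape (month mapped to topic weights) are hypothetical; the issue only states that a JavaScript file serves as the graph data.

```python
import json
from typing import Dict


def to_js_module(results: Dict[str, Dict[str, float]], var_name: str = "graphData") -> str:
    """Serialize month -> {topic: weight} results as a JavaScript assignment."""
    # json.dumps output is a valid JavaScript literal for dicts, lists, strings and numbers.
    payload = json.dumps(results, indent=2, sort_keys=True)
    return f"var {var_name} = {payload};\n"


def write_js_file(path: str, results: Dict[str, Dict[str, float]], var_name: str = "graphData") -> None:
    """Write the serialized results to a .js file (UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_js_module(results, var_name))
```

Mining month by month and calling `write_js_file` once at the end keeps the file consistent, since a page never sees a half-written result set.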
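One way to count missing, skipped and copied files and to log the missing ones to CSV is sketched below. The rule that only the highest version of an article is copied, and the `exists`/`copy` callbacks, are hypothetical choices made for illustration; the issue does not describe how versions are selected or where files live.

```python
import csv
from typing import Callable, Iterable, List, TextIO, Tuple

Entry = Tuple[str, int]  # (article id, version)


def classify(
    entries: Iterable[Entry],
    exists: Callable[[str, int], bool],
    copy: Callable[[str, int], None],
) -> Tuple[List[Entry], List[Entry], List[Entry]]:
    """Copy the latest version of each article; return (copied, skipped, missing).

    Hypothetical rule: lower versions of an article are skipped, and the
    latest version is copied only if its file exists.
    """
    entries = list(entries)
    latest = {}
    for article_id, version in entries:
        latest[article_id] = max(version, latest.get(article_id, version))
    copied: List[Entry] = []
    skipped: List[Entry] = []
    missing: List[Entry] = []
    for article_id, version in entries:
        if version < latest[article_id]:
            skipped.append((article_id, version))
        elif exists(article_id, version):
            copy(article_id, version)
            copied.append((article_id, version))
        else:
            missing.append((article_id, version))
    return copied, skipped, missing


def summary(copied: List[Entry], skipped: List[Entry], missing: List[Entry]) -> str:
    """One-line report of the three counts."""
    return f"copied: {len(copied)}, skipped: {len(skipped)}, missing: {len(missing)}"


def write_missing_csv(stream: TextIO, missing: Iterable[Entry]) -> None:
    """Write missing files as CSV rows of article id and version."""
    writer = csv.writer(stream)
    writer.writerow(["id", "version"])
    writer.writerows(missing)
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, so that line endings are not doubled on Windows.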
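The estimated time for processing all months can be extrapolated from the average time per month processed so far. A minimal sketch, assuming months take roughly similar time (which may not hold if months differ a lot in size); `estimate_remaining`, `format_seconds` and `process_months` are hypothetical helpers.

```python
import time
from typing import Callable, Optional, Sequence


def estimate_remaining(elapsed_s: float, done: int, total: int) -> Optional[float]:
    """Return estimated seconds left, or None before any month is finished."""
    if done <= 0:
        return None
    per_month = elapsed_s / done
    return per_month * (total - done)


def format_seconds(seconds: float) -> str:
    """Format seconds as H:MM:SS."""
    seconds = int(round(seconds))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def process_months(months: Sequence[str], process_month: Callable[[str], object]) -> None:
    """Process months one by one, printing the estimated time left after each."""
    start = time.monotonic()
    for i, month in enumerate(months, start=1):
        process_month(month)
        eta = estimate_remaining(time.monotonic() - start, i, len(months))
        print(f"{month}: {i}/{len(months)} done, about {format_seconds(eta)} left")
```

Weighting by article count per month instead of by month count would give a better estimate when month sizes vary, at the cost of knowing those counts up front.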
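For parallel mining, the article identifiers could be saved as one JSON file per month, so that each worker reads only its own file. A sketch assuming the IDs are already grouped by month (e.g. `{"2020-01": [...]}`); the `<month>.json` naming is hypothetical.

```python
import json
import os
from typing import Dict, List


def save_id_files(months: Dict[str, List[str]], out_dir: str) -> List[str]:
    """Write each month's article IDs to <out_dir>/<month>.json; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for month, ids in sorted(months.items()):
        path = os.path.join(out_dir, f"{month}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(ids, f)
        paths.append(path)
    return paths


def load_id_file(path: str) -> List[str]:
    """Read the list of article IDs back from one monthly file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```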
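A script that mines a single month given on the command line lets several copies run at once, one per month. The sketch below covers only argument handling with the standard `argparse` module; the `--id-dir` option and the `mine_month` placeholder are hypothetical, because the issue does not describe the mining step itself.

```python
import argparse
import re
from typing import List, Optional

MONTH_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def month_arg(value: str) -> str:
    """argparse type: accept months written as YYYY-MM."""
    if not MONTH_RE.fullmatch(value):
        raise argparse.ArgumentTypeError(f"expected YYYY-MM, got {value!r}")
    return value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Mine topics for one month.")
    parser.add_argument("month", type=month_arg, help="month to mine, e.g. 2020-01")
    parser.add_argument("--id-dir", default="ids", help="directory with <month>.json ID files")
    return parser.parse_args(argv)


def mine_month(month: str, id_dir: str) -> None:
    # Placeholder (hypothetical): load <id_dir>/<month>.json and run topic mining here.
    print(f"mining {month} from {id_dir}")


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    mine_month(args.month, args.id_dir)
```

Saved as a script with `if __name__ == "__main__": main()` at the end, it could be started as `python mine_month.py 2020-01` in as many shells or jobs as there are months to process.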
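Once each month can be mined independently, the months can be dispatched to a pool of workers. A minimal sketch with the standard `concurrent.futures` module; `mine_month` stands for a hypothetical per-month mining function returning that month's topics. The executor is passed in, so a `ProcessPoolExecutor` can be used for real CPU-bound runs and a `ThreadPoolExecutor` for quick checks.

```python
from concurrent.futures import Executor, as_completed
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def mine_in_parallel(
    months: Iterable[str],
    mine_month: Callable[[str], T],
    executor: Executor,
) -> Dict[str, T]:
    """Submit one task per month and collect results keyed by month."""
    futures = {executor.submit(mine_month, month): month for month in months}
    results: Dict[str, T] = {}
    for future in as_completed(futures):
        month = futures[future]
        results[month] = future.result()  # re-raises any exception from the worker
        print(f"finished {month} ({len(results)}/{len(futures)})")
    return results
```

With `ProcessPoolExecutor`, `mine_month` has to be a module-level function so it can be pickled, and the pool should be created under an `if __name__ == "__main__":` guard.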
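Grouping articles by their metadata topics and counting them over time reduces to a tally keyed by (topic, month). The sketch assumes, hypothetically, records with a space-separated `categories` string of abbreviations (e.g. `"hep-th math.AG"`, as in the Kaggle arXiv snapshot) and an `update_date` in `YYYY-MM-DD` form; an article listed under several topics is counted once in each.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def count_by_topic(records: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """Return {topic: {month: article count}} from metadata records."""
    counts: Counter = Counter()
    for record in records:
        month = record["update_date"][:7]
        # set() so a topic repeated within one record is counted only once
        for topic in set(record["categories"].split()):
            counts[(topic, month)] += 1
    series: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (topic, month), n in sorted(counts.items()):
        series[topic][month] = n
    return dict(series)
```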
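Making the number of topics and of items per topic adjustable can be as simple as two parameters applied when mined results are exported. A sketch, assuming hypothetically that each mined topic is a list of `(word, weight)` pairs and that topics are ranked by total weight.

```python
from typing import List, Sequence, Tuple

Topic = Sequence[Tuple[str, float]]  # hypothetical: (word, weight) pairs


def trim_topics(topics: Sequence[Topic], n_topics: int, n_items: int) -> List[List[Tuple[str, float]]]:
    """Keep the n_topics heaviest topics and the n_items heaviest items in each."""
    if n_topics < 0 or n_items < 0:
        raise ValueError("n_topics and n_items must be non-negative")
    ranked = sorted(topics, key=lambda t: sum(w for _, w in t), reverse=True)
    return [
        sorted(topic, key=lambda item: item[1], reverse=True)[:n_items]
        for topic in ranked[:n_topics]
    ]
```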
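A stream graph stacks the per-topic series and shifts the baseline so that the stack is centred on zero; this is the "silhouette" offset that d3 provides as `stackOffsetSilhouette`. If the layout were computed in Python before export, a sketch could look like this; the input shape (one list of values per topic, aligned by month) is an assumption.

```python
from typing import List, Sequence, Tuple


def silhouette_layers(series: Sequence[Sequence[float]]) -> List[List[Tuple[float, float]]]:
    """Stack layers with a baseline of -total/2 at each time step.

    series[i][t] is the value of layer i at time t. Returns, for each layer,
    a list of (y0, y1) bands; the whole stack is symmetric around y = 0.
    """
    if not series:
        return []
    steps = len(series[0])
    if any(len(layer) != steps for layer in series):
        raise ValueError("all layers must have the same number of time steps")
    layers: List[List[Tuple[float, float]]] = [[] for _ in series]
    for t in range(steps):
        total = sum(layer[t] for layer in series)
        y0 = -total / 2.0  # silhouette baseline
        for i, layer in enumerate(series):
            y1 = y0 + layer[t]
            layers[i].append((y0, y1))
            y0 = y1
    return layers
```

Leaving the offset to the charting library on the JavaScript side is equally valid; computing it in Python only helps if the exported file is meant to hold ready-to-draw bands.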
Related: 109, 83, 74.