alexsb closed this 8 years ago
Current dataset to import: http://datahub.io/dataset/dblp
Problem: the importer crashed after it had already loaded ~700,000 articles into a 5 GB Neo4j database, which took more than 10 h with the current Cypher statement.
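One common way to speed up this kind of import is to send articles in batches through a single `UNWIND $rows` statement instead of one statement per article, with an index or uniqueness constraint on every property used in `MERGE` (without one, each `MERGE` has to scan all nodes with that label, so the import slows down as the database grows). We don't know what the current Cypher statement looks like, so this is only a sketch of that swapped-in technique. The labels `Paper`/`Person`, the properties `key`/`title`/`year`/`authors`, the relationship type `AUTHORED`, and the helpers `batches`/`import_articles` are all hypothetical. The database call is injected as `run(query, params)` so the batching logic can be tried without a running Neo4j.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

# Hypothetical schema: Paper nodes keyed by a DBLP key, Person nodes keyed by name.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (p:Paper {key: row.key})
SET p.title = row.title, p.year = row.year
WITH p, row
UNWIND row.authors AS name
MERGE (a:Person {name: name})
MERGE (a)-[:AUTHORED]->(p)
"""


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover rows that did not fill a whole batch
        yield batch


def import_articles(
    rows: Iterable[Row],
    run: Callable[[str, Dict[str, Any]], Any],
    size: int = 1000,
) -> int:
    """Send rows in batches via run(query, params); return the number of batches sent."""
    sent = 0
    for batch in batches(rows, size):
        run(IMPORT_QUERY, {"rows": batch})
        sent += 1
    return sent
```

With the official Neo4j Python driver, `run` could be `lambda q, p: session.run(q, p).consume()` inside a `with driver.session() as session:` block. Each call is then its own auto-commit transaction, so a crash loses at most the batch in flight, and the import can resume from the last completed batch. The batch size of 1000 is an arbitrary starting point to tune.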
OK, either we take a subset or we use the CiteVis dataset.
Any progress on that?
Or to ask this differently: is the database file somewhere?
See Google Drive.
Done
As a demo dataset we want to visualize co-authorships between InfoVis researchers.
Nodes: People, Papers
Edges: Papers? Institutions?
Sets: Institutions (could also be nodes), Research Fields (InfoVis, Visualization, HCI), Topic areas (graphs, evaluation, etc.), keywords?
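If papers become edges between people, co-authorship edges can be derived from each paper's author list, with the edge weight being the number of shared papers. A minimal sketch, assuming each paper is available as a plain list of author names (a hypothetical input format; the real fields depend on which dataset we end up using). The function name `coauthor_edges` is illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def coauthor_edges(papers: Iterable[List[str]]) -> Dict[Tuple[str, str], int]:
    """Map each unordered author pair to the number of papers they co-authored."""
    counts: Counter = Counter()
    for authors in papers:
        # Deduplicate and sort so (a, b) and (b, a) end up as the same key.
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return dict(counts)


# Usage with made-up author names:
edges = coauthor_edges([["A", "B"], ["B", "A", "C"]])
# {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 1}
```

Single-author papers contribute no edges, and the same approach would work for the set memberships (e.g. counting shared institutions) if those end up as nodes instead.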
We could acquire the data by scraping Google Scholar. Here are two scrapers: https://github.com/ckreibich/scholar.py http://www.icir.org/christian/scholar.html
We could use the InfoVis label as a top-level filter: https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:information_visualization
IEEE Xplore is another option: http://libguides.mit.edu/apis
Microsoft Academic Search did something like this in the past, though it is vastly outdated.
Another project by Marian Dörk that uses this type of data: http://mariandoerk.de/pivotpaths/ This was done at MS Research, so it probably uses MS data (outdated).
CiteVis by Stasko et al. at Georgia Tech: http://www.cc.gatech.edu/gvu/ii/citevis/ This includes the data file; if we're lazy, we could just use that: http://www.cc.gatech.edu/gvu/ii/citevis/infovis-citation-data.txt
Other visualizations that (I guess) use this data: http://lliquid.github.io/citematrix/ http://www.cc.gatech.edu/gvu/ii/citevis/VIS25/