JS16_ProjectA

In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.

Developer information

Documentation

We use apidoc to generate the documentation for the RESTful API service. To get started, follow these instructions:

Open a terminal and cd into the checked out git repository folder
Install the tool globally: sudo npm install apidoc -g
Generate the documentation: apidoc -i app/ -o apidoc/
Open the HTML file inside the apidoc folder, or go to http://127.0.0.1:8080/doc/ if you have already set up the project
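For orientation, apidoc builds its output from specially tagged comment blocks placed above the route handlers. The following is a minimal sketch of such a block. The route /api/example, the names GetExample and Example, and the handler itself are hypothetical and not taken from this project; only the @api tags belong to apidoc.

```javascript
/**
 * @api {get} /api/example Get an example entry
 * @apiName GetExample
 * @apiGroup Example
 * @apiSuccess {String} message Status text returned to the client.
 */
function getExample(req, res) {
  // Hypothetical handler: replies with a fixed JSON payload.
  res.json({ message: 'Success' });
}
```

Running the generate command from the list above over a folder containing such comments should produce one documented endpoint per annotated block.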

Setup NodeJS & MongoDB

Install Node.js and MongoDB on your local machine (https://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ and https://nodejs.org/en/download/package-manager/#debian-and-ubuntu-based-linux-distributions)
Clone this project to a folder on your hard drive, open a console and change into the folder you just checked out
Run sudo npm install to install any sub-modules required
Copy the config file in cfg to config.json and edit it
- You can leave username and password empty for a default configuration
- Use 127.0.0.1 and port 27017 for a default configuration
- To stream real-time Twitter data, register your Twitter account at http://apps.twitter.com and insert your API keys into config.json. Never upload your API keys to GitHub. By default, config.json is listed in .gitignore.
Start a local MongoDB server with mongod
- You can specify the port and folder you want to use: mongod --dbpath /your/db/path/here --port 27017
Run nodejs app.js to start the server
The console should then show: Mongoose connected - Node server is listening on port 8080
If needed, you can start the MongoDB shell via mongo. Type show dbs to list all databases and use db_name_here to switch to the preferred database. With show collections you can list all collections (the MongoDB equivalent of tables). With db.collection_name.find() you can output the content of a collection.
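To illustrate how the default values from the steps above fit together, here is a minimal sketch that turns a config object into a MongoDB connection string. The key names (username, password, host, port, database) are assumptions made for this example; the actual keys are defined by the config file in cfg.

```javascript
// Hypothetical config shape; the real keys are defined by the file in cfg.
const exampleConfig = {
  username: '',
  password: '',
  host: '127.0.0.1',
  port: 27017,
  database: 'db_name_here',
};

// Builds a MongoDB connection string, omitting credentials when the username is empty.
function buildMongoUri(cfg) {
  const credentials = cfg.username
    ? `${encodeURIComponent(cfg.username)}:${encodeURIComponent(cfg.password)}@`
    : '';
  return `mongodb://${credentials}${cfg.host}:${cfg.port}/${cfg.database}`;
}

console.log(buildMongoUri(exampleConfig));
// prints "mongodb://127.0.0.1:27017/db_name_here"
```

With an empty username and password the credentials part is simply left out, which matches a default local MongoDB installation without authentication.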

Scraping and filling the database

In the following commands, x is a placeholder for the intended collection (e.g. characters).

To delete the collection and fill it again (new _ids are set!) with newly scraped data use: npm run refill --collection=x
To update the collection with newly scraped data (manual edits are overwritten!) use: npm run update --collection=x
To only add new properties/entries to the collection from a new scrape use: npm run safeUpdate --collection=x
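The difference between the three commands can be sketched on plain in-memory arrays. This is an illustration of the described behaviour only, not the project's implementation: matching entries by a name key, keeping stored entries that are missing from the scrape, and all sample values are assumptions.

```javascript
// Illustration only: entries are matched by a hypothetical `name` key.

// refill: discard the stored entries and insert the scraped ones
// (in the real database this also assigns new _ids).
function refill(existing, scraped) {
  return scraped.map((entry) => ({ ...entry }));
}

// update: scraped values overwrite stored values, including manual edits.
function update(existing, scraped) {
  const byName = new Map(existing.map((e) => [e.name, { ...e }]));
  for (const entry of scraped) {
    byName.set(entry.name, { ...(byName.get(entry.name) || {}), ...entry });
  }
  return [...byName.values()];
}

// safeUpdate: only add missing entries and missing properties; stored values win.
function safeUpdate(existing, scraped) {
  const byName = new Map(existing.map((e) => [e.name, { ...e }]));
  for (const entry of scraped) {
    byName.set(entry.name, { ...entry, ...(byName.get(entry.name) || {}) });
  }
  return [...byName.values()];
}

const stored = [{ name: 'Example', house: 'edited by hand' }];
const scraped = [{ name: 'Example', house: 'scraped value', culture: 'new property' }];

console.log(update(stored, scraped));     // house is overwritten by the scrape
console.log(safeUpdate(stored, scraped)); // house keeps the manual edit, culture is added
```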

Available Collections:

'ages',
'characters',
'episodes',
'cities', (uses 'data/cities.json')
'continents', (uses 'data/continents.json')
'cultures',
'events',
'houses',
'regions',
'characterLocations', (requires cities collection to be filled)
'characterPaths', (requires characters collection to be filled)
'characterImages', (requires characters collection to be filled)
'characterPlods' (requires characters collection to be filled)
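Because some collections require others to be filled first, the order of the refill commands matters. The sketch below derives a safe order from the requirements listed above. The helper fillOrder and the requires map are hypothetical and exist only in this example.

```javascript
// Requirements as listed above: each collection maps to the collections
// that must be filled before it.
const requires = {
  characterLocations: ['cities'],
  characterPaths: ['characters'],
  characterImages: ['characters'],
  characterPlods: ['characters'],
};

// Returns the requested collections plus their prerequisites, prerequisites first.
function fillOrder(wanted) {
  const order = [];
  const visit = (name) => {
    if (order.includes(name)) return; // already scheduled
    (requires[name] || []).forEach(visit);
    order.push(name);
  };
  wanted.forEach(visit);
  return order;
}

// Print one refill command per collection, in a safe order.
for (const name of fillOrder(['characterPaths', 'characterLocations'])) {
  console.log(`npm run refill --collection=${name}`);
}
```

For the two requested collections this prints the commands for characters, characterPaths, cities and characterLocations, in that order.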

Updating the pageRank of characters

x is a placeholder for the file containing the pageRanks (e.g. data/pageRanks.json).

Requirements: Characters and characterImages collections are up-to-date.
Run: npm run updatePageRanks --update=characters --file=x
