JS16_ProjectA
In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.
Links
Developer information
Documentation
We are using apidoc to generate documentation for the RESTful API service. To get started follow these instructions:
- Open a terminal and cd into the checked out git repository folder
- Install the tool globally:
sudo npm install apidoc -g
- Generate the documentation:
apidoc -i app/ -o apidoc/
- Open the HTML file inside the apidoc folder, or go to http://127.0.0.1:8080/doc/ if you have already set up the project
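For orientation, apidoc builds the documentation from specially formatted comment blocks placed above the route handlers. The following is a minimal sketch of such a block. The route /api/characters/:name, the handler name and the response fields are hypothetical and only illustrate the annotation style; they are not taken from this project's code.

```javascript
/**
 * @api {get} /api/characters/:name Get a character by name
 * @apiName GetCharacterByName
 * @apiGroup Characters
 *
 * @apiParam {String} name Name of the character (hypothetical parameter).
 *
 * @apiSuccess {String} message Status message.
 * @apiSuccess {Object} data The matching character document.
 */
function getCharacterByName(req, res) {
  // Placeholder body: a real handler would query the database here.
  res.status(200).json({ message: 'Success', data: { name: req.params.name } });
}
```

Running apidoc -i app/ -o apidoc/ picks up comment blocks of this shape from the files under app/.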
Setup NodeJS & MongoDB
- Install nodejs and mongodb on your local machine (https://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ and https://nodejs.org/en/download/package-manager/#debian-and-ubuntu-based-linux-distributions)
- Clone this project to a folder on your hard drive, open a console and change into the folder you just checked out
- Run sudo npm install to install any sub-modules required
- Copy the config file in cfg to config.json and edit it
- You can leave username and password empty on default configurations
- Use 127.0.0.1 and port 27017 for default configurations
- To stream real-time Twitter data, register your Twitter account at http://apps.twitter.com and insert your API keys into config.json. Never upload your API keys to GitHub. By default, config.json is listed in .gitignore.
- Start local MongoDB server with
mongod
- You can specify the port and folder you want to use:
mongod --dbpath /your/db/path/here --port 27017
- Run nodejs app.js to start the server
- Node should print the following to the console:
Mongoose connected - Node server is listening on port 8080
- If needed, you can start the MongoDB shell via mongo. Then type show dbs to see all databases. Type use db_name_here to switch to the preferred database. With show collections you can see all tables (in NoSQL, tables are called collections). With db.collection_name.find() you can output the collection content.
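To make the default connection settings above concrete, here is a small sketch of how host, port and optional credentials combine into a MongoDB connection string. The function name buildMongoUri and the key names host, port, name, username and password are assumptions for illustration; the actual keys in config.json may differ.

```javascript
// Hypothetical config shape; the real keys in config.json may differ.
function buildMongoUri(db) {
  // Credentials are only included when a username is set,
  // matching the note that they can stay empty on default setups.
  const credentials = db.username
    ? `${encodeURIComponent(db.username)}:${encodeURIComponent(db.password || '')}@`
    : '';
  return `mongodb://${credentials}${db.host}:${db.port}/${db.name}`;
}

const localDefaults = {
  host: '127.0.0.1',
  port: 27017,
  name: 'db_name_here',
  username: '',
  password: '',
};
console.log(buildMongoUri(localDefaults));
// prints "mongodb://127.0.0.1:27017/db_name_here"
```

The credentials are percent-encoded because characters such as @ or : in a password would otherwise break the URI.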
Scraping and filling the database
In the following, x is a placeholder that has to be replaced by the intended collection (e.g. characters).
- To delete the collection and fill it again (new _ids are set!) with newly scraped data use:
npm run refill --collection=x
- To update the collection with newly scraped data (manual edits are overwritten!) use:
npm run update --collection=x
- To only add new properties/entries to the collection from a new scrape, use:
npm run safeUpdate --collection=x
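The difference between the three modes can be sketched on plain in-memory arrays. This is an illustration of the semantics described above, not the project's actual implementation: the function names mirror the npm scripts, entries are assumed to be matched by a name property, and whether update keeps properties that are absent from the scrape is an assumption.

```javascript
// In-memory stand-ins for a collection; entries are matched by `name` (assumption).

function refill(_existing, scraped) {
  // Drop everything and insert the scraped entries afresh.
  return scraped.map((entry) => ({ ...entry }));
}

function update(existing, scraped) {
  // Scraped values win, so manual edits to scraped properties are overwritten.
  const byName = new Map(existing.map((e) => [e.name, { ...e }]));
  for (const entry of scraped) {
    byName.set(entry.name, { ...(byName.get(entry.name) || {}), ...entry });
  }
  return [...byName.values()];
}

function safeUpdate(existing, scraped) {
  // Existing values win; only missing properties and new entries are added.
  const byName = new Map(existing.map((e) => [e.name, { ...e }]));
  for (const entry of scraped) {
    byName.set(entry.name, { ...entry, ...(byName.get(entry.name) || {}) });
  }
  return [...byName.values()];
}

const existing = [{ name: 'a', title: 'edited by hand' }];
const scraped = [{ name: 'a', title: 'scraped', culture: 'c' }, { name: 'b' }];

console.log(update(existing, scraped)[0].title);     // prints "scraped"
console.log(safeUpdate(existing, scraped)[0].title); // prints "edited by hand"
```

In both update variants the new entry b is added and the new property culture appears on a; only the handling of the hand-edited title differs.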
Available Collections:
- 'ages',
- 'characters',
- 'episodes',
- 'cities', (uses 'data/cities.json')
- 'continents', (uses 'data/continents.json')
- 'cultures',
- 'events',
- 'houses',
- 'regions',
- 'characterLocations', (requires cities collection to be filled)
- 'characterPaths', (requires characters collection to be filled)
- 'characterImages' (requires characters collection to be filled)
- 'characterPlods' (requires characters collection to be filled)
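Because some collections require others to be filled first, the order of refill calls matters. The sketch below derives a valid order from the requirements listed above. The requires table restates that list; the helper name fillOrder is hypothetical and not part of the project.

```javascript
// Requirements as stated in the list of available collections.
const requires = {
  characterLocations: ['cities'],
  characterPaths: ['characters'],
  characterImages: ['characters'],
  characterPlods: ['characters'],
};

// Returns the wanted collections plus their prerequisites,
// ordered so that every prerequisite comes before its dependents.
// Assumes the requirements contain no cycles, which holds for the table above.
function fillOrder(wanted) {
  const order = [];
  const visit = (name) => {
    if (order.includes(name)) return;
    (requires[name] || []).forEach(visit);
    order.push(name);
  };
  wanted.forEach(visit);
  return order;
}

console.log(fillOrder(['characterLocations', 'characterPaths']).join(' -> '));
// prints "cities -> characterLocations -> characters -> characterPaths"
```

Each name in the result can then be passed to npm run refill --collection=x in turn.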
Updating the pageRank of characters
Here, x is a placeholder for the file containing the pageRanks (e.g. data/pageRanks.json).
- Requirements: Characters and characterImages collections are up-to-date.
- Run:
npm run updatePageRanks --update=characters --file=x
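Flags such as --update=characters and --file=x are passed to npm rather than to the script directly. A script can typically read them from environment variables named npm_config_<flag>. The helper below is a sketch under that assumption. The name readNpmFlag is hypothetical, the behavior may depend on the npm version, and how this project's scripts actually read the flags is not shown here.

```javascript
// Reads a flag passed as `npm run <script> --<flag>=<value>`.
// Assumption: npm exposes it as the environment variable npm_config_<flag>.
function readNpmFlag(flag, env = process.env) {
  const value = env[`npm_config_${flag}`];
  if (value === undefined || value === '') {
    // Fail early with a readable message instead of running with no target.
    throw new Error(`Missing required flag --${flag}=<value>`);
  }
  return value;
}

// Usage inside a script started via npm run:
//   const collection = readNpmFlag('update'); // e.g. "characters"
//   const file = readNpmFlag('file');         // e.g. "data/pageRanks.json"
```

Failing on a missing flag avoids accidentally running an update without a target collection or file.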