Data4Democracy / internal-displacement

Studying news events and internal displacement.

Internal Displacement

This repository is now archived. The project is being continued but is currently closed to new members. Data for Democracy is a community-driven organization. If you want to start a new project in a similar area, you are welcome to do so! Check out the #refugees channel and rally your fellow data nerds!

Slack Channel: #internal-displacement

Project Description: Classifying, tagging, analyzing and visualizing news articles about internal displacement. Based on a challenge from the IDMC.

The tool we are building carries out a number of functions:

  1. Ingest a list of URLs
  2. Scrape content from the respective web pages
  3. Tag the article as relating to disaster or conflict
  4. Extract key information from text
  5. Store information in a database
  6. Display data in interactive visualisations

The final aim is a simple app that performs all of these functions with little technical knowledge required of the user.
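To make these stages concrete, here is a minimal sketch of the pipeline in Python. It is illustrative only, not the project's actual implementation: the keyword lists, function names, and SQLite store are stand-ins for the real scraper, NLP classifier, and PostgreSQL database.

    # Illustrative sketch only -- the real project uses proper scraping,
    # NLP models for classification, and PostgreSQL rather than SQLite.
    import sqlite3
    import requests
    from bs4 import BeautifulSoup

    CONFLICT_TERMS = {"conflict", "violence", "fighting"}  # toy keyword lists
    DISASTER_TERMS = {"flood", "earthquake", "storm"}

    def scrape(url):
        """Fetch a web page and return its visible text."""
        html = requests.get(url, timeout=10).text
        return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

    def tag(text):
        """Crude keyword tagger; the real project uses NLP for this step."""
        words = set(text.lower().split())
        if words & CONFLICT_TERMS:
            return "conflict"
        if words & DISASTER_TERMS:
            return "disaster"
        return "other"

    def run(urls):
        """Ingest URLs, scrape and tag each one, and store the results."""
        conn = sqlite3.connect("articles.db")
        conn.execute("CREATE TABLE IF NOT EXISTS articles (url TEXT, category TEXT)")
        for url in urls:
            conn.execute("INSERT INTO articles VALUES (?, ?)", (url, tag(scrape(url))))
        conn.commit()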

Project Lead:

Maintainers: These are the additional people mainly responsible for reviewing pull requests, providing feedback and monitoring issues.

Scraping, processing, NLP

Front end and infrastructure

Getting started:

  1. Join the Slack channel.
  2. Read the rest of this page and the IDETECT challenge page to understand the project.
  3. Check out our issues (small tasks) and milestones. Keep an eye out for help-wanted, beginner-friendly, and discussion tags.
  4. See something you want to work on? Make a comment on the issue or ping us on Slack to let us know.
  5. Beginner with GitHub? Make sure you've read the steps for contributing to a D4D project on GitHub.
  6. Write your code and submit a pull request to add it to the project. Reach out for help any time!

Things you should know

Project Overview

There are millions of news articles containing information about displaced people. Each one is a rich source of information that can be used to analyse both the flow of people and the reporting about them.

We are looking to record:

Project Components

These are the main parts and functions that make up the project.

Running in Docker

You can run everything as you're accustomed to by installing dependencies locally, but another option is to run in a Docker container. That way, all of the dependencies will be installed in a controlled, reproducible way.

  1. Install Docker: https://www.docker.com/products/overview

  2. Run this command:

    docker-compose up

    or

    docker-compose -f docker-compose-spacy.yml up

    The spacy version includes the en_core_web_md 1.2.1 NLP model, which is multiple gigabytes in size; the version without the model is much smaller. (A short example of using the model is shown after these steps.)

    Either way, this will take some time the first time. It's fetching and building all of its dependencies. Subsequent runs should be much faster.

    This will start up several Docker containers running postgres, a Jupyter notebook server, and the node.js front end.

    In the output, you should see a line like:

    jupyter_1  |         http://0.0.0.0:3323/?token=536690ac0b189168b95031769a989f689838d0df1008182c

    That URL will connect you to the Jupyter notebook server.

  3. Visit the node.js server at http://localhost:3322
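If you started the spacy variant, the bundled model can be used from the Jupyter notebooks. Here is a minimal sketch using the standard spaCy loading API (the actual notebook code in this repository may differ):

    import spacy

    # Load the medium English model shipped with the spacy docker-compose variant.
    nlp = spacy.load("en_core_web_md")

    doc = nlp("Thousands of people were displaced by flooding in the region.")
    print([(ent.text, ent.label_) for ent in doc.ents])  # named entities detected in the text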

Note: You can stop the docker containers using Ctrl-C.

Note: If you already have something running on port 3322 or 3323, edit docker-compose.yml and change the first number in the ports config to a free port on your system, e.g. for 9999, make it:

    ports:
      - "9999:3322"

Note: If you want to add python dependencies, add them to requirements.txt and run the jupyter-dev version of the docker-compose file:

docker-compose -f docker-compose-dev.yml up --build

You'll need to use the jupyter-dev version until your dependencies are merged to master and a new version is built. Talk to @aneel on Slack if you need to do this.
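For example, appending a pinned line such as the (hypothetical) one below to requirements.txt and rebuilding with the command above will make the package available in the notebooks:

    textblob==0.15.3    # hypothetical new dependency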

Note: If you want to run SQL commands against the database directly, you can do that by starting a Terminal within Jupyter and running the PostgreSQL shell:

psql -h localdb -U tester id_test
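Once connected, you can inspect and query the schema directly. For example (the table name here is just a guess and may not match the actual schema):

    \dt
    SELECT count(*) FROM article;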

Note: If you want to connect to a remote database, edit the docker.env file with the DB URL for your remote database.
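The URL follows the standard PostgreSQL connection-string form. The variable name and credentials below are placeholders; check docker.env itself for the exact keys it expects:

    DATABASE_URL=postgresql://username:password@db.example.org:5432/internal_displacement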

Skills Needed

Tips for working on this project

Things that inspire us

Refugees on IBM Watson News Explorer