This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web. It achieves this by integrating human insights with machine computation (data mining and machine learning) through visualization. DDT allows a domain expert to visualize and analyze pages returned by a search engine or a crawler, and easily provide feedback about relevance. This feedback, in turn, can be used to address two challenges:
You can build and deploy the Domain Discovery Tool either with its Makefile, which creates a local development environment, or with conda or Docker for deployment. The conda build environment is currently supported only on 64-bit OS X and Linux.
First install conda, either through the Anaconda or miniconda installers provided by Continuum. You will also need Git and a Java Development Kit. These are system tools that are generally not provided by conda.
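Before building, it can help to confirm that these prerequisites are on your PATH. The following is a minimal sketch, not part of the DDT repository: the check_tools helper is hypothetical, and it assumes the tools are exposed under the command names conda, git, and javac.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DDT): report which of the given
# commands are missing from PATH. Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed command names: conda, git, and javac (the JDK compiler).
check_tools conda git javac || echo "Install the missing tools before running make." >&2
```

If nothing is printed, all three commands were found.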
Clone the DDT repository and enter it:
git clone https://github.com/ViDA-NYU/domain_discovery_tool
cd domain_discovery_tool
Use the make command to build DDT and download and install its dependencies:
make
After a successful installation, you can activate the DDT development environment:
source activate ddt
Then, from the top-level domain_discovery_tool directory, start supervisord to run the web application and its associated services:
supervisord
Now you should be able to head to http://localhost:8084/ to interact with the tool.
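The services may take a moment to start. As a rough sketch, you could poll the URL from a shell until it responds. The wait_for_url helper below is hypothetical (it is not part of DDT) and assumes curl is installed; the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DDT): poll a URL until it answers
# or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (commented out so that sourcing this file has no side effects):
# wait_for_url http://localhost:8084/ && echo "DDT is reachable"
```

The function returns a non-zero status if the URL never responds, so it can also be used in scripts that should stop when the server fails to start.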
First, make sure you have Docker installed and running. Then you can create a DDT image using the Dockerfile. Run the following command in the root folder of this project:
docker build -t domain_discovery_tool .
or download the latest published docker build (you do not need to clone the DDT repository in this case):
docker pull vidanyu/ddt:latest
Run the app using the Docker image that you just built (or pulled). This starts Elasticsearch and the DDT server:
docker run -i -p 8084:8084 -p 9200:9200 -t <domain_discovery_tool or vidanyu/ddt:latest> /ddt/run_demo.sh
To see the app running, go to:
http://localhost:8084/seedcrawler
Alternatively, you can specify an external Elasticsearch server address using an environment variable:
docker run -p 8084:8084 -e "ELASTICSEARCH_SERVER=http://127.0.0.1:9200" -i -t <domain_discovery_tool or vidanyu/ddt:latest>
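The two ways of running the image differ only in their arguments, so the choice can be sketched as a small dry-run helper that prints the command to use. The ddt_run_cmd function is hypothetical (it is not part of DDT); it only echoes the commands shown above and starts nothing.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DDT): print the docker run command
# for a given image. Nothing is started; this is a dry run.
# If ELASTICSEARCH_SERVER is set, it is passed to the container; otherwise
# the bundled demo script is used, which also starts Elasticsearch.
ddt_run_cmd() {
    image="$1"
    if [ -n "${ELASTICSEARCH_SERVER:-}" ]; then
        echo "docker run -p 8084:8084 -e ELASTICSEARCH_SERVER=${ELASTICSEARCH_SERVER} -i -t ${image}"
    else
        echo "docker run -i -p 8084:8084 -p 9200:9200 -t ${image} /ddt/run_demo.sh"
    fi
}

ddt_run_cmd vidanyu/ddt:latest
```

Pass domain_discovery_tool instead of vidanyu/ddt:latest if you built the image locally.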
Detailed Description of the tool
Note: To follow the demo videos, download and use the following demo build of DDT:
docker pull vidanyu/ddt:2.7.0-demo
docker run -i -p 8084:8084 -p 9200:9200 -p 9001:9001 -t vidanyu/ddt:2.7.0-demo
Yamuna Krishnamurthy, Kien Pham, Aecio Santos, and Juliana Freire. 2016. Interactive Web Content Exploration for Domain Discovery. Interactive Data Exploration and Analytics (IDEA) Workshop at Knowledge Discovery and Data Mining (KDD), San Francisco, CA.
DDT Development Team [ddt-dev@vgc.poly.edu]