VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

Some questions Ubuntu 16.04 #172

Closed tpolo777 closed 2 years ago

tpolo777 commented 5 years ago

On the main page of the ACHE repo, the description reads "Web interface for searching crawled pages in real-time", but when I started running ACHE there was no interface. (Maybe I misunderstood something.) I also got stuck a little when creating a configuration file. Is this documentation up to date? (https://ache.readthedocs.io/en/latest/index.html)

How can I set ACHE to return a CSV file of relevant pages that excludes the URLs already listed in the seed file? It is quite time-consuming to find new URLs among a harvest of 15000+ pages.

My aim is to find databases with scientific publications/books. Do you have any suggestions on how to increase accuracy? Thank you for your quick support and help! You are doing a great job.

Here is my Configuration File

#
# Example of configuration for running a Focused Crawl
#

# Store pages classified as irrelevant pages by the target page classifier
target_storage.store_negative_pages: true

# Limit the max number of pages crawled per domain, in order to avoid crawling
# too many pages from the same domain and favor discovery of new domains
link_storage.max_pages_per_domain: 10000

# Disable "seed scope" to allow crawl pages from any domain
link_storage.link_strategy.use_scope: false

# Set the initial link classifier to a simple one
link_storage.link_classifier.type: MaxDepthLinkClassifier
link_storage.link_classifier.max_depth: 3
# Train a new link classifier while the crawler is running. This allows
# the crawler automatically learn how to prioritize links in order to
# efficiently locate relevant content while avoiding the retrieval of
# irrelevant content.
link_storage.online_learning.enabled: true
link_storage.online_learning.type: FORWARD_CLASSIFIER_BINARY
link_storage.online_learning.learning_limit: 1000

# Always select top-k links with highest priority to be scheduled
link_storage.link_selector: TopkLinkSelector

# Configure the minimum time interval (in milliseconds) to wait between requests
# to the same host to avoid overloading servers. If you are crawling your own
# web site, you can decrease this value to speed up the crawl.
link_storage.scheduler.host_min_access_interval: 5000

# Configure the User-Agent of the crawler
crawler_manager.downloader.user_agent.name: ACHE
crawler_manager.downloader.user_agent.url: https://github.com/ViDA-NYU/ache

aecio commented 5 years ago

Yes, the docs are up to date.

Whenever a crawl is started, a web server is started by default on port 8080; ACHE prints the address in the logs. When you open it in a browser, you can see crawler statistics as well as search the crawled content. The search will only work if you have configured Elasticsearch. The sample configuration in {ACHE_ROOT}/config/config_docker includes an Elasticsearch setup using Docker, which you can use to try out the search feature.
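For reference, enabling the Elasticsearch backend amounts to a couple of extra keys in the crawl's config file. This is only a sketch: verify the exact key names and the Elasticsearch address against config/config_docker in your ACHE version (http://localhost:9200 below is an assumed local address).

```yaml
# Sketch only: check config/config_docker for the exact keys in your version
target_storage.data_formats: [ELASTICSEARCH]
# Assumed local Elasticsearch address; change to match your setup
target_storage.data_format.elasticsearch.rest.hosts: ["http://localhost:9200"]
```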

Finally, ACHE also stores some TSV files in its output folder. One of them, relevantpages.csv, includes only the pages classified as relevant by the page classifier you provided.
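Regarding the question about excluding seed URLs from the relevant-pages list: since the output file is tab-separated with the URL in the first column, standard shell tools can filter out seeds after the crawl. A minimal sketch (the file names and contents below are toy stand-ins, not ACHE's actual output paths):

```shell
# Toy stand-ins so the example is self-contained; in a real crawl, use your
# seeds file and the relevantpages.csv from ACHE's output directory.
printf 'http://seed1.example\nhttp://seed2.example\n' > seeds.txt
printf 'http://seed1.example\t1.0\nhttp://new.example/page\t0.9\n' > relevantpages.csv

# Take the URL column (cut defaults to tab-delimited), then drop any line
# that exactly matches a seed URL (-F fixed strings, -x whole line, -v invert).
cut -f1 relevantpages.csv | grep -v -x -F -f seeds.txt > relevant_urls.txt
```

After this, relevant_urls.txt contains only the newly discovered relevant URLs.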