Alhajras / webscraper

Configurable search engine written in Python and Angular. It supports indexing as well.

Chapter 5 Evaluation #31

Open Alhajras opened 9 months ago

Alhajras commented 9 months ago

Crawlers

- Coverage: the percentage of relevant pages that the crawler can discover and download from the web.
- Freshness: the degree to which the crawler can keep up with the changes and updates of the web pages.
- Quality: the relevance and importance of the pages that the crawler selects for downloading.
- Scalability: the ability of the crawler to handle large-scale and distributed crawling tasks efficiently and robustly.
- Politeness: the extent to which the crawler respects the rules and policies of the web servers and avoids overloading them.

To measure these metrics, one can use various methods such as:

- Benchmarks: using a predefined set of web pages or domains as a reference for evaluating the crawler’s performance.
- Simulations: using a synthetic or sampled web graph to model the structure and dynamics of the web and test the crawler’s behavior.
- Experiments: running the crawler on a real or partial web and collecting data on its actions and outcomes.
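As a rough illustration of the simulation approach, the sketch below crawls a tiny synthetic web graph breadth-first under a page budget and then computes coverage and freshness as defined above. Everything in it is an assumption made for illustration: the graph, the set of relevant pages, the version numbers, and the function names `bfs_crawl`, `coverage`, and `freshness` are hypothetical and are not taken from the webscraper code base.

```python
from collections import deque


def bfs_crawl(graph, seed, budget):
    """Visit pages breadth-first from `seed`, downloading at most `budget` pages."""
    seen = {seed}
    queue = deque([seed])
    downloaded = []
    while queue and len(downloaded) < budget:
        page = queue.popleft()
        downloaded.append(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded


def coverage(downloaded, relevant):
    """Fraction of the relevant pages that were actually downloaded."""
    if not relevant:
        return 0.0
    return len(set(downloaded) & set(relevant)) / len(relevant)


def freshness(local_versions, live_versions):
    """Fraction of stored pages whose stored version equals the live version."""
    if not local_versions:
        return 0.0
    fresh = sum(
        1 for url, version in local_versions.items()
        if live_versions.get(url) == version
    )
    return fresh / len(local_versions)


# Hypothetical synthetic web graph: page -> outgoing links.
GRAPH = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/e"],
    "/e": [],
}
# Hypothetical ground truth: pages a perfect crawler should fetch.
RELEVANT = {"/a", "/c", "/d", "/e"}

downloaded = bfs_crawl(GRAPH, "/", budget=4)
crawl_coverage = coverage(downloaded, RELEVANT)

# Hypothetical version numbers: "/a" changed on the live site after the crawl.
LOCAL = {page: 1 for page in downloaded}
LIVE = {"/": 1, "/a": 2, "/b": 1, "/c": 1}
crawl_freshness = freshness(LOCAL, LIVE)

print(downloaded, crawl_coverage, crawl_freshness)
```

With a budget of four pages the crawler fetches `/`, `/a`, `/b`, and `/c`, so it reaches two of the four relevant pages (coverage 0.5), and three of the four stored pages still match the live site (freshness 0.75). The same two functions could be reused for a benchmark or a real experiment by swapping the synthetic graph for recorded crawl data.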

Indexers

- Check movies
- Fuzzy search
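The notes do not say how fuzzy search will be checked. One possible sketch, assuming a small list of movie titles and using the standard-library `difflib` module as a stand-in for whatever matching the indexer really uses, is shown below. The titles and the `fuzzy_search` helper are hypothetical.

```python
import difflib


def fuzzy_search(query, titles, limit=3, cutoff=0.6):
    """Return up to `limit` titles that approximately match `query`, best first."""
    # Compare case-insensitively, but return the titles in their original casing.
    by_lowercase = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[match] for match in matches]


# Hypothetical sample of indexed movie titles.
TITLES = ["The Godfather", "The Matrix", "Inception", "Interstellar"]

best = fuzzy_search("Incepton", TITLES)
print(best[0])  # the misspelled query still resolves to "Inception"
```

An evaluation could run a list of deliberately misspelled queries through such a function and report how often the intended title is ranked first.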

UI