Alhajras / webscraper

Configurable search engine written in Python and Angular. It supports indexing as well.

References #20

Open Alhajras opened 10 months ago

Alhajras commented 10 months ago
  1. Amount of data created, consumed, and stored 2010-2020, with forecasts to 2025

Published by Petroc Taylor, Sep 8, 2022

  1. The anatomy of a large-scale hypertextual Web search engine
  2. Effective Web Crawling by Carlos Castillo
  3. Market share of leading desktop search engines worldwide from January 2015 to March 2023
  4. UbiCrawler: a scalable fully distributed Web crawler
  5. Cho J, Garcia-Molina H. Parallel crawlers. Proceedings of the 11th International World Wide Web Conference, 2002. ACM Press: New York, 2002.
  6. 5-web-scraping-tools-comparison
  7. Estimating frequency of change.
  8. Introduction to IR
  9. Wiki
  10. A survey of Web crawlers for information retrieval
  11. web-scraping-tools-and-apps
  12. InformationRetrieval slides

[Eic94] D. Eichmann. The RBSE spider: balancing effective search against web load. In Proceedings of the first World Wide Web Conference, Geneva, Switzerland, May 1994.

[Pin94] Brian Pinkerton. Finding what people want: Experiences with the WebCrawler. In Proceedings of the first World Wide Web Conference, Geneva, Switzerland, May 1994.

[McB94] Oliver A. McBryan. GENVL and WWWW: Tools for taming the web. In Proceedings of the first World Wide Web Conference, Geneva, Switzerland, May 1994.

[BP98] Sergei Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, April 1998.

[BCSV04] Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. UbiCrawler: a scalable fully distributed Web crawler. Software, Practice and Experience, 34(8):711–726, 2004.

[CGM03b] Junghoo Cho and Hector Garcia-Molina. Estimating frequency of change. ACM Transactions on Internet Technology, 3(3), August 2003.