Open Alhajras opened 10 months ago
Most probably, your work will make use of some algorithms and data structures — either your own, ones from previous work, or a combination of the two. In any case, provide information about the basic complexity of your algorithms, in particular their running time. Do this even if the statement appears straightforward to you. For example, the running time of one of your algorithms may obviously be linear in the size of the input data. Say it anyway, and provide an argument / proof for it!
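As a minimal sketch of what such a complexity statement plus argument can look like (the `term_frequencies` function and its docstring argument are illustrative, not part of the thesis code):

```python
from collections import Counter

def term_frequencies(tokens):
    """Count how often each token occurs.

    Running-time argument: Counter iterates over the token list once,
    and each hash-table update is expected O(1). So for n tokens the
    total cost is O(n) -- linear in the size of the input, as claimed.
    """
    return Counter(tokens)

print(term_frequencies(["web", "crawler", "web"]))
# Counter({'web': 2, 'crawler': 1})
```

Even for a one-liner like this, the thesis should spell out the argument (one pass, constant work per element) rather than just assert linearity.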
Most probably, your work involves the implementation of an algorithm or data structure, or of a whole system. Whatever it is, your implementation should be thoroughly evaluated. The kind of evaluation depends on the nature of your problem. If the focus is on results of a particular quality, that should be evaluated. If the focus is on efficiency, running time and (if relevant) space consumption should be evaluated. Even if the focus is on quality, efficiency should be evaluated, too. One always wants to know the running time of a procedure and (if relevant) its space consumption. If there is a pre-processing phase, this should be evaluated separately. If the pre-processing consumes a lot of intermediate disk space or memory, that should also be evaluated. Think about the evaluation from the perspective of someone who wants to use your software in practice. What is it that you would want to know then?
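A hedged sketch of how running time and peak memory could be measured together in one harness — the `evaluate` helper and the toy workload below are assumptions for illustration, not the actual evaluation setup:

```python
import time
import tracemalloc

def evaluate(fn, data):
    """Run fn(data) once, returning (result, wall-clock seconds, peak bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(data)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    tracemalloc.stop()
    return result, elapsed, peak

# Hypothetical workload: build a toy index from a list of documents.
docs = [f"document {i}" for i in range(10_000)]
_, seconds, peak_bytes = evaluate(lambda d: {i: doc for i, doc in enumerate(d)}, docs)
print(f"build: {seconds:.4f}s, peak memory {peak_bytes / 1024:.1f} KiB")
```

If there is a separate pre-processing phase, the same harness can wrap it separately so its time and intermediate memory show up as their own numbers, as the guidance asks.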
Typically, there are other approaches which can be used (either directly or with small modifications / adjustments) to solve your problem. As a minimum, compare to the best one of these approaches. If there is a variety of principally different approaches, pick the best one for each principle. If there is no solution yet for your problem, think of a simple baseline algorithm (= the straightforward solution) and compare to that. Sometimes there are two or even three simple baseline algorithms. Do your evaluation on at least three different data sets of different kinds and sizes. If the amount of work needed per data set is very large, it is OK to use only two data sets.
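The baseline-versus-candidate comparison across data sets of different sizes could be structured like this minimal sketch (the membership-query task, both functions, and the sizes are placeholders for whatever the thesis actually compares):

```python
import random
import time

def baseline_contains(items, query):
    # Straightforward baseline: linear scan over a list, O(n) per query.
    return query in items

def indexed_contains(index, query):
    # Candidate approach: hash-set lookup, expected O(1) per query.
    return query in index

# Three data sets of different sizes, per the guidance above.
for size in (1_000, 10_000, 100_000):
    items = list(range(size))
    index = set(items)
    queries = [random.randrange(size * 2) for _ in range(200)]

    t0 = time.perf_counter()
    for q in queries:
        baseline_contains(items, q)
    t_base = time.perf_counter() - t0

    t0 = time.perf_counter()
    for q in queries:
        indexed_contains(index, q)
    t_idx = time.perf_counter() - t0

    print(f"n={size}: baseline {t_base:.4f}s, indexed {t_idx:.4f}s")
```

The point is the shape of the experiment — same queries, same data sets, both approaches — so that the numbers in the evaluation chapter are directly comparable.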
[ ] Talk about the running-time complexity of the algorithms used.
[x] Web characterization [6]
[x] Requirements of search engine
----------- Crawler ------------------
[x] Features a crawler must provide [5]
[x] Features a crawler should provide [5]
[x] Crawler architecture [5]
[x] Types of Web Crawler [1]:
[x] Focused Crawler Techniques [1]:
[x] Different ways to crawl and render
[x] Challenges and issues
[x] Challenges in Crawling the Web [1]:
[x] Inverted index
[x] Dictionaries and tolerant retrieval [5]
Performance Metrics for Focused Web Crawler [1]
Performance Metrics for Collaborative and Mobile Crawler [1]
Types of indexing [2]:
My crawler is more generic and does not use a settings file like the one mentioned in the thesis [3].

Resources:
[1] A survey of Web crawlers for information retrieval
[2] An anatomy for neural search engines
[3] Automated web store product scraping using Node.js
[4] UbiCrawler: a scalable fully distributed Web crawler
[5] Introduction to IR Web crawling and indexes
[6] Effective web crawling