Alhajras / webscraper

Configurable search engine written in Python and Angular. It supports indexing as well.

Chapter 3 Background #21

Open Alhajras opened 10 months ago

Alhajras commented 10 months ago

----------- Crawler ------------------

Alhajras commented 8 months ago

Theoretical analysis

Most probably, your work will make use of some algorithms and data structures: your own, ones from previous work, or a combination of the two. In any case, provide information about the basic complexity of your algorithms, in particular their running time. Do this even if the statement appears straightforward to you. For example, the running time of one of your algorithms may obviously be linear in the size of the input data. Even then, state it and provide an argument or proof for it!
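As an illustration of what such a statement and argument could look like, here is a minimal sketch of an inverted-index builder. The function `build_inverted_index` and its input format are hypothetical and are not taken from this repository; the point is only to show how each step of a linear-time argument can be tied to a line of code.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the ascending list of ids of documents containing it.

    Hypothetical example for the running-time argument, not the project's indexer.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):      # every document is visited once
        for token in text.lower().split():    # every token is visited once
            postings = index[token]           # expected O(1) hash lookup
            # Document ids arrive in ascending order, so a duplicate can only
            # be the last entry of the postings list.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)       # amortized O(1) append
    return dict(index)


index = build_inverted_index(["crawl the web", "index the web"])
print(index["web"])
```

Argument: let N be the total number of characters in all documents. Lowercasing and splitting each document is linear in its length, and every token then costs expected constant time for the dictionary lookup and the append. Summed over all documents, the expected running time is O(N), assuming constant-time hashing of tokens of bounded length.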

Empirical analysis

Most probably, your work involves the implementation of an algorithm or data structure, or of a whole system. Whatever it is, your implementation should be thoroughly evaluated. The kind of evaluation depends on the nature of your problem. If the focus is on results of a particular quality, that should be evaluated. If the focus is on efficiency, running time and (if relevant) space consumption should be evaluated. Even if the focus is on quality, efficiency should be evaluated, too. One always wants to know the running time of a procedure and (if relevant) its space consumption. If there is a pre-processing phase, this should be evaluated separately. If the pre-processing consumes a lot of intermediate disk space or memory, that should also be evaluated. Think about the evaluation from the perspective of someone who wants to use your software in practice. What is it that you would want to know then?
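One possible way to collect these numbers is sketched below, using only the Python standard library. The helper `measure` and the toy functions `preprocess` and `query` are assumptions made for this example; they stand in for whatever pre-processing and query phases the actual system has.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def preprocess(words):
    """Toy pre-processing phase: build a lookup structure."""
    return set(words)


def query(lookup, word):
    """Toy query phase: use the structure built by preprocess."""
    return word in lookup


words = [str(i) for i in range(10_000)]

# Measure the pre-processing phase and the query phase separately.
lookup, prep_seconds, prep_peak = measure(preprocess, words)
found, query_seconds, query_peak = measure(query, lookup, "42")

print(f"preprocess: {prep_seconds:.6f}s, peak {prep_peak} bytes")
print(f"query:      {query_seconds:.6f}s, peak {query_peak} bytes")
```

Note that `tracemalloc` only tracks memory allocated through Python's allocator and slows execution down, so for precise timings it may be better to measure time and memory in separate runs.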

Typically, there are other approaches which can be used (either directly or with small modifications / adjustments) to solve your problem. As a minimum, compare to the best one of these approaches. If there are several fundamentally different kinds of approaches, pick the best one of each kind. If there is no solution yet for your problem, think of a simple baseline algorithm (= the straightforward solution) and compare to that. Sometimes there are two or even three simple baseline algorithms. Do your evaluation on at least three different data sets of different kinds and sizes. If the amount of work needed per data set is very large, it is OK to use only two data sets.
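A comparison of this kind could be organized roughly as follows. This is a sketch under stated assumptions: the membership-test problem, the functions `baseline_contains` and `bisect_contains`, the harness `compare`, and the three synthetic data sets are all made up for illustration and would be replaced by the real approaches and real data.

```python
import bisect
import time


def baseline_contains(sorted_values, target):
    """Straightforward baseline: linear scan over the values."""
    for value in sorted_values:
        if value == target:
            return True
    return False


def bisect_contains(sorted_values, target):
    """Alternative approach: binary search on the sorted values."""
    i = bisect.bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


def compare(approaches, datasets, repeats=5):
    """Return {(dataset_name, approach_name): best_seconds_over_repeats}."""
    results = {}
    for data_name, (values, target) in datasets.items():
        for name, func in approaches.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                func(values, target)
                best = min(best, time.perf_counter() - start)
            results[(data_name, name)] = best
    return results


approaches = {"baseline": baseline_contains, "bisect": bisect_contains}
# Three synthetic data sets of different sizes; the target is the last element,
# which is the worst case for the linear scan.
datasets = {
    "small": (list(range(1_000)), 999),
    "medium": (list(range(20_000)), 19_999),
    "large": (list(range(200_000)), 199_999),
}

results = compare(approaches, datasets)
for (data_name, name), seconds in results.items():
    print(f"{data_name:>6} {name:>8} {seconds:.6f}s")
```

Reporting the best of several repeats reduces noise from other processes; depending on the setting, the mean or median may be the more appropriate statistic.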