Network-Science / WebCrawler

A Firefox Extension Project Utilizing a Web Crawling Algorithm with Web Workers

Research Crawler Algo #1

Open JeremyBarbosa opened 4 years ago

JeremyBarbosa commented 4 years ago

Requirements

Definition of Done: Provide a white paper describing the algorithm that we will implement.

JeremyBarbosa commented 4 years ago

Based on http://people.cse.nitc.ac.in/sites/default/files/aviralnigam/files/web_crawling_algorithms.pdf, it seems that A* Search is the best option.
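One textbook way to cast A* for crawling, sketched in JavaScript since the extension runs in Firefox: each followed link costs 1, and a page's relevance (assumed to be a number in [0, 1]) drives the heuristic, with h = 0 on a goal page and h = 1 - relevance elsewhere. Any goal is at least one link away from a non-goal page, so h never overestimates the remaining cost and A* returns a shortest link path. The `aStarCrawl` name, the toy graph, the scores, and the `isGoal` predicate are all hypothetical, and the cited paper's exact definitions of g and h may differ.

```javascript
// A* over an in-memory link graph (hypothetical data; no network I/O).
// Cost: 1 per followed link. Heuristic: 0 on goal pages, else 1 - relevance,
// with relevance assumed to lie in [0, 1], so h never overestimates.
function aStarCrawl(start, getLinks, relevance, isGoal) {
  const h = (url) => (isGoal(url) ? 0 : 1 - relevance(url));
  const g = new Map([[start, 0]]);         // best known link count from start
  const parent = new Map([[start, null]]); // back-pointers for the path
  const closed = new Set();                // pages already expanded
  const open = [{ url: start, f: h(start) }];

  while (open.length > 0) {
    // Take the frontier entry with the lowest f = g + h. A linear scan keeps
    // the sketch short; a binary heap suits a large frontier better.
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      if (open[i].f < open[best].f) best = i;
    }
    const { url } = open.splice(best, 1)[0];
    if (closed.has(url)) continue; // stale duplicate entry

    if (isGoal(url)) {
      const path = [];
      for (let u = url; u !== null; u = parent.get(u)) path.unshift(u);
      return path;
    }
    closed.add(url);

    for (const next of getLinks(url)) {
      const tentative = g.get(url) + 1;
      const known = g.has(next) ? g.get(next) : Infinity;
      if (!closed.has(next) && tentative < known) {
        g.set(next, tentative);
        parent.set(next, url);
        open.push({ url: next, f: tentative + h(next) });
      }
    }
  }
  return null; // no goal page reachable from start
}

// Toy site: link structure and made-up relevance scores.
const graph = {
  "/": ["/news", "/music"],
  "/news": ["/sports"],
  "/music": ["/artists", "/news"],
  "/artists": ["/lyrics/song-1"],
  "/sports": [],
  "/lyrics/song-1": [],
};
const score = { "/": 0.1, "/news": 0.2, "/music": 0.7, "/artists": 0.8, "/lyrics/song-1": 1 };

const path = aStarCrawl(
  "/",
  (u) => graph[u] || [],
  (u) => score[u] || 0,
  (u) => u.startsWith("/lyrics/"),
);
// path → ["/", "/music", "/artists", "/lyrics/song-1"]
```

In the extension, `getLinks` would have to fetch and parse each page, so a real version would be asynchronous; this sketch keeps everything in memory to show only the search order.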

mylee1995 commented 4 years ago

https://stackoverflow.com/questions/35042798/get-dom-tree-with-javascript

This seems like a good resource for configuring the algorithm to extract the DOM tree.
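Web Workers have no access to the page's DOM, and DOM nodes cannot be passed through `postMessage`, so one option is to walk the tree in a content script and convert it to plain objects first. A minimal sketch, assuming a hypothetical `serializeTree` helper and a `{ tag, attrs, children }` shape; it keeps only element children, so page text would need to be collected separately. The `worker` variable and the stand-in element are also hypothetical.

```javascript
// Convert a DOM element into plain objects that survive postMessage's
// structured clone (DOM nodes themselves cannot be cloned).
// Only element children are kept; text nodes are dropped.
function serializeTree(el) {
  const attrs = {};
  // el.attributes is a NamedNodeMap of Attr objects ({ name, value }).
  for (const { name, value } of Array.from(el.attributes || [])) {
    attrs[name] = value;
  }
  return {
    tag: el.tagName.toLowerCase(),
    attrs,
    children: Array.from(el.children || []).map((child) => serializeTree(child)),
  };
}

// In a content script this might be used as:
//   worker.postMessage(serializeTree(document.documentElement));
// Below, a stand-in object with the same shape as a DOM element:
const fakeBody = {
  tagName: "BODY",
  attributes: [],
  children: [{ tagName: "A", attributes: [{ name: "href", value: "/lyrics" }], children: [] }],
};
console.log(JSON.stringify(serializeTree(fakeBody)));
// → {"tag":"body","attrs":{},"children":[{"tag":"a","attrs":{"href":"/lyrics"},"children":[]}]}
```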

anthonychu00 commented 4 years ago

Yeah, and we can just extract the anchor tags from that to get the links. The A* algorithm is straightforward to implement serially, if that's what we end up using. I'm just trying to figure out exactly how we should grade the relevance of a page, as the paper describes.

I assume we're not iterating more than once to search for lyrics, right? If so, we can probably ignore the section on Adaptive A*.
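For the link extraction and relevance grading discussed in this comment, a sketch that works on a tree of plain `{ tag, attrs, children }` objects (for instance, one built from the DOM in a content script). `extractLinks` resolves relative `href`s with the standard `URL` constructor, keeps only http(s) links, and drops `#fragment`s so one page is not queued twice. `relevanceScore` is only a placeholder (the fraction of query keywords found in the page text), not the grading the paper uses; both function names, the keywords, and the URLs are hypothetical.

```javascript
// Collect absolute http(s) URLs from <a href> nodes in a tree of plain
// { tag, attrs, children } objects. Relative hrefs resolve against baseUrl.
function extractLinks(node, baseUrl, out = new Set()) {
  if (node.tag === "a" && node.attrs.href) {
    try {
      const url = new URL(node.attrs.href, baseUrl);
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // "#verse" and "#chorus" point at the same page
        out.add(url.href);
      }
    } catch (e) {
      // Skip hrefs that do not parse as URLs.
    }
  }
  for (const child of node.children) extractLinks(child, baseUrl, out);
  return out;
}

// Placeholder relevance grade in [0, 1]: the fraction of query keywords
// that appear as words in the page text. Not the paper's method.
function relevanceScore(text, keywords) {
  if (keywords.length === 0) return 0;
  const words = new Set(text.toLowerCase().match(/[a-z0-9']+/g) || []);
  const hits = keywords.filter((k) => words.has(k.toLowerCase())).length;
  return hits / keywords.length;
}

const tree = {
  tag: "body",
  attrs: {},
  children: [
    { tag: "a", attrs: { href: "/lyrics/song-1#verse" }, children: [] },
    { tag: "a", attrs: { href: "https://example.org/artists" }, children: [] },
    { tag: "a", attrs: { href: "mailto:someone@example.com" }, children: [] },
  ],
};
const links = [...extractLinks(tree, "https://example.com/music/")];
// links → ["https://example.com/lyrics/song-1", "https://example.org/artists"]
const grade = relevanceScore("Song lyrics and chords", ["lyrics", "chorus"]);
// grade → 0.5
```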