Open JeremyBarbosa opened 4 years ago
Based on http://people.cse.nitc.ac.in/sites/default/files/aviralnigam/files/web_crawling_algorithms.pdf, A* Search seems to be the best option.
https://stackoverflow.com/questions/35042798/get-dom-tree-with-javascript
This seems like a good resource for setting up an algorithm to extract the DOM tree.
Yeah, and we can just extract the tags from that to get links. The A* algorithm is straightforward to implement serially if we go with it. I'm just trying to figure out how exactly we grade the relevancy of a page, as the paper describes.
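For reference, here is a minimal sketch of the "extract the tags to get links" step. It assumes we parse raw HTML with Python's standard library rather than walking the live DOM in the browser, so it is only an illustration of the idea, not what we have agreed to build. The names `LinkExtractor` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we want to follow.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative hrefs into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Hypothetical helper: return all link targets found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Calling `extract_links(page_html, page_url)` would give the list of outgoing URLs that the search algorithm can then rank.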
I assume we're not iterating more than once to search for lyrics, right? If so, we can probably ignore the section on Adaptive A*.
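If we do go with a single pass, a rough A*-style best-first crawl could look like the sketch below. Everything in it is an assumption for discussion: the relevancy score is just the fraction of search keywords found in the page text (the paper may grade relevancy differently), the heuristic scores the link's URL text the same way, and `pages` is a hypothetical in-memory stand-in for actually fetching pages. It also never reopens a page once queued, so it is a simplification of textbook A*.

```python
import heapq


def relevance(text, keywords):
    """Assumed scoring: fraction of keywords that occur in the text (0.0 to 1.0)."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for k in keywords if k.lower() in words) / len(keywords)


def crawl(seed, pages, keywords, max_pages=10):
    """Single-pass best-first crawl; returns URLs in the order visited.

    `pages` maps url -> (page_text, [outgoing urls]) and stands in for fetching.
    Each queued link has priority f = g + h, where g is the accumulated
    (1 - relevance) of the pages on the path to it and h is
    (1 - relevance of the link's URL text).
    """
    frontier = [(0.0, 0.0, seed)]  # entries are (f, g, url)
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, g, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # nothing to fetch for this URL in the stand-in data
        text, links = pages[url]
        visited.append(url)
        g_here = g + (1.0 - relevance(text, keywords))
        for link in links:
            if link in seen:
                continue  # simplification: a queued page is never re-queued
            seen.add(link)
            url_text = link.replace("/", " ").replace("-", " ")
            h = 1.0 - relevance(url_text, keywords)
            heapq.heappush(frontier, (g_here + h, g_here, link))
    return visited
```

With this scoring, links whose URLs mention the search terms are visited before unrelated ones. The open question of how to grade relevancy only affects `relevance`, so it can be swapped out without touching the search loop.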
Requirements
Definition of Done
Provide a white paper describing the algorithm that we will implement.