UTMediaCAT / mediacat-domain-crawler

Internet domain crawler

#2 article plaintext #12

Closed RaiyanRahman closed 3 years ago

RaiyanRahman commented 3 years ago

The domain crawler should be ready for a test run.

Get article information using readability. Added JSON formatting for output. Added dynamic pseudoURLs for multiple websites. Changed the number of links crawled to 20. Domains are now picked up across different permutations of their links. The found URLs sublist has been modified to be more verbose.
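For reviewers, here is a minimal sketch of how the dynamic pseudoURLs and the JSON output could fit together. It is not the code in this PR. The names `buildPseudoUrls`, `toJson` and `ArticleRecord`, the record fields, and the exact permutations (http/https, with and without `www.`) are all assumptions made for illustration. The trailing `[.*]` assumes a pseudo-URL syntax in which bracketed parts are treated as regular expressions.

```typescript
// Illustrative sketch only: names and fields are hypothetical, not from this PR.

// One output record per crawled article.
interface ArticleRecord {
  url: string;
  domain: string;
  title: string;
  plainText: string;   // article text as extracted by a readability-style parser
  foundUrls: string[]; // links discovered on the page
}

// Build pseudo-URL patterns for every domain in scope.
// Assumed permutations: http/https, with and without "www.".
function buildPseudoUrls(domains: string[]): string[] {
  const patterns: string[] = [];
  for (const raw of domains) {
    // Normalise the input to a bare host: drop scheme, "www." and trailing slashes.
    const host = raw
      .replace(/^https?:\/\//, "")
      .replace(/^www\./, "")
      .replace(/\/+$/, "");
    for (const scheme of ["http", "https"]) {
      for (const prefix of ["", "www."]) {
        // "[.*]" matches any path under the host (assumed bracketed-regex syntax).
        patterns.push(`${scheme}://${prefix}${host}/[.*]`);
      }
    }
  }
  return patterns;
}

// Serialise a record as indented JSON.
function toJson(record: ArticleRecord): string {
  return JSON.stringify(record, null, 2);
}
```

Passing a list of domains to `buildPseudoUrls` yields four patterns per domain, so adding a website to the crawl scope only means adding one entry to the input list.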

Need to revise after merge: integrate Alex's Twitter links handling.