Alhajras / webscraper

Configurable search engine written in Python and Angular. It supports indexing as well.

Documentation part 2 #13

Open Alhajras opened 1 year ago

Alhajras commented 1 year ago

12.06.2023

11.06.2023

10.06.2023

09.06.2023

Thread: 2 Docs: 816 Visited Links: 0 total_non_useful_links: 0 2622.433438539505 {113043: 4660, 113042: 3852}

Thread: 1 Docs: 817 Visited Links: 0 total_non_useful_links: 0 3380.6824486255646 {93606: 6550}

Thread: 4 Docs: 800 Visited Links: 0 total_non_useful_links: 0 2338.493413209915 -> 39min {63675: 2024, 63677: 2302, 63673: 2514, 63676: 1248}

Thread: 4 Docs: 103 Visited Links: 0 total_non_useful_links: 0 198.45312595367432 {60160: 248, 60157: 256, 60161: 224, 60159: 152}

Thread: 2 Level is used with FIFO DSC Docs: 101 Visited Links: 0 total_non_useful_links: 0 223.89409279823303 {50977: 304, 50975: 504}

Level is used with FIFO DSC Docs: 100 Visited Links: 0 total_non_useful_links: 0 456.75313687324524 {35956: 464, 35955: 336}
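To compare runs more easily, the result lines above could be parsed into dictionaries. This is only a sketch: `parse_result_line` is a hypothetical helper, not part of the repository, and it assumes that the bare float is the elapsed time in seconds and that the trailing `{...}` holds per-thread (or per-process) counters keyed by ID. Neither is stated explicitly in the log.

```python
import ast
import re

# Integer fields that appear as "Label: value" in the result lines.
INT_FIELDS = (
    ("threads", r"Thread:\s*(\d+)"),
    ("docs", r"Docs:\s*(\d+)"),
    ("visited_links", r"Visited Links:\s*(\d+)"),
    ("non_useful_links", r"total_non_useful_links:\s*(\d+)"),
)


def parse_result_line(line: str) -> dict:
    """Parse one benchmark line into a dict; fields missing from the line are skipped."""
    result = {}
    for key, pattern in INT_FIELDS:
        match = re.search(pattern, line)
        if match:
            result[key] = int(match.group(1))

    # Assumption: the float right after the non-useful-links count is elapsed seconds.
    match = re.search(r"total_non_useful_links:\s*\d+\s+(\d+\.\d+)", line)
    if match:
        result["elapsed_seconds"] = float(match.group(1))

    # Assumption: the trailing {id: count, ...} is a plain dict literal of counters.
    match = re.search(r"\{[^{}]*\}", line)
    if match:
        result["counters"] = ast.literal_eval(match.group(0))
    return result


example = (
    "Thread: 2 Docs: 816 Visited Links: 0 total_non_useful_links: 0 "
    "2622.433438539505 {113043: 4660, 113042: 3852}"
)
print(parse_result_line(example))
```

Lines without a `Thread:` prefix simply produce a dict without the `threads` key, so the caller can decide how to treat them.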

08.06.2023

05.06.2023

More tests: ------------------------ After adding bulk document saving ---------

First try: Thread: 1 Visited Links: 137 total_non_useful_links: 628 420.3422930240631 {94184: 1986}

Note: bulk saving made the algorithm slower.

------------------------ Before adding bulk document saving ---------

Change the URLs. 2 threads only. First run: Visited Links: 126 total_non_useful_links: 470 215.15883469581604 -> 3.5 min {69885: 344, 69884: 464}

Second run: Visited Links: 126 total_non_useful_links: 473 225.6277961730957 -> 3.75 min {72023: 344, 72022: 456}

2 threads only. First run: Visited Links: 135 total_non_useful_links: 1453 306.88921093940735 -> 5.1 min {65029: 584, 65030: 216}

Second run: Visited Links: 141 total_non_useful_links: 1523 295.5270366668701 -> 4.9 min {67435: 216, 67434: 584}

2 threads only: Visited Links: 136 total_non_useful_links: 471 283.0769159793854 {61601: 456, 61602: 352}
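The `->` annotations in these runs appear to be the elapsed time converted from seconds to minutes. A small sketch of that conversion, assuming the raw float is wall-clock seconds (`seconds_to_minutes` is a hypothetical helper, not from the repository):

```python
def seconds_to_minutes(seconds: float, digits: int = 2) -> float:
    """Convert an elapsed time in seconds to minutes, rounded to `digits` decimals."""
    return round(seconds / 60, digits)


# Elapsed times copied from the 2-thread runs in this comment.
for elapsed in (215.15883469581604, 225.6277961730957,
                306.88921093940735, 295.5270366668701):
    print(f"{elapsed:.2f} s -> {seconds_to_minutes(elapsed)} min")
```

Rounding instead of truncating gives slightly different figures than some of the hand-written notes, which is fine for a rough comparison.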

25.05.2023