JuanjoSalvador / NyaaPy

Unofficial Python wrapper for Nyaa anime torrent sites
MIT License
47 stars 23 forks

Replace BeautifulSoup by lxml #44

Closed by JuanjoSalvador 3 years ago

JuanjoSalvador commented 4 years ago

lxml can scrape the web faster, so it would be better to use it for the next release.

rpartha commented 4 years ago

Can I call dibs on this one?

JuanjoSalvador commented 4 years ago

Go for it.

husudosu commented 4 years ago

It's been a while. How are you doing, @rpartha? I have some LXML experience, so I can help with the change.

JuanjoSalvador commented 4 years ago

Actually, I didn't get any related PR, so... I'm going to rewrite one of the modules and test the performance. If you want, take one and do the same.

husudosu commented 4 years ago

Just leaving some results for today: I still need to do some performance testing, but I'm done with the parse_nyaa function, which is used by the last_uploads function (I did not forget about RSS). It seems okay. Over the next few days I'm going to work on the sukebei and nyaa parsers and do some performance testing. My development branch is online; when I'm fully done, I'll comment here.

JuanjoSalvador commented 4 years ago

Thank you @husudosu!

husudosu commented 4 years ago

I'm almost done with the Nyaa.si parser (except the user view method); you can check out my fork (dev branch). In the next commit I'm going to delete some of my testing stuff. I ran a few tests comparing stable BS4 against my LXML code, and LXML seems a little faster, though it may be a placebo effect (I don't have time to write proper tests). I have a lot of work for the rest of the work week, so I'm going to take a "break" and then continue with the rest of the nyaa.si parser and sukebei. Until then, please give me some feedback if you have time to test things out.

husudosu commented 4 years ago

So I'm done with the Nyaa.si parser (Sukebei is not done yet). I ran some tests. LXML performs better on some tasks, but not significantly. Most of the results depend on my connection latency and speed. I tried downloading some pages from nyaa for proper testing, but the downloaded copies are missing some important elements. If you also want to do some testing, you can simply copy my test.py into the stable branch and run it.

With LXML (my dev branch):

Latest torrents time: 429.338 msec
Test search time: 503.462 msec
Single torrent time: 417.587 msec
Single user time: 642.431 msec

Latest torrents time: 542.411 msec
Test search time: 494.892 msec
Single torrent time: 501.887 msec
Single user time: 671.183 msec

Latest torrents time: 427.618 msec
Test search time: 607.589 msec
Single torrent time: 343.875 msec
Single user time: 624.98 msec

With BS4 (stable branch):

Latest torrents time: 496.772 msec
Test search time: 668.522 msec
Single torrent time: 340.931 msec
Single user time: 632.969 msec

Latest torrents time: 440.234 msec
Test search time: 506.294 msec
Single torrent time: 391.168 msec
Single user time: 562.279 msec

Latest torrents time: 423.312 msec
Test search time: 656.44 msec
Single torrent time: 313.093 msec
Single user time: 651.77 msec
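
The three runs per parser listed above are easier to compare once averaged. The snippet below is only a small sketch for doing that: the numbers are copied from the results in this comment, while the names `RUNS`, `summarize` and `MEANS` are made up for illustration and are not part of NyaaPy or test.py. With only three network-bound samples per task, the averages should be read as a rough indication, not a benchmark.

```python
from statistics import mean

# Timings in msec, copied from the three runs posted in this comment.
RUNS = {
    "lxml": {
        "Latest torrents": [429.338, 542.411, 427.618],
        "Test search": [503.462, 494.892, 607.589],
        "Single torrent": [417.587, 501.887, 343.875],
        "Single user": [642.431, 671.183, 624.98],
    },
    "bs4": {
        "Latest torrents": [496.772, 440.234, 423.312],
        "Test search": [668.522, 506.294, 656.44],
        "Single torrent": [340.931, 391.168, 313.093],
        "Single user": [632.969, 562.279, 651.77],
    },
}


def summarize(runs):
    """Return {parser: {task: mean time in msec}} for the given runs."""
    return {
        parser: {task: mean(samples) for task, samples in tasks.items()}
        for parser, tasks in runs.items()
    }


MEANS = summarize(RUNS)

if __name__ == "__main__":
    # Print one line per task; a negative diff means lxml was faster on average.
    for task in RUNS["lxml"]:
        lxml_ms = MEANS["lxml"][task]
        bs4_ms = MEANS["bs4"][task]
        print(f"{task:16s} lxml {lxml_ms:8.3f}  bs4 {bs4_ms:8.3f}  diff {lxml_ms - bs4_ms:+8.3f}")
```
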

Tracert to Nyaa.si (185.178.208.182), first 12 hops:

 1   <1 ms   <1 ms   <1 ms  LOCAL_GATEWAY
 2    2 ms    3 ms    1 ms  MY_ISP_GATEWAY1
 3   10 ms    5 ms    6 ms  MY_ISP_GATEWAY2
 4    7 ms    8 ms    7 ms  MY_ISP_GATEWAY3
 5    9 ms    8 ms   11 ms  MY_ISP_GATEWAY4
 6    9 ms    9 ms    9 ms  gw.deninet.hu [212.92.23.1]
 7   12 ms   10 ms   12 ms  te-1-223.deninet.hu [217.113.61.223]
 8   24 ms   26 ms   25 ms  bpt-b4-link.telia.net [62.115.39.121]
 9   38 ms   34 ms   34 ms  win-bb2-link.telia.net [80.91.250.64]
10   39 ms   36 ms   39 ms  hbg-bb4-link.telia.net [62.115.119.50]
11   48 ms   42 ms   41 ms  adm-bb4-link.telia.net [80.91.246.200]
12   34 ms   34 ms   35 ms  adm-b1-link.telia.net [62.115.137.65]

ICMP disabled for Nyaa.si

My speedtest results: 30M/30M with 12 msec ping
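
Since these timings include the network round trip, one way to isolate the parsing cost is to time the parser on an HTML string that is already in memory. The following is a minimal sketch under stated assumptions, not the test.py mentioned above: `SAMPLE_HTML` is synthetic markup (not a real Nyaa page), `LinkCollector` and `parse_with_stdlib` use the standard library's `html.parser` purely as a stand-in, and `time_parser` is a hypothetical helper. To compare BS4 against LXML, the project's own parse functions would be passed in place of `parse_with_stdlib`.

```python
import statistics
import time
from html.parser import HTMLParser

# Synthetic markup for illustration only; real Nyaa pages are more complex.
SAMPLE_HTML = (
    "<table>"
    + "<tr><td>cat</td><td><a href='/view/1'>name</a></td></tr>" * 75
    + "</table>"
)


class LinkCollector(HTMLParser):
    """Stand-in parser that collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_with_stdlib(html):
    """Parse html with the stand-in parser and return the collected links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def time_parser(parse, html, repeats=20):
    """Return the median wall-clock time in msec of calling parse(html)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(html)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"stdlib parse time: {time_parser(parse_with_stdlib, SAMPLE_HTML):.3f} msec")
```

Using the median over several repeats makes a single slow run (for example, a cold cache) less likely to skew the result.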

What do you think? Should I continue the work on other modules?

JuanjoSalvador commented 4 years ago

Sure! Please, go on!

husudosu commented 4 years ago

I'm done with the LXML implementation. I've made a pull request.

51