ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

grab-site benchmark with CPython 3.4.5 vs PyPy3 5.5.0 on Ubuntu 16.04.1 #96

Closed: ivan closed this issue 7 years ago

ivan commented 7 years ago

grab-site and wpull have always used a lot of CPU time, so I periodically benchmark them under PyPy3 to see whether it offers a big advantage. Unfortunately, the answer is probably still "no":

I launched a grab-site crawl of https://www.reddit.com/r/programming/ under both CPython 3.4.5 and PyPy3 5.5.0 (note: no cchardet). After about 15 minutes, both had grabbed roughly the same number of pages (+4% for PyPy), but PyPy had used 10% more CPU time and 4.4x as much resident memory (388MB vs 89MB).
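
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to collect CPU time and memory for a child process on a Unix system. It is not how the numbers above were gathered: the helper names `measure` and `ratio` are hypothetical, the child must run to completion (the crawls above were compared after about 15 minutes, not at exit), and `ru_maxrss` reports peak rather than current resident memory.

```python
import os
import sys


def measure(argv):
    """Run argv to completion and return (cpu_seconds, peak_rss) for that child.

    peak_rss is ru_maxrss exactly as the OS reports it:
    kilobytes on Linux, bytes on macOS.
    """
    # Spawn the child without waiting, then reap it with wait4(), which
    # returns resource usage for this specific child only.
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    _, _status, usage = os.wait4(pid, 0)
    cpu_seconds = usage.ru_utime + usage.ru_stime  # user + system time
    return cpu_seconds, usage.ru_maxrss


def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    # Placeholder workload; substitute the interpreter and command to compare.
    cpu, rss = measure([sys.executable, "-c", "sum(range(10**6))"])
    print("CPU seconds:", cpu, "peak RSS (OS units):", rss)

    # Memory figures quoted in this issue: 388MB (PyPy3) vs 89MB (CPython).
    print(round(ratio(388, 89), 1))  # 4.4
```

Running `measure` once per interpreter on the same workload and feeding the results to `ratio` gives figures comparable in spirit to the "10% more CPU time" and "4.4x" numbers above, with the caveats already noted.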