everywall / ladder

Self-hosted alternative to 12ft.io and 1ft.io. Bypass paywalls with a proxy ladder and remove CORS headers from any URL.
GNU General Public License v3.0

Improvement: add benchmarking and testing suite #46

Closed by deoxykev 10 months ago

deoxykev commented 10 months ago

Eventually, we’ll want to test on realistic data, both for benchmarking and for finding edge cases in the code. I’m thinking we could use a WARC player and the Common Crawl C4 dataset (the realnewslike variant), which is about 34GB.

https://github.com/allenai/allennlp/discussions/5056

mms-gianni commented 10 months ago

Interesting idea, but I suspect it might be overkill. The dataset probably won't include CSS and JavaScript either, and those can contain linked assets too.

deoxykev commented 10 months ago

Probably better to do atomic, focused unit testing actually. So many edge cases.
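
To make "atomic, focused" concrete, here is a rough sketch of what one table-driven test could look like, written in Python purely for illustration. Everything in it is an assumption rather than code from this project: `rewrite_url`, `PROXY_PREFIX`, and the `proxy/<absolute URL>` format are hypothetical stand-ins for whatever link-rewriting logic ends up under test.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical proxy address; not taken from the project.
PROXY_PREFIX = "http://localhost:8080/"

# Schemes that never hit the network through the proxy.
PASSTHROUGH_SCHEMES = {"data", "javascript", "mailto"}


def rewrite_url(page_url: str, href: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """Resolve href against page_url and route the result through the proxy.

    Hypothetical helper: fragments and non-network schemes are returned unchanged.
    """
    if href.startswith("#") or urlsplit(href).scheme in PASSTHROUGH_SCHEMES:
        return href
    return proxy_prefix + urljoin(page_url, href)


PAGE = "https://example.com/news/a.html"

# One row per edge case: (href found in the page, expected rewritten value).
CASES = [
    ("style.css", "http://localhost:8080/https://example.com/news/style.css"),
    ("/img/x.png", "http://localhost:8080/https://example.com/img/x.png"),
    ("//cdn.example.com/app.js", "http://localhost:8080/https://cdn.example.com/app.js"),
    ("data:image/png;base64,AAAA", "data:image/png;base64,AAAA"),
    ("#top", "#top"),
]


def failing_cases():
    """Return the rows whose actual output differs from the expected one."""
    return [(href, rewrite_url(PAGE, href), want)
            for href, want in CASES
            if rewrite_url(PAGE, href) != want]


if __name__ == "__main__":
    assert failing_cases() == []
```

Each row covers exactly one edge case (relative path, root-relative path, protocol-relative URL, data URI, fragment), so a failure points straight at the behavior that broke, and a new edge case is one more row.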