freedomofpress / fingerprint-securedrop

A machine learning data analysis pipeline for analyzing website fingerprinting attacks and defenses.
GNU Affero General Public License v3.0

Implement our own page load timeout function for the crawler #87

Open psivesely opened 8 years ago

psivesely commented 8 years ago

Selenium's page load timeout function is highly unreliable. If it hasn't closed down a connection within 5s of when it's supposed to, we should stop the crawl by whatever means necessary (closing all circuits will probably be sufficient, and we already have a method for restarting TB if we need it). This will stop the crawler from wasting time stuck on sites that load for minutes at a time. See fpsd/tests/test_sketchy_sites.py for some good example sites and a good test case for this timeout function.
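
A minimal sketch of what a hard timeout on top of Selenium's own timeout could look like, assuming a Unix host and a single-threaded crawler (the `load_page` helper and its parameter names are hypothetical, not part of the current fpsd code):

```python
import signal
from contextlib import contextmanager


class HardTimeoutError(Exception):
    """Raised when a page load exceeds the hard wall-clock limit."""


@contextmanager
def hard_timeout(seconds):
    """Enforce a wall-clock limit on the enclosed block using SIGALRM.

    Unix-only and main-thread-only, which matches a single-threaded
    crawler process.
    """
    def _handle_alarm(signum, frame):
        raise HardTimeoutError("page load exceeded %ds" % seconds)

    old_handler = signal.signal(signal.SIGALRM, _handle_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                               # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)    # restore previous handler


def load_page(driver, url, soft_timeout=20, grace=5):
    """Try Selenium's own timeout first; fall back to the hard timeout.

    `driver` is a Selenium WebDriver already attached to Tor Browser. If
    Selenium fails to give up within `soft_timeout + grace` seconds,
    HardTimeoutError propagates so the caller can close all circuits or
    restart TB.
    """
    driver.set_page_load_timeout(soft_timeout)
    with hard_timeout(soft_timeout + grace):
        driver.get(url)
```

The caller would catch `HardTimeoutError`, record the failed site, and then tear down circuits or restart TB before moving on to the next onion service.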

psivesely commented 7 years ago

I'm not sure when the features were last computed (I can't SSH into the VPSs right now for some reason--probably iptables?), but in any case it seems like this is definitely needed:

fpsd=> select * from features.cell_timings order by total_elapsed_time desc limit 3;
 exampleid | total_elapsed_time
-----------+--------------------
      1106 |         930.656736
      7567 |         449.786331
      1387 |         441.871928
(3 rows)
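
For reference, the "close all circuits" fallback mentioned above could be sketched with stem along these lines; this assumes tor's control port is reachable on 9051 and that cookie/no-password authentication works, and the crawler may already hold an authenticated `Controller` it can reuse instead of opening a new one:

```python
from stem.control import Controller


def close_all_circuits(control_port=9051):
    """Tear down every open circuit so a hung page load cannot keep
    streaming data while the crawler moves on.
    """
    with Controller.from_port(port=control_port) as controller:
        controller.authenticate()
        for circuit in controller.get_circuits():
            controller.close_circuit(circuit.id)
```

With a hard timeout in place, traces like the 930s one above should be capped at the timeout value rather than dragging the whole crawl out.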