dreisman / WebCensusNotebook

Querying for a particular third-party URI is slow, if it's found on a lot of first parties #13

Closed: dreisman closed 7 years ago

dreisman commented 7 years ago

Getting from a third-party domain to a list of first parties is now very fast. But finding the first parties that load a particular third-party URI can be slow if that URI appears on a lot of websites.

Example: cen.third_parties['facebook.net'] is near-instantaneous now, but if you want all first parties that specifically have 'facebook.net/sdk/en_us/sdk.js', you have to do something like:

    first_parties_w_uri = []
    for fp, obj in cen.third_parties['facebook.net'].first_parties:
        if 'facebook.net/sdk/en_us/sdk.js' in [url for url in obj.third_party_resources]:
            first_parties_w_uri.append(fp)

The slowness comes in querying potentially many thousands of first parties and checking for the URI you're interested in. We can make this sort of important query faster too.
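One way this could be made faster is to build an inverted index once, mapping each third-party URI to the set of first parties that load it, so that later lookups avoid the scan. The sketch below is only an illustration under assumptions: it works on a plain dict of first-party name to resource URIs rather than the notebook's own objects, and build_uri_index and the sample data are hypothetical names, not part of the WebCensusNotebook API.

```python
from collections import defaultdict


def build_uri_index(first_parties):
    """Map each third-party URI to the set of first parties that load it.

    `first_parties` is assumed (for illustration only) to be a dict of
    first-party name -> iterable of third-party resource URIs.
    """
    index = defaultdict(set)
    for fp, uris in first_parties.items():
        for uri in uris:
            index[uri].add(fp)
    return index


# Made-up sample data standing in for the crawl results.
sample = {
    "example.com": ["facebook.net/sdk/en_us/sdk.js", "facebook.net/other.js"],
    "example.org": ["facebook.net/other.js"],
    "example.net": ["facebook.net/sdk/en_us/sdk.js"],
}

index = build_uri_index(sample)

# A lookup is now a single dict access instead of a scan over every first party.
first_parties_w_uri = sorted(index.get("facebook.net/sdk/en_us/sdk.js", set()))
print(first_parties_w_uri)
```

The index costs one pass over all resources up front and extra memory proportional to the number of (first party, URI) pairs, which pays off when the same third party is queried repeatedly.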

dreisman commented 7 years ago

Added ThirdParty.all_resources