carissalow / rapids

Reproducible Analysis Pipeline for Data Streams
http://www.rapids.science/
GNU Affero General Public License v3.0

Exclude certain apps when scraping/updating missing categories. #132

Closed yiyir closed 3 years ago

yiyir commented 3 years ago

Is your feature request related to a problem? Please describe. If an app cannot be found by Google, an error is thrown and the program exits, even if the app is already listed in the EXCLUDED_APPS config. This forces us to set both UPDATE_CATALOGUE_FILE and SCRAPE_MISSING_CATEGORIES to false, even though we want to auto-scrape most of the unidentified apps and skip only certain ones.

Describe the solution you'd like Exclude apps in EXCLUDED_APPS from scraping, since we are not considering them for feature extraction.

Additional context In the example config below, "com.upmc.rosa" is one app that can make the whole program exit.

PHONE_APPLICATIONS_FOREGROUND:
  CONTAINER: applications_foreground
  APPLICATION_CATEGORIES:
    CATALOGUE_SOURCE: FILE
    CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
    UPDATE_CATALOGUE_FILE: TRUE
    SCRAPE_MISSING_CATEGORIES: TRUE
  PROVIDERS:
    RAPIDS:
      COMPUTE: TRUE
      SINGLE_CATEGORIES: ["all", "email"]
      MULTIPLE_CATEGORIES:
        social: ["socialnetworks", "socialmediatools"]
        entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
      SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"]
      EXCLUDED_CATEGORIES: []
      EXCLUDED_APPS: ["com.upmc.rosa"]
      FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
      SRC_SCRIPT: src/features/phone_applications_foreground/rapids/main.py
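The requested behavior could look roughly like the minimal sketch below. This is not RAPIDS code: apps_to_scrape and its arguments are hypothetical names, and the sketch only assumes that the catalogue maps package names to genres and that EXCLUDED_APPS is a list of package names.

```python
def apps_to_scrape(observed_apps, catalogue, excluded_apps):
    """Return the package names that still need a category lookup.

    Hypothetical helper: skips apps already present in the catalogue,
    apps listed in EXCLUDED_APPS, and duplicates, preserving input order.
    """
    excluded = set(excluded_apps)
    seen = set()
    pending = []
    for package in observed_apps:
        if package in catalogue or package in excluded or package in seen:
            continue
        seen.add(package)
        pending.append(package)
    return pending


# Illustrative data only; "com.example.unknown" is a made-up package name.
catalogue = {"com.twitter.android": "socialnetworks"}
observed = [
    "com.twitter.android",
    "com.upmc.rosa",
    "com.example.unknown",
    "com.example.unknown",
]
print(apps_to_scrape(observed, catalogue, ["com.upmc.rosa"]))
```

With this filter in place, "com.upmc.rosa" would never be sent to the scraper, so it could not trigger the failure described above.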

JulioV commented 3 years ago

Thanks for reporting this, @yiyir. What commit are you using? You can check with: git rev-parse --short HEAD

yiyir commented 3 years ago

@JulioV 00a3335

JulioV commented 3 years ago

This was a bug; the fix is now in commit 29cc3f00e9826def890758d13726d8df0caed633 (v1.1.1). We still scrape all apps, but now correctly handle the case where an app does not exist. Thanks again for reporting.
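As a rough illustration of that kind of handling (not the code from the commit above), the sketch below keeps scraping when a lookup fails. The names scrape_missing_categories, fetch_genre, and fake_store_lookup are hypothetical, and the "unknown" fallback label is an assumption made for this example.

```python
def scrape_missing_categories(packages, fetch_genre, fallback="unknown"):
    """Look up a genre for each package without aborting on a missing app.

    fetch_genre is a hypothetical callable that returns a genre string or
    raises LookupError when no listing exists for the package.
    """
    genres = {}
    for package in packages:
        try:
            genres[package] = fetch_genre(package)
        except LookupError:
            # The app does not exist in the store: record a placeholder
            # and continue with the remaining packages.
            genres[package] = fallback
    return genres


def fake_store_lookup(package):
    # Stand-in for a real store query, used only to exercise the sketch.
    listings = {"com.twitter.android": "socialnetworks"}
    return listings[package]  # KeyError is a subclass of LookupError


result = scrape_missing_categories(
    ["com.twitter.android", "com.upmc.rosa"], fake_store_lookup
)
print(result)
```

Catching only LookupError, rather than every exception, keeps unrelated failures visible instead of silently labeling them as missing apps.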