To speed up deployment of the Python service, we transferred the unused code, models, and datasets to Google Drive; see the link below for reference: https://drive.google.com/file/d/1A-Pzn7J83Ebh5T_5JTcPvXW5SLIDWpTQ/view?usp=sharing
The file 'application.py' implements the detection process, covering both dark pattern detection and dark pattern type classification.
The required dependencies can be found in the file "requirements.txt".
(1) Dark Pattern Detection model: detects whether a line of content contains a dark pattern.
rf_presence_classifier.joblib
is the detection model for the 5 dark pattern types --- Random Forest model
presence_TfidfVectorizer.joblib
is the presence TF-IDF vectorizer used for content preprocessing for the 5 pattern types.
confirm_rf_clf.joblib
is the Confirmshaming dark pattern detection model --- Random Forest model
confirm_tv.joblib
is the vectorizer used for content preprocessing for Confirmshaming.
(2) Type Classification model: classifies the detected dark pattern content into a specific dark pattern type.
lr_category_classifier.joblib
is the pattern type classification model --- Logistic Regression model
type_CountVectorizer.joblib
is the pattern type CountVectorizer used for content preprocessing.
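The two-stage flow above can be sketched as follows. This is an illustrative sketch only: the tiny training texts, the "LowStock" label, and the detect() helper are hypothetical placeholders, and the real service loads the already-fitted models from the .joblib files (e.g. with joblib.load) rather than training anything.

```python
# Illustrative two-stage pipeline: presence detection (TF-IDF + Random Forest)
# followed by type classification (CountVectorizer + Logistic Regression).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1: does a line of content contain a dark pattern at all?
# (hypothetical training data for the sketch)
texts = ["LAST CHANCE ITEMS", "Only 2 left in stock!",
         "Contact us", "About our company"]
has_pattern = [1, 1, 0, 0]
presence_tv = TfidfVectorizer()
X = presence_tv.fit_transform(texts)
presence_clf = RandomForestClassifier(
    n_estimators=50, bootstrap=False, random_state=0).fit(X, has_pattern)

# Stage 2: classify flagged content into a dark pattern type.
flagged = ["LAST CHANCE ITEMS", "Only 2 left in stock!"]
types = ["FakeCountdown", "LowStock"]  # "LowStock" is a made-up label
type_cv = CountVectorizer()
type_clf = LogisticRegression().fit(type_cv.fit_transform(flagged), types)

def detect(line):
    """Return the predicted dark pattern type, or None if no pattern."""
    if presence_clf.predict(presence_tv.transform([line]))[0]:
        return type_clf.predict(type_cv.transform([line]))[0]
    return None
```

The split into a separate vectorizer and classifier per stage mirrors the paired .joblib files listed above: each model is shipped alongside the vectorizer that preprocessed its training content.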
To start the service, run: python application.py
The detection API is served at /api/parse. Example response:
{
"total_counts": {1},
"items_counts": {
"1": 1
},
"details": [
{
"content": "LAST CHANCE ITEMS",
"tag": "div/div/p/div",
"tag_type": "link",
"key": "AKIAX4DWMQK2GILTOxx",
"type_name": "FakeCountdown",
"type_name_slug": "Fake Countdown"
}
]
}
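A client can consume this response as sketched below. The field meanings are assumptions read off the example above (total_counts as the overall number of detections, items_counts keyed by a type id), and summarize() is a hypothetical helper, not part of the service.

```python
# Hypothetical client-side handling of an /api/parse response.
response = {
    "total_counts": 1,
    "items_counts": {"1": 1},
    "details": [
        {
            "content": "LAST CHANCE ITEMS",
            "tag": "div/div/p/div",
            "tag_type": "link",
            "key": "AKIAX4DWMQK2GILTOxx",
            "type_name": "FakeCountdown",
            "type_name_slug": "Fake Countdown",
        }
    ],
}

def summarize(resp):
    """Group the detected contents by their dark pattern type name."""
    by_type = {}
    for item in resp["details"]:
        by_type.setdefault(item["type_name"], []).append(item["content"])
    return by_type
```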
Prerequisites & dependencies
Python 3.9
Pip