For this project, your main challenge is to improve phishing detection by developing a real-time, multimodal system that combines transformer-based text representations with additional features such as URLs and metadata.
Experiment Design:
Dataset: Start by fine-tuning a pre-trained transformer (e.g., BERT or GPT-2) on datasets such as SpamAssassin (email text) or PhishTank (phishing URLs).
Model: Focus on testing the GRU model from the paper “Multimodel Phishing URL Detection” for real-time classification, as it typically has lower inference latency than larger transformer models.
Model Enhancements: Experiment with combining text-based embeddings with URL and metadata features. Measure how much the multimodal model improves phishing classification accuracy over a text-only model.
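One simple way to combine the modalities is early fusion: compute a few numeric features from the URL, then concatenate them with the text embedding and any metadata flags before classification. The sketch below illustrates the idea using only the Python standard library. The chosen URL features, the function names url_features and fuse, and the example metadata flags are assumptions made for illustration; they are not taken from the paper.

```python
from urllib.parse import urlparse

def url_features(url: str) -> list:
    """Hand-picked lexical URL features (illustrative, not from the paper)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                         # total URL length
        float(host.count(".")),                  # number of dots in the host
        float(sum(ch.isdigit() for ch in url)),  # digit count in the whole URL
        float("@" in url),                       # '@' anywhere in the URL
        float(parsed.scheme == "https"),         # uses HTTPS
        float(host.replace(".", "").isdigit()),  # host looks like a raw IPv4 address
    ]

def fuse(text_embedding, url: str, metadata) -> list:
    """Early fusion: concatenate text, URL, and metadata features into one vector."""
    return list(text_embedding) + url_features(url) + list(metadata)

# Hypothetical inputs: a 4-dimensional stand-in for a transformer text embedding
# and two made-up metadata flags (has_attachment, sender_domain_mismatch).
vector = fuse([0.1, -0.3, 0.7, 0.0], "http://192.168.0.1/login@secure", [0.0, 1.0])
print(len(vector))
```

The fused vector can then be fed to whichever classifier is under test. Training a text-only model and a fused model on the same split gives the comparison that the Model Enhancements step asks for.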
Motivation Example:
Present a plot comparing phishing classification accuracy and inference latency for your GRU-based multimodal model and for baseline models such as BERT or GPT-2. This will show whether your system can handle real-time detection more efficiently than existing offline solutions.
Evaluation Focus:
Metrics: Use classification metrics such as the F1 score and precision-recall curves. Compare your results against the figures reported in the existing paper (LSTM: 96.9%, Bi-LSTM: 99%, GRU: 97.5%).
Real-time testing: Show how well the model performs on live email streams and measure its speed (inference latency per message).
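The two evaluation points above can be sketched in a few lines: precision, recall, and F1 for the phishing class, plus a per-message latency measurement. This is a minimal sketch using only the standard library. The labels are toy values, and dummy_predict is a hypothetical placeholder standing in for the trained model, not a real classifier.

```python
import statistics
import time

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def measure_latency_ms(predict, messages):
    """Time one predict() call per message; return (median, max) in milliseconds."""
    timings = []
    for message in messages:
        start = time.perf_counter()
        predict(message)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)

# Toy labels for illustration only (1 = phishing, 0 = legitimate).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# Hypothetical placeholder for the trained model's prediction function.
def dummy_predict(message):
    return int("verify your account" in message.lower())

median_ms, worst_ms = measure_latency_ms(
    dummy_predict, ["Verify your account now", "Lunch at noon?"]
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"median latency={median_ms:.4f} ms, worst={worst_ms:.4f} ms")
```

Reporting the median together with the worst case is a design choice: for a live stream, occasional slow messages matter as much as typical speed. Replacing dummy_predict with each model's own inference call puts the latency figures for the GRU and the transformer baselines on the same footing.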