csce585-mlsystems / Phishing-Detection


Project Proposal Feedback #2

Open pooyanjamshidi opened 1 month ago

pooyanjamshidi commented 1 month ago

Feedback on Phishing Detection Proposal:

First off, nice job laying out the problem and your solution. Phishing is a huge and growing issue, so the need for real-time detection is very clear. Your approach using a multimodal model and focusing on real-time classification is practical and timely. Below are some of my thoughts and suggestions that could help strengthen your proposal:

Strengths:

Suggestions:

  1. Active Prevention vs. Detection:

    • You mention in the feedback section that you’re considering having the model actively prevent users from clicking on phishing links. This is a great idea, but it adds complexity. You’ll need to decide how it would work within the email application: would it block the link entirely, show a warning, or just flag the email? Balance user experience against security, since users may get frustrated if they feel too restricted.
  2. Accuracy Thresholds:

    • You’ve mentioned the accuracy thresholds for LSTM (96.9%), Bi-LSTM (99%), and GRU (97.5%), which is helpful. But for real-time systems, you’ll also want to define operational thresholds beyond accuracy. For example, how many false positives and false negatives can the system tolerate? Users won’t be happy if they lose legitimate emails, and the cost of a false negative (a missed phishing email) is also high. Consider reporting precision and recall alongside accuracy, and state which kind of error you are willing to trade for the other.
  3. Comparison to Non-LLM Solutions:

    • It’s great that you’re thinking about comparing your model with non-LLM solutions. Make sure to explain what these non-LLM solutions are and how your approach improves upon them. It might be worth comparing against simpler but effective techniques, such as rule-based systems, regular expressions, or classical machine learning methods like decision trees or SVMs.
  4. Handling Concept Drift:

    • Phishing techniques evolve quickly, and what works today might not work tomorrow. Explain how your model will cope as phishing URLs and techniques change. A section on how you plan to keep the model up to date (e.g., continuous learning or periodic re-training) would strengthen your proposal.
  5. User Experience and Integration:

    • You’ve mentioned integrating the model into an email platform, which is great. However, you might want to elaborate a bit more on how this will work from a user’s perspective. Will users get an alert when an email is flagged? Can they override the system if they think it’s wrong? The smoother the integration, the better the user experience.
  6. Real-Time Evaluation:

    • Your idea of testing the model on real-time email streams is excellent. I’d suggest expanding a bit on how you’ll set up this testing environment. Are you going to simulate live email traffic or use a sandbox environment? If possible, plan to show how well the model scales under load by reporting both throughput (emails processed per second) and latency.
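
To make suggestion 2 concrete, here is a rough sketch of how precision, recall, and a false-positive budget could be computed from model scores. The function names, the sample scores, and the `max_fpr` budget are all made up for illustration; your models and data will dictate the real numbers.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating score >= threshold as 'phishing'."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_phishing = score >= threshold
        if predicted_phishing and label == 1:
            tp += 1
        elif predicted_phishing and label == 0:
            fp += 1
        elif not predicted_phishing and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall for the phishing class at a given threshold."""
    tp, fp, fn, _ = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    scores: Sequence[float], labels: Sequence[int], max_fpr: float
) -> Optional[float]:
    """Lowest threshold (hence highest recall) whose false-positive rate
    stays within max_fpr; None if no candidate threshold qualifies."""
    for threshold in sorted(set(scores)):
        _, fp, _, tn = confusion_counts(scores, labels, threshold)
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        if fpr <= max_fpr:
            return threshold
    return None


if __name__ == "__main__":
    # Illustrative scores only; 1 = phishing, 0 = legitimate.
    scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(precision_recall(scores, labels, 0.5))
    print(pick_threshold(scores, labels, max_fpr=0.25))
```

Stating the false-positive budget up front, and then reporting the recall you get at that budget, is usually more informative for a deployed filter than a single accuracy number.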
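
For suggestion 3, a rule-based baseline can be very small. The sketch below is only an illustration: the three heuristics (IP-address host, `@` in the URL host part, urgency phrases) and the `min_hits` cutoff are assumptions I picked for the example, not a vetted rule set.

```python
import re

# Hypothetical heuristics, chosen only to illustrate a regex baseline.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_HOST_RE = re.compile(
    r"^https?://\d{1,3}(?:\.\d{1,3}){3}(?:[:/]|$)", re.IGNORECASE
)
URGENT_RE = re.compile(
    r"\b(verify your account|urgent|password expires|suspended)\b",
    re.IGNORECASE,
)


def rule_based_score(text: str) -> int:
    """Count how many of the heuristic rules fire for an email body."""
    urls = URL_RE.findall(text)
    hits = 0
    # Rule 1: a link whose host is a raw IP address.
    if any(IP_HOST_RE.match(url) for url in urls):
        hits += 1
    # Rule 2: an '@' in the host part, which can disguise the real host.
    if any("@" in url.split("://", 1)[1].split("/", 1)[0] for url in urls):
        hits += 1
    # Rule 3: urgency or account-threat wording.
    if URGENT_RE.search(text):
        hits += 1
    return hits


def rule_based_is_phishing(text: str, min_hits: int = 2) -> bool:
    """Flag the email when at least min_hits rules fire."""
    return rule_based_score(text) >= min_hits
```

Running your LSTM, Bi-LSTM, and GRU models and a baseline like this on the same test split would show how much the learned models actually add.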
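
For suggestion 4, one simple way to decide when to re-train is to track accuracy over a sliding window of recently labeled emails and compare it with the accuracy measured at deployment. This is a minimal sketch under that assumption; `DriftMonitor`, the window size, and the tolerance are hypothetical, and it presumes you can obtain ground-truth labels (e.g., from user reports) after the fact.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Tracks rolling accuracy and signals when it falls below a baseline."""

    def __init__(
        self, baseline_accuracy: float, window: int = 500, tolerance: float = 0.05
    ) -> None:
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Keeps only the most recent `window` outcomes.
        self.outcomes = deque(maxlen=window)

    def record(self, predicted: int, actual: int) -> None:
        """Store whether the latest prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self) -> Optional[float]:
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once the window is full and accuracy has dropped by more
        than `tolerance` below the baseline."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance
```

A monitor like this only tells you that performance has degraded; the proposal should still say what happens next (periodic re-training, continuous learning, or something else).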
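
For suggestion 6, throughput and latency can be measured with a small harness like the one below. It is a sketch, not a full load test: `benchmark` and the stand-in classifier are hypothetical, it runs single-threaded, and it measures only the classification call, not network or mailbox overhead.

```python
import time
from typing import Callable, Dict, Sequence


def benchmark(
    classify: Callable[[str], object], emails: Sequence[str], warmup: int = 10
) -> Dict[str, float]:
    """Measure throughput and latency percentiles for a classifier callable."""
    if not emails:
        raise ValueError("emails must not be empty")

    # Warm-up calls so one-time setup costs do not skew the measurements.
    for email in emails[:warmup]:
        classify(email)

    latencies = []
    start = time.perf_counter()
    for email in emails:
        t0 = time.perf_counter()
        classify(email)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()

    def percentile_ms(p: float) -> float:
        # Nearest-rank style lookup on the sorted latencies.
        index = min(len(latencies) - 1, int(p * len(latencies)))
        return latencies[index] * 1000.0

    return {
        "emails_per_second": len(emails) / elapsed if elapsed > 0 else float("inf"),
        "p50_ms": percentile_ms(0.50),
        "p95_ms": percentile_ms(0.95),
        "p99_ms": percentile_ms(0.99),
    }


if __name__ == "__main__":
    # Stand-in classifier; replace with a call into the real model.
    def dummy_classify(email: str) -> bool:
        return "http://" in email

    sample = ["Please review http://example.com/invoice"] * 1000
    print(benchmark(dummy_classify, sample))
```

Reporting tail latency (p95, p99) alongside the median matters for a real-time filter, because the slowest emails are the ones users will notice.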

Additional Considerations:

Conclusion:

Overall, this is a strong proposal. You’ve identified a real-world problem, chosen a practical approach (focusing on GRU for low latency), and thought about solid evaluation metrics. Just make sure to dive deeper into the user experience and how you’ll handle evolving phishing techniques (concept drift). Also, think about balancing detection with prevention and providing clear explanations of how your model improves upon existing solutions.

Looking forward to seeing how this evolves!