Enforcement bots TODO list

Bender250 commented 4 years ago

Implementation
- [x] Mailserver for generation of unique addresses
- [x] Fix issue with dropping malformed emails
- [x] Crawler
- [x] Language detection + ENG version detection
- [x] Detection of registration forms, terms and conditions, and privacy policies (continuous improvement process)
- [x] Orchestration (user guides the crawler)
- [x] Run crawler to detect 1000 potential registration forms
- [x] Algorithm for extraction of registration forms features
- [ ] Registration form classification (dependent on Training reg. forms dataset collection)
- [ ] Data pre-processing: cleaning, language features embedding
- [ ] Modeling
- [ ] Using the output of classification in the crawler (this connection is more challenging than it seems)
- [ ] Email classification (dependent on collection of Training reg. forms dataset and Mail labeling)
- [ ] Features analysis and extraction
- [ ] Classification
- [x] Email registration confirmation
- [x] Finish registration process (e.g., clicking confirmation links, using registration code)
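
A minimal sketch of the "unique addresses" idea from the Implementation list, assuming a catch-all mail domain and one derived mailbox per registered site. The names `SECRET`, `MAIL_DOMAIN`, `address_for` and `build_index` are hypothetical and are not taken from the actual mailserver code.

```python
import hashlib
import hmac

SECRET = b"change-me"          # hypothetical signing key, not from the project
MAIL_DOMAIN = "bots.example"   # hypothetical catch-all domain


def address_for(site: str) -> str:
    """Derive a stable, unique mailbox name for one registered site."""
    site = site.strip().lower()
    tag = hmac.new(SECRET, site.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"u{tag}@{MAIL_DOMAIN}"


def build_index(sites):
    """Map each generated address back to the site it was given to."""
    return {address_for(s): s.strip().lower() for s in sites}
```

Because the address is derived from the site name, any incoming mail can be traced back to the one registration that received that address, without storing a random token per site.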
Study
- [x] Pilot study
- [x] What aspects are interesting?
- [ ] Training registration forms dataset collection (depends on Crawler orchestration and Running crawler)
- [ ] Can we collect 1000 registration forms?
- [x] Processing corresponding emails
- [ ] Final study
- [ ] In the ideal case, we can find all types of violations automatically. This study then analyzes a sample to confirm the rates of false positives and false negatives
- [ ] If the automation is not that successful, we have to use the orchestration. Can we do 10k registrations?
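
The false positive and false negative rates mentioned for the final study could be computed from a manually checked sample roughly as follows. This is a sketch under the assumption that each sampled registration has an automatic verdict and a human verdict; the function name `error_rates` is hypothetical.

```python
def error_rates(samples):
    """Return (false_positive_rate, false_negative_rate).

    samples: iterable of (predicted_violation, actual_violation) booleans,
    where "actual" is the manual label from the checked sample.
    """
    tp = fp = tn = fn = 0
    for predicted, actual in samples:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes so a small sample does not divide by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr
```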
Writing
- [ ] Analyze the following research questions:
- [x] Are email addresses shared with third parties?
- [ ] Where do the spammers get the email addresses?
- [ ] What share of services send unsolicited mail? Are they smaller or larger companies?
- [ ] Which services force users to accept newsletters? Are they smaller or larger companies?
- [ ] Are the registration forms themselves compliant (pre-accepted T&C/PP)?
- [ ] Do the "Unsubscribe" links work?
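
For the question of whether addresses are shared with third parties, one possible heuristic is to compare the sender domain of each incoming mail with the site that originally received the unique address. The sketch below assumes a plain suffix comparison; the function names are hypothetical. A mismatch is only a candidate for manual review, since many services send through external mail providers.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Extract the lower-cased domain from a From header value."""
    _, addr = parseaddr(from_header)
    return addr.rsplit("@", 1)[-1].lower()


def is_third_party(registered_site: str, from_header: str) -> bool:
    """True if the sender domain is neither the site nor one of its subdomains."""
    site = registered_site.strip().lower()
    dom = sender_domain(from_header)
    return not (dom == site or dom.endswith("." + site))
```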

Bender250 commented 4 years ago

3 law papers:

Lay of the land (requirement: data collection tool, either automatic or with RAs)
- What types of violations exist?
- How do different EU countries interpret them? How does DE law differ from EU law?
- Are there differences among countries / industries in types of violations?
Class actions (requirement: plugin for users that detects violations and helps with reporting them to regulatory entities)
Doctrinal contribution

Bender250 commented 4 years ago

Detailed TODO list for the implementation.

[x] Autofill form fields
[ ] Detect marketing email consent in the terms and conditions
[ ] Integrate the ML model
[x] Confirming successful registration
- [x] On the website
- [x] By the email
[x] Solving captcha
- [x] Detect captcha
- [x] Determine captcha type
- [x] AZCaptcha
- [x] reCAPTCHA v2
- [x] reCAPTCHA v3
- [x] Traditional Captcha support
[x] Scaling things up
- [x] running headless
- [x] parallel
[ ] Report generation
- [ ] detect email address to inform website owners about the report
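
The "Determine captcha type" step could be approximated with simple markers in the page HTML. This is a heuristic sketch, not the detection used in the crawler: it assumes reCAPTCHA v3 pages load `api.js` with a `render=<site key>` parameter, that v2 pages use the `g-recaptcha` widget class or load `api.js` without a site key, and that traditional captchas appear as an `<img>` tag mentioning "captcha". The function name `captcha_type` and the returned labels are hypothetical.

```python
import re

# v3: script URL carries render=<site key> (render=explicit is a v2 mode).
_V3 = re.compile(r"recaptcha/api\.js\?[^\s>]*?\brender=(?!explicit)[\w-]+", re.I)
# v2: widget class or the plain script include.
_V2 = re.compile(r"g-recaptcha|recaptcha/api\.js", re.I)
# Traditional image captcha: any <img> tag mentioning "captcha".
_IMG = re.compile(r"<img\b[^>]*captcha[^>]*>", re.I)


def captcha_type(html: str):
    """Return 'recaptcha_v3', 'recaptcha_v2', 'image', or None."""
    if _V3.search(html):
        return "recaptcha_v3"
    if _V2.search(html):
        return "recaptcha_v2"
    if _IMG.search(html):
        return "image"
    return None
```

The v3 check runs first because v3 pages can also contain `g-recaptcha` markers that would otherwise match the v2 pattern.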

Optional

[x] Support of other languages.

Bender250 / eth_knowledge_base
