MrLuit / EtherScamDB

Keep track of all current Ethereum scams in a large database
MIT License
168 stars, 77 forks

Announcement - Preparing for v3.0.0 (Breaking changes) #1247

Status: Open. MrLuit opened this issue 6 years ago.

MrLuit commented 6 years ago

https://medium.com/@etherscamdb/breaking-api-changes-in-v3-646217a22bac

sekisanchi commented 6 years ago

I'm very much afraid that some wrong design decisions may have been made, skipping over some design points, when transforming the dataset into NoSQL.

1) URI confusion: entities described by a URI can be roughly categorized, as a whole or in part, into root domain, subdomain and URL. I have not seen a clear distinction between them in your statement.

2) Previous IDs can, and have to, be preserved and optionally shown.

3) All data in the old format have to be preserved and made accessible through a format overlay that maps them to the new format, so that chronology is preserved.
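To make points 2 and 3 concrete, here is a minimal Python sketch of such an overlay. It assumes old-format records are dicts with an integer `id` plus `url`, `category` and some optional fields; the `Entry` class, its `legacy_id` field and the `overlay`/`as_view` helpers are hypothetical names for illustration, not ESDB's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical new-style entry, modelled on the structure discussed in this thread."""
    url: str
    category: str
    subcategory: Optional[str] = None
    description: Optional[str] = None
    addresses: list[str] = field(default_factory=list)
    legacy_id: Optional[int] = None  # hypothetical: the old id, kept but hidden by default


def overlay(old: dict) -> Entry:
    """Map an old-format record (assumed to carry an integer 'id') onto the new shape."""
    return Entry(
        url=old["url"],
        category=old["category"],
        subcategory=old.get("subcategory"),
        description=old.get("description"),
        addresses=list(old.get("addresses", [])),
        legacy_id=old.get("id"),
    )


def as_view(entry: Entry, show_legacy_id: bool = False) -> dict:
    """Render an entry, including the previous id only when explicitly requested."""
    view = {"url": entry.url, "category": entry.category, "addresses": entry.addresses}
    if show_legacy_id and entry.legacy_id is not None:
        view["id"] = entry.legacy_id
    return view


old_records = [
    {"id": 1, "url": "https://a.example/", "category": "Phishing"},
    {"id": 2, "url": "https://b.example/x", "category": "Scamming", "addresses": ["0x1"]},
]
entries = [overlay(r) for r in old_records]  # list order keeps the original chronology
print(as_view(entries[1], show_legacy_id=True))
```

In this sketch chronology comes from keeping the records in their original order, while the previous id is stored alongside each entry and only rendered on request.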

Assumptions

1) The ESDB dataset is going to become a kind of vulnerability/activity oracle for Ethereum addresses.

2) It will be a source for the VirusTotal URI scanner and other aggregated blacklisting services.

3) Aggregated liability per ETH address will be the most valuable data in the whole dataset in the near future, considering affordable smart contract use cases at the social layer level (above layer 2).

If enough time and resources are available, these should be validated with UML data modeling.

Does this make sense?

I'm currently using the current API to pull the full dataset (about 5K lines) into a Google Sheet with ImportJSON about once a day.

sekisanchi commented 6 years ago

Here's a data entry attribute I've been waiting for.

somethingbad.tumblr.com ==> sub bad, root good
sub.badwallet.com ==> sub bad, root bad

I need an additional attribute to distinguish the two cases above.

Also, how can I identify the scope of the blacklisting below?
somedomain.com/aaaa/bad ==> domain good/bad? upper directory good/bad?
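One way to express the missing attribute is to store a separate verdict for each level of the URI. The Python sketch below is hypothetical (the `Verdict` enum, `ScopedEntry` record and `split_host` helper are not part of ESDB), and it naively treats the last two host labels as the root domain; suffixes such as `co.uk` would need the Public Suffix List instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit


class Verdict(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"


def split_host(url: str) -> tuple[str, str, str]:
    """Return (subdomain, root_domain, path) for a URL or bare host.

    Naive assumption: the root domain is the last two host labels. This is wrong
    for suffixes like co.uk; real code should consult the Public Suffix List.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    labels = (parts.hostname or "").split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:]), parts.path


@dataclass
class ScopedEntry:
    """Hypothetical record with an explicit verdict for each level of the URI."""
    url: str
    subdomain_verdict: Verdict
    root_verdict: Verdict
    # verdict for the directory above a listed path, e.g. somedomain.com/aaaa
    parent_path_verdict: Optional[Verdict] = None


tumblr = ScopedEntry("somethingbad.tumblr.com", Verdict.BAD, Verdict.GOOD)
wallet = ScopedEntry("sub.badwallet.com", Verdict.BAD, Verdict.BAD)
docs = ScopedEntry("somedomain.com/aaaa/bad", Verdict.UNKNOWN, Verdict.GOOD,
                   parent_path_verdict=Verdict.GOOD)
print(split_host(tumblr.url))                 # → ('somethingbad', 'tumblr.com', '')
print(split_host("somedomain.com/aaaa/bad"))  # → ('', 'somedomain.com', '/aaaa/bad')
```

With separate fields, "sub bad, root good" and "sub bad, root bad" become distinguishable without overloading a single category value.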

MrLuit commented 6 years ago

First of all, your opinion on this matter is highly appreciated. We will not be pushing these changes to production until we are sure they work out for everyone involved in the project.

To address your concerns about URI / URL classification, I've been thinking about this as well. I propose we work towards the following scam entry structure:

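```yaml
- url: https://malicious.example/scam.php
  # quoted: an unquoted value starting with * is parsed as a YAML alias
  scheme: "*://malicious.example/*"
  category: Phishing
  subcategory: MyCrypto
  description: Malicious website
  addresses:
    # quoted so YAML parsers do not read 0x-prefixed values as hex integers
    - "0x0"
    - "0x1"
```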
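The `scheme` value reads like a wildcard pattern. A minimal Python sketch of matching URLs against it, assuming `*` simply means "any run of characters" and every other character is literal (the `matches` helper is hypothetical, not part of ESDB):

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a simple wildcard pattern such as '*://malicious.example/*'.

    Assumption: '*' matches any run of characters (including none) and every
    other character is literal. This is a simplification, not a full spec.
    """
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts))


def matches(pattern: str, url: str) -> bool:
    """True if the whole URL matches the wildcard pattern."""
    return pattern_to_regex(pattern).fullmatch(url) is not None


scheme = "*://malicious.example/*"
print(matches(scheme, "https://malicious.example/scam.php"))  # → True
print(matches(scheme, "http://malicious.example/"))           # → True
print(matches(scheme, "https://sub.malicious.example/"))      # → False
```

This simplification lets a leading `*` match more than a scheme (for example, a URL whose query string contains `://malicious.example/`); a stricter matcher would restrict the scheme wildcard to a few web schemes such as http and https, as WebExtension match patterns do.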

These changes will also be reflected in the UI.

While I understand compatibility is important, I think having some entries show the id property while others do not would be a bit confusing. Also, chronology will still be preserved through the natural order of the scams.yaml file (which is the same order as ascending id).

We will indeed also be providing more integration data (like VirusTotal) through the API in the future.

Please let me know what you think :smile:

sekisanchi commented 6 years ago

Good! Let me add some insights. I see the current entry/record as:

a) Scam/phishing evidence
b) A filtering template to detect malicious activities

As evidence, I want some snapshot record, like URLscan or phishcheck provide, in cases like:
ttt.ppp -> nothing
ttt.ppp/eth -> scam page

Showing only ttt.ppp does not satisfy this need. We could either add a link to the evidence, or specify the full path in the URI as in the second example.

For now, I keep them locally or look them up with the services above.

For b), I can think of at least 3 types of templates:

1) Root domain: the whole contents of the ttt.ppp root domain

2) Subdomain: sss.ttt.ppp

Scammers rotate URIs/contents and try to escape the filter with staged deployments.
I guess you are currently assuming this applies to subdomains only, but that is a security hole they are already targeting with their disgusting staged deployments. For example:

Initially:
ttt.ppp - nothing
sss.ttt.ppp - scam page

Later:
sss.ttt.ppp - gone
ttt.ppp - scam

In that case, when registering sss.ttt.ppp, we must mark whether ttt.ppp is good or bad, to distinguish such cases from shared hosts like Tumblr or Blogger.

3) Path: ttt.ppp/sdjrhf/djrj

For example, Google Docs, Telegram, Dropbox, etc.
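The three template types could be modelled as an explicit scope on each entry. This Python sketch is hypothetical (`Scope`, `is_covered` and the matching rules are illustrative assumptions, not ESDB's implementation) and shows what each scope would and would not cover:

```python
from enum import Enum
from urllib.parse import urlsplit


class Scope(Enum):
    ROOT_DOMAIN = 1  # everything under ttt.ppp, including any sss.ttt.ppp
    SUBDOMAIN = 2    # only sss.ttt.ppp; ttt.ppp itself may be a legitimate host
    PATH = 3         # only ttt.ppp/sdjrhf/djrj on a shared host (docs, file shares, ...)


def _host_and_path(url: str) -> tuple[str, str]:
    """Split a URL or bare host/path into (hostname, path)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return (parts.hostname or ""), parts.path


def is_covered(template: str, scope: Scope, url: str) -> bool:
    """Hypothetical check: does a blacklist template with the given scope cover url?"""
    t_host, t_path = _host_and_path(template)
    host, path = _host_and_path(url)
    if scope is Scope.ROOT_DOMAIN:
        return host == t_host or host.endswith("." + t_host)
    if scope is Scope.SUBDOMAIN:
        return host == t_host
    # Scope.PATH: same host, and the path equals the template path or lies below it
    prefix = t_path.rstrip("/")
    return host == t_host and (path == prefix or path.startswith(prefix + "/"))


print(is_covered("ttt.ppp", Scope.ROOT_DOMAIN, "https://sss.ttt.ppp/x"))       # → True
print(is_covered("sss.ttt.ppp", Scope.SUBDOMAIN, "https://ttt.ppp/"))          # → False
print(is_covered("ttt.ppp/sdjrhf/djrj", Scope.PATH, "https://ttt.ppp/other"))  # → False
```

Under this model, the staged deployment described above would be caught by listing ttt.ppp with `ROOT_DOMAIN` scope once the root is known to be bad, rather than only listing sss.ttt.ppp with `SUBDOMAIN` scope.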

These are probably fundamental requirements from the dataset's perspective. Ideally the design would comply with them at any stage of expansion, but sooner is better, to avoid a big modification or rewrite later.

As for the UI, ESDB is by design a kind of professional tool, so I recommend not sticking to the simplicity and entertainment factors that some outside users may care about.

MrLuit commented 6 years ago

I will update my PR next week based on your feedback. Thanks for being involved :smile: