MrLuit opened 6 years ago
I'm very much afraid that you may have made some wrong design decisions, skipping some design points when transforming the dataset into NoSQL.
1) URI confusion: entities described by a URI are roughly categorized, in whole or in part, as a root domain, a subdomain or a URL. I have not seen a clear distinction between them in your statement.
2) Previous IDs can, and have to, be preserved and shown optionally.
3) All data in the old format has to be preserved and made accessible through a format overlay that matches the new format, so that chronology can be preserved.
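A format overlay of the kind point 3 asks for could look like the minimal sketch below. It assumes, hypothetically, that a legacy record is a dict with an `id` key plus the same fields as the new format; `to_new_format`, `overlay`, `show_legacy_id` and the `legacy_id` key are illustrative names, not part of the ESDB code or API.

```python
from typing import Any, Dict, List


def to_new_format(old: Dict[str, Any], show_legacy_id: bool = False) -> Dict[str, Any]:
    """Map a hypothetical legacy record onto the new entry shape.

    The legacy id is dropped unless show_legacy_id is True, in which case
    it is kept under a separate 'legacy_id' key instead of 'id'.
    """
    new = {key: value for key, value in old.items() if key != "id"}
    if show_legacy_id and "id" in old:
        new["legacy_id"] = old["id"]
    return new


def overlay(records: List[Dict[str, Any]], show_legacy_id: bool = False) -> List[Dict[str, Any]]:
    # Keep the input order so the original chronology survives the conversion.
    return [to_new_format(record, show_legacy_id) for record in records]


legacy = [
    {"id": 1, "url": "https://malicious.example/scam.php", "category": "Phishing"},
]
print(overlay(legacy))
print(overlay(legacy, show_legacy_id=True))
```

Keeping the old id under its own key, rather than as `id`, is one way to make it "shown optionally" without implying that new entries carry an id.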
Assumptions
1) The ESDB dataset is going to be a kind of vulnerability-activity oracle for Ethereum addresses.
2) A source for the VirusTotal URI scanner and other aggregated blacklisting services.
3) Aggregated liability per ETH address will be the most valuable data in the whole dataset in the near future, considering affordable smart-contract use cases at the social layer (above layer 2).
If enough time and resources are available, these should be validated with UML data modeling.
Do these make sense?
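As a rough illustration of assumption 3, the sketch below groups entries by Ethereum address. Counting the entries that mention an address is only a crude proxy for "aggregated liability"; the thread does not say how liability should be weighted. It assumes entries shaped like the proposed format (`url`, `addresses`), and `entries_per_address` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def entries_per_address(entries: List[dict]) -> Dict[str, List[str]]:
    """Group entry URLs by Ethereum address.

    Addresses are lowercased so that checksummed (mixed-case) and
    lowercase spellings of the same address land in one bucket.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for address in entry.get("addresses", []):
            grouped[address.lower()].append(entry["url"])
    return dict(grouped)


entries = [
    {"url": "https://a.example", "addresses": ["0xAbC"]},
    {"url": "https://b.example", "addresses": ["0xabc", "0xdef"]},
]
print(entries_per_address(entries))
```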
I currently use the API to pull the full 5K-line dataset into a Google Sheet with ImportJSON about once a day.
Here's a data entry attribute I've been waiting for.
somethingbad.tumblr.com ==> sub bad, root good
sub.badwallet.com ==> sub bad, root bad
I need an additional distinctive attribute for the above two.
Also, how can I identify the scope of the blacklisting below?
somedomain.com/aaaa/bad ==> domain good/bad, upper directory good/bad
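The distinction asked for here can be sketched by splitting each entry into root domain, subdomain and path, plus an explicit status for the root. This is a minimal sketch under two assumptions: the root domain is taken to be the last two host labels (a real implementation would need the Public Suffix List for cases like `co.uk`), and `root_status` is a hypothetical field that has to be supplied by whoever submits the entry, since it cannot be derived from the URL. An "upper directory good/bad" flag for path entries would need a similar recorded field.

```python
from typing import Dict
from urllib.parse import urlsplit


def describe_scope(url: str, root_status: str = "unknown") -> Dict[str, str]:
    """Split a blacklisted URL into root domain, subdomain and path parts.

    root_status ("good", "bad" or "unknown") is a hypothetical extra field
    recorded alongside the entry, not something computed from the URL.
    """
    # urlsplit only fills in the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    labels = (parts.hostname or "").lower().split(".")
    # Assumption: root domain = last two labels (wrong for suffixes like co.uk).
    root = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    path = parts.path.strip("/")
    if path:
        scope = "path"
    elif subdomain:
        scope = "subdomain"
    else:
        scope = "root"
    return {"root": root, "subdomain": subdomain, "path": path,
            "scope": scope, "root_status": root_status}


print(describe_scope("somethingbad.tumblr.com", root_status="good"))
print(describe_scope("sub.badwallet.com", root_status="bad"))
print(describe_scope("somedomain.com/aaaa/bad"))
```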
First of all, your opinion on this matter is highly appreciated. We will not be pushing these changes to production until we are sure they work out for everyone involved on the project.
To address your concerns about URI / URL classification, I've been thinking about this as well. I propose we work towards the following scam entry structure:
- url: https://malicious.example/scam.php
  scheme: '*://malicious.example/*'
  category: Phishing
  subcategory: MyCrypto
  description: Malicious website
  addresses:
    - '0x0'
    - '0x1'
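The proposed entry can be mirrored as a typed record, and the sketch below also shows one way the `scheme` field might be evaluated. The proposal does not define the pattern syntax, so this assumes `*` is a wildcard for any run of characters (as in browser match patterns) and uses Python's `fnmatch.fnmatchcase` for it; `ScamEntry` and `matches` are illustrative names, not the project's code.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class ScamEntry:
    """Mirror of the proposed scams.yaml entry (field names from the proposal)."""
    url: str
    scheme: str
    category: str
    subcategory: str
    description: str
    addresses: List[str] = field(default_factory=list)

    def matches(self, candidate_url: str) -> bool:
        # Assumption: '*' in scheme means "any characters"; fnmatchcase applies
        # shell-style globbing to the whole string, case-sensitively.
        return fnmatchcase(candidate_url, self.scheme)


entry = ScamEntry(
    url="https://malicious.example/scam.php",
    scheme="*://malicious.example/*",
    category="Phishing",
    subcategory="MyCrypto",
    description="Malicious website",
    addresses=["0x0", "0x1"],
)
print(entry.matches("http://malicious.example/other.html"))
print(entry.matches("https://sub.malicious.example/"))
```

A plain glob like this is loose in one direction, since the leading `*` can also match a URL that merely contains `://malicious.example/` later on (for example in a query string), and strict in another: it covers exactly one host, so `sub.malicious.example` is not matched. A stricter matcher would parse the host first, which is where the root/subdomain distinction raised in this thread comes in.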
These changes will also be reflected in the UI.
While I understand compatibility is important, I think having some entries still show the id property while others do not would be a bit confusing. Also, chronology will still be preserved through the natural order of the scams.yaml file (which is the same order as the ascending id).
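The claim that file order carries the same chronology as ascending ids can be checked before the ids are dropped. A minimal sketch, assuming legacy entries are dicts with an integer `id`; the function names and the `position` key are hypothetical.

```python
from typing import Any, Dict, List


def ids_ascend_in_file_order(entries: List[Dict[str, Any]]) -> bool:
    """True if the legacy ids strictly increase in the order entries appear."""
    ids = [entry["id"] for entry in entries if "id" in entry]
    return all(earlier < later for earlier, later in zip(ids, ids[1:]))


def with_position(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Attach a 1-based file position; once ids are gone, this is what encodes order.
    return [{**entry, "position": index} for index, entry in enumerate(entries, start=1)]


legacy = [{"id": 3, "url": "a.example"}, {"id": 7, "url": "b.example"}]
print(ids_ascend_in_file_order(legacy))
print([entry["position"] for entry in with_position(legacy)])
```

File position preserves the relative order, but where the old ids have gaps (3 and 7 above) the position is not the old id value, so anyone who already references entries by id still needs that value preserved separately.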
We will indeed also be providing more integration data (like VirusTotal) through the API in the future.
Please let me know what you think :smile:
Good! Let me add some insights. I look at the current entry/record as:
a) Scam/phishing evidence
b) A filtering template to detect malicious activity
As evidence, I want a snapshot record, like those from URLscan or phishcheck.
For example, in the case of:
ttt.ppp -> nothing
ttt.ppp/eth -> scam page
showing only ttt.ppp does not satisfy the need. We could either add a link to the evidence or specify the full path in the URI, as in the latter example.
For now, I keep these snapshots locally or look them up with the services above.
For b), I can think of at least three types of templates:
1) Root domain: the whole contents of the ttt.ppp root domain
2) Subdomain: sss.ttt.ppp
Scammers rotate URIs/contents and try to evade the filter with staged deployments.
I guess you are currently assuming subdomain-only entries, but that is a security hole they are already exploiting with their disgusting staged deployments.
For example:
initial: ttt.ppp - nothing, sss.ttt.ppp - scam page
later: sss.ttt.ppp - gone, ttt.ppp - scam
In that case, when registering sss.ttt.ppp, we must mark whether ttt.ppp is good or bad, to distinguish such sites from Tumblr/Blogger subdomains, for example.
3) Path: ttt.ppp/sdjrhf/djrj
e.g. Google Docs/Telegram/Dropbox, etc.
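The three template types, and the rule that registering a subdomain must also record a verdict on its root, can be sketched as below. The kind names `"root"`, `"subdomain"` and `"path"`, the `Template` class and the helper functions are hypothetical; root-domain extraction again assumes the last two host labels, which a real implementation would replace with a Public Suffix List lookup.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Template:
    kind: str       # hypothetical kinds: "root", "subdomain" or "path"
    host: str       # lowercase host, e.g. "ttt.ppp" or "sss.ttt.ppp"
    path: str = ""  # only used by "path" templates, e.g. "/sdjrhf/djrj"


def matches(template: Template, url: str) -> bool:
    """Check whether a URL falls inside a template's scope."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if template.kind == "root":
        # 1) Whole root domain: the domain itself and every subdomain under it.
        return host == template.host or host.endswith("." + template.host)
    if template.kind == "subdomain":
        # 2) This exact host only; the parent root domain is not covered.
        return host == template.host
    if template.kind == "path":
        # 3) Same host, and a path equal to or below the blacklisted prefix.
        prefix = template.path.rstrip("/")
        return host == template.host and (
            parts.path == prefix or parts.path.startswith(prefix + "/")
        )
    raise ValueError(f"unknown template kind: {template.kind}")


def templates_for_subdomain(host: str, root_is_bad: bool) -> List[Template]:
    """Register a subdomain together with an explicit verdict on its root.

    Assumption: root domain = last two host labels (wrong for co.uk etc.).
    """
    host = host.lower()
    templates = [Template("subdomain", host)]
    if root_is_bad:
        # Staged deployment: cover the root too, so a later move to it is caught.
        templates.append(Template("root", ".".join(host.split(".")[-2:])))
    return templates


def is_blocked(templates: List[Template], url: str) -> bool:
    return any(matches(template, url) for template in templates)


staged = templates_for_subdomain("sss.ttt.ppp", root_is_bad=True)
hosted = templates_for_subdomain("somethingbad.tumblr.com", root_is_bad=False)
print(is_blocked(staged, "https://ttt.ppp/"))
print(is_blocked(hosted, "https://tumblr.com/"))
```

In the staged-deployment scenario above, a subdomain-only entry would stop matching once the scam moves from sss.ttt.ppp to ttt.ppp; recording the root as bad at registration time keeps it covered, while a shared host such as Tumblr keeps only the subdomain entry.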
These are probably fundamental requirements from the dataset's point of view. Ideally they should hold at every stage of expansion, and the sooner they are adopted the better, to avoid big modifications or a rewrite.
As for the UI, ESDB is a professional tool by design, and I recommend not sticking to the simplicity and entertainment factor that some outsiders might expect.
I will update my PR next week using your feedback, thanks for being involved :smile:
https://medium.com/@etherscamdb/breaking-api-changes-in-v3-646217a22bac