Malware detection via (something like) Google Safe Browsing

filecoin-saturn / L1-node

Filecoin Saturn L1 Node • The edge cache layer of Filecoin's decentralized CDN 🪐

Malware detection via (something like) Google Safe Browsing #478

Open vorburger opened 1 year ago

vorburger commented 1 year ago

In #477 I noticed that some ISPs (mine, at least) now block some URLs related to Saturn.

I (vaguely) remember that there was a blocklist of bad CIDs somewhere, but apparently it's not enough for an overzealous ISP.

Does this project have any plans to add more filtering to weed out CIDs hosting e.g. phishing content?

It occurred to me that (something like) Google Safe Browsing https://developers.google.com/safe-browsing/ (which has a publicly available API) could potentially be of interest to integrate.

Just a thought and for discussion.

willscott commented 1 year ago

There is no user-generated content served from 'orchestrator.strn.pl'. It is only used for L1 registration / communications. As such, something like Safe Browsing would not help with overzealous ISP blocking of the form you have encountered.

vorburger commented 1 year ago

@willscott right... I see. Let's forget what specifically got me thinking about this and triggered me to create this issue initially.

More generally, would Saturn benefit from some form of more "dynamic" CID blocking for malware detection, beyond the current denylist.conf?

If there is interest, I'm not sure how one would go about this. Fully dynamic checks may slow down retrieval, but you could imagine doing some offline batch analysis to regularly update a block list.

Perhaps this issue could serve for discussion of this more general underlying idea.
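To make the offline batch idea concrete, here is a minimal sketch in Python. It is not Saturn code: `update_blocklist` and the `is_malicious` callback are hypothetical names, and the callback stands in for whatever scanner (Safe Browsing or otherwise) one might plug in. The output is a plain sorted list of CIDs, not the actual denylist.conf format, which is not described in this thread.

```python
from typing import Callable, Iterable, List


def update_blocklist(
    existing: Iterable[str],
    candidates: Iterable[str],
    is_malicious: Callable[[str], bool],
) -> List[str]:
    """Merge newly flagged CIDs into an existing block list.

    `is_malicious` is a hypothetical scanner callback; it is only
    invoked for CIDs that are not already blocked, so repeated batch
    runs do not re-scan known-bad content.
    """
    blocked = set(existing)
    for cid in candidates:
        if cid not in blocked and is_malicious(cid):
            blocked.add(cid)
    # Sorted output keeps the generated list stable between runs.
    return sorted(blocked)


if __name__ == "__main__":
    # Placeholder identifiers, not real CIDs.
    current = ["cid-b"]
    seen_recently = ["cid-a", "cid-b", "cid-c"]
    flagged = {"cid-a"}
    print(update_blocklist(current, seen_recently, lambda c: c in flagged))
```

Because the scan runs out of band, the retrieval path only ever does a set lookup against the generated list, which avoids the slowdown concern raised above.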

willscott commented 1 year ago

There are already a couple factors that help mitigate the abuse potential of an L1 node.

As an initial dynamic mitigation, L1s do not serve content listed in the combined, community-maintained denylist at https://badbits.dwebops.pub/
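A rough sketch of what a check against such a denylist could look like, in Python. The entry format is an assumption here: each line is taken to be `//` followed by the hex SHA-256 of `<CIDv1 base32>/<path>`. The function names are hypothetical and this is not how the L1 node implements it.

```python
import hashlib
from typing import Iterable, Set


def denylist_anchor(cid_v1_base32: str, path: str = "") -> str:
    # Assumed format: hex SHA-256 of "<cid>/<path>" (path may be empty).
    return hashlib.sha256(f"{cid_v1_base32}/{path}".encode("utf-8")).hexdigest()


def parse_denylist(lines: Iterable[str]) -> Set[str]:
    # Keep only lines that look like "//<hex digest>"; skip anything else.
    entries = set()
    for line in lines:
        line = line.strip()
        if line.startswith("//"):
            entries.add(line[2:])
    return entries


def is_blocked(cid_v1_base32: str, path: str, denylist: Set[str]) -> bool:
    return denylist_anchor(cid_v1_base32, path) in denylist


if __name__ == "__main__":
    # "examplecid" is a placeholder, not a real CID.
    sample = ["//" + denylist_anchor("examplecid"), "not an entry"]
    deny = parse_denylist(sample)
    print(is_blocked("examplecid", "", deny))
    print(is_blocked("othercid", "", deny))
```

Storing hashes rather than raw CIDs means the list itself does not act as a directory of bad content.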

In addition, L1s do not serve content directly. This should limit the risk of getting flagged for serving e.g. malware or phishing content, and it is also done for client protection: getting content from an L1 requires client software that verifies that the data received matches the CID that was requested.
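The verification step can be illustrated with a simplified Python sketch. It only handles one case, a single block addressed by a CIDv1 with the raw codec and a SHA-256 multihash, and the function names are made up for illustration. Real clients verify complete responses with proper IPFS tooling rather than anything this small.

```python
import base64
import hashlib

# CIDv1 prefix bytes: version 1, raw codec (0x55),
# sha2-256 multihash code (0x12), digest length 32 (0x20).
RAW_SHA256_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])


def make_raw_cid(data: bytes) -> str:
    """Build the base32 CIDv1 string for a single raw block."""
    raw = RAW_SHA256_PREFIX + hashlib.sha256(data).digest()
    # Multibase "b" = lowercase base32 without padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def verify_raw_block(cid: str, data: bytes) -> bool:
    """Return True only if `data` hashes to the digest embedded in `cid`."""
    if not cid.startswith("b"):
        return False
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    try:
        raw = base64.b32decode(body)
    except ValueError:
        return False
    if len(raw) != 36 or raw[:4] != RAW_SHA256_PREFIX:
        return False  # not a CIDv1 / raw / sha2-256 identifier
    return raw[4:] == hashlib.sha256(data).digest()


if __name__ == "__main__":
    cid = make_raw_cid(b"hello saturn")
    print(verify_raw_block(cid, b"hello saturn"))
    print(verify_raw_block(cid, b"tampered"))
```

Since the check depends only on the requested CID and the received bytes, a client does not need to trust the node that served the data.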