dawsbot / eth-labels

📃 A public dataset of crypto addresses labeled
https://eth-labels-production.up.railway.app/swagger
MIT License
190 stars, 29 forks

[Question] Contributing to label / scraping method #11

Closed (brianleect closed this 3 months ago)

brianleect commented 2 years ago

I came across this repo a while back when I needed address label data. I noticed it only covered a specific subset of the label data from Etherscan, and that scraping needed a separate Tampermonkey script for each label.

Because I needed label data that wasn't covered, I ended up writing a more generalized scraper for Etherscan over at https://github.com/brianleect/etherscan-labels

Would love to know how I could contribute back to this repo to populate it with more label information and perhaps also the more generalized scraping method I utilized.

dawsbot commented 2 years ago

Wow @brianleect, this is a significant contribution you've made to open source! Clearly, I'd love to join forces and have a single repo with all the labels. Whether that's you, me, or us, I have no preference so long as the library is easy to consume in Node.js and JavaScript.

I saw at a quick glance that you implemented the scraper in Python. Are you familiar with the JS ecosystem too?

brianleect commented 2 years ago

Thanks for responding, @dawsbot!

I used Selenium with Python for the scraping because I'd used it before. I just realized Selenium is available for JS as well. I'm familiar enough with JS, and it should be trivial to rewrite.

Regarding labels

Would love to know what you think about it.

dawsbot commented 2 years ago

Rewriting in JS would be my goal here, but if that's a hassle, let's address that upfront.

I think the massive lists (80-90k addresses) are fine so long as we optimize the bundle output for JS. I'm happy to tag-team on this, but given my current workload elsewhere (high), I've got ideas for how to collab on this. Discord me at daws.eth# TWO FIVE SIX TWO 🙏

brianleect commented 2 years ago

Sent you a friend request on Discord. I'm transfixed#0001.

brianleect commented 2 years ago

Rewrote the login and part of the scraping logic in Selenium JS

https://github.com/brianleect/evm-labels/blob/master/scripts/scrape-all.js

I'm not too sure what the JavaScript equivalent of pandas.read_html is for retrieving the table, though.
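
For reference, one possible stand-in for pandas.read_html is sketched below. It is a dependency-free approximation, not what either repo actually uses: `readHtmlTables`, `cellText`, and `decodeEntities` are hypothetical helper names, and the sketch assumes simple, well-formed tables with no nesting and no rowspan/colspan, where the first row holds the column names. For messier pages a real HTML parser would be the safer choice.

```javascript
// Hypothetical helpers (not part of any library): a rough, regex-based
// approximation of pandas.read_html for simple, well-formed tables.

// Decode a handful of common HTML entities. "&amp;" is handled last so that
// text such as "&amp;lt;" is not decoded twice.
function decodeEntities(text) {
  return text
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Strip inner tags (links, spans), decode entities, and collapse whitespace.
function cellText(cellHtml) {
  const withoutTags = cellHtml.replace(/<[^>]*>/g, "");
  return decodeEntities(withoutTags).replace(/\s+/g, " ").trim();
}

// Return one array per <table>; each array holds one object per data row,
// keyed by the text of the first row's cells.
function readHtmlTables(html) {
  const tables = [];
  const tableRe = /<table\b[^>]*>([\s\S]*?)<\/table>/gi;
  for (const tableMatch of html.matchAll(tableRe)) {
    const rows = [];
    const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    for (const rowMatch of tableMatch[1].matchAll(rowRe)) {
      const cells = [];
      const cellRe = /<t[hd]\b[^>]*>([\s\S]*?)<\/t[hd]>/gi;
      for (const cellMatch of rowMatch[1].matchAll(cellRe)) {
        cells.push(cellText(cellMatch[1]));
      }
      if (cells.length > 0) rows.push(cells);
    }
    if (rows.length === 0) continue; // skip tables with no rows

    // Assumption: the first row is the header row.
    const [header, ...body] = rows;
    tables.push(
      body.map((row) =>
        Object.fromEntries(header.map((name, i) => [name, row[i] ?? ""]))
      )
    );
  }
  return tables;
}
```

With Selenium JS, the input string could come from `await driver.getPageSource()`, and `readHtmlTables(html)[0]` would then be the rows of the first table on the page.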

dawsbot commented 2 years ago

Nice, @brianleect! I'm excited to join forces 🙌