brianleect / etherscan-labels

Full label data dump of top EVM chains in JSON/CSV.
MIT License
249 stars 73 forks

[Research][QOL] JSON/CSV vs Single file soln #3

Open brianleect opened 1 year ago

brianleect commented 1 year ago

Currently, each run of getLabels saves its output as {labelName}.json and {labelName}.csv.

It might be a major QOL improvement to consolidate these into a single file keyed by address?

address:{'name':NAMETAG, 'label':LABEL}
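A rough sketch of what the consolidation could look like, assuming each per-label file boils down to a mapping of address to name tag. That input shape, the `consolidate` helper, and the sample addresses are all made up for illustration and are not the repo's actual format:

```python
import json


def consolidate(per_label):
    """Merge {label: {address: name_tag}} into {address: {'name', 'label'}}."""
    combined = {}
    for label, entries in per_label.items():
        for address, name_tag in entries.items():
            # Lowercase the key so lookups do not depend on checksum casing.
            # Note: an address listed under several labels keeps only the
            # last label seen in this simple sketch.
            combined[address.lower()] = {"name": name_tag, "label": label}
    return combined


# Hypothetical sample data standing in for the per-label files.
sample = {
    "exchange": {"0xAAA0000000000000000000000000000000000001": "Example Exchange 1"},
    "bridge": {"0xbbb0000000000000000000000000000000000002": "Example Bridge"},
}

combined = consolidate(sample)
combined_json = json.dumps(combined, indent=2)
```

If addresses can carry more than one label, the value would probably need a list of labels instead of a single string.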

We might need to look into the total size of the label information and whether it makes sense to keep it in a single large JSON file or move it into a DB (SQLite perhaps?).
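For comparison, the SQLite option might look roughly like the following, using Python's built-in `sqlite3` module. The table name, column names, and sample rows are assumptions for illustration only:

```python
import sqlite3

# Hypothetical rows: (address, name tag, label).
rows = [
    ("0xaaa0000000000000000000000000000000000001", "Example Exchange 1", "exchange"),
    ("0xbbb0000000000000000000000000000000000002", "Example Bridge", "bridge"),
]

# In-memory database for the sketch; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labels (address TEXT PRIMARY KEY, name TEXT, label TEXT)"
)
conn.executemany("INSERT INTO labels VALUES (?, ?, ?)", rows)
conn.commit()


def lookup(address):
    """Return (name, label) for an address, or None if it is unknown."""
    cur = conn.execute(
        "SELECT name, label FROM labels WHERE address = ?", (address.lower(),)
    )
    return cur.fetchone()


found = lookup("0xAAA0000000000000000000000000000000000001")
missing = lookup("0xccc0000000000000000000000000000000000003")
```

This works, but it adds a schema and a connection to manage, which only seems worth it if the data grows well beyond what fits comfortably in memory.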

brianleect commented 1 year ago

Current size is only 4-5 MB. Not sure it makes sense to use a DB given the additional overhead and complexity.

Since the 4-5 MB covers both JSON and CSV, the consolidated file itself is likely only ~2.5 MB.

brianleect commented 1 year ago

The consolidated single-file JSON is currently located at https://github.com/brianleect/etherscan-labels/blob/main/data/combinedLabels.json

It takes up 2.7 MB. I don't think that justifies using a DBMS, since we can simply load the file into memory once and look up the address of interest. Since we are using the address:nameTags format, retrievals should be O(1) on average as well.
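A minimal sketch of the load-once, look-up-by-key approach. It writes a tiny stand-in file to a temporary directory so it runs on its own. The sample entries, the `get_label` helper, and the value shape (taken from the `address:{'name':NAMETAG, 'label':LABEL}` proposal above) are assumptions, not necessarily the exact contents of combinedLabels.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in for data/combinedLabels.json.
sample = {
    "0xaaa0000000000000000000000000000000000001": {
        "name": "Example Exchange 1",
        "label": "exchange",
    },
}
path = os.path.join(tempfile.mkdtemp(), "combinedLabels.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Load the whole file into memory once.
with open(path, encoding="utf-8") as f:
    labels = json.load(f)

# File size on disk, handy for checking whether a DB is warranted.
size_bytes = os.path.getsize(path)


def get_label(address):
    """Dict lookup by address; returns None when the address is unknown."""
    return labels.get(address.lower())


known = get_label("0xAAA0000000000000000000000000000000000001")
unknown = get_label("0xccc0000000000000000000000000000000000003")
```

The lookup is a plain dict access, so its cost does not grow with the number of labelled addresses. Lowercasing the key assumes the file stores addresses in lowercase.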