aws-solutions-library-samples / guidance-for-digital-assets-on-aws

Digital Assets Examples
MIT No Attribution

AWS Public Blockchain Data is incorrect at multiple days, contains junk data #5

Closed sarkiano closed 9 months ago

sarkiano commented 1 year ago

The S3 bucket (aws-public-blockchain) with Ethereum blockchain data has an issue: the data count is incorrect on multiple days. For example, the folder "s3://aws-public-blockchain/v1.0/eth/logs/date=2023-03-14/" contains files with two different upload times. Other folders also contain junk data. Querying data for those days can produce misleading results. As I understand it, this error occurred when the data ingestion process temporarily stopped on 2023-03-14 and was restarted afterwards.
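
One way to spot affected folders is to group an object listing by date partition and flag partitions whose files were uploaded on more than one calendar day. The following is only a rough sketch: it assumes you already have a listing of (key, last-modified) pairs (for example from an S3 list call), the file names in the usage example are hypothetical, and "more than one upload day" is just a heuristic for a restarted ingestion run, not a confirmed rule for this dataset.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partitions_with_mixed_upload_days(objects):
    """Return {partition_prefix: [upload days]} for suspicious partitions.

    `objects` is an iterable of (key, last_modified) pairs, where
    `last_modified` is a datetime. A partition is flagged when its
    files were uploaded on more than one calendar day (a heuristic).
    """
    upload_days = defaultdict(set)
    for key, last_modified in objects:
        # Everything before the final "/" is treated as the partition folder.
        partition = key.rsplit("/", 1)[0]
        upload_days[partition].add(last_modified.date())
    return {
        partition: sorted(days)
        for partition, days in upload_days.items()
        if len(days) > 1
    }


# Hypothetical listing: the file names and timestamps are made up.
listing = [
    ("v1.0/eth/logs/date=2023-03-14/part-0.parquet",
     datetime(2023, 3, 15, 1, 0, tzinfo=timezone.utc)),
    ("v1.0/eth/logs/date=2023-03-14/part-1.parquet",
     datetime(2023, 3, 20, 9, 30, tzinfo=timezone.utc)),
    ("v1.0/eth/logs/date=2023-03-15/part-0.parquet",
     datetime(2023, 3, 16, 1, 0, tzinfo=timezone.utc)),
]
suspicious = partitions_with_mixed_upload_days(listing)
```

Here only the `date=2023-03-14` partition would be flagged, because its two files carry different upload days.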

These are my main requests:

  1. Remove the junk data so that the dataset is clean and can be used reliably.
  2. The "value" column in the token_transfers table has type double, but the original data on the blockchain is uint256, so we lose precision. Values can be very large, up to 115792089237316195423570985008687907853269984665640564039457584007913129639935 (2^256 - 1). I suggest using a string type for the "value" column in the parquet files.
  3. Is there a way to increase the update rate of the data? As I understand it, the data in that bucket is updated once a day. It would be better if it were updated more frequently.
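
To illustrate the precision problem in request 2, here is a small sketch in plain Python. It does not read the actual parquet files; it only round-trips integers through a 64-bit float (what a double column stores) and through a string. The helper names and the sample amount are my own, chosen for illustration.

```python
# Largest value a uint256 can hold.
MAX_UINT256 = 2**256 - 1


def roundtrip_through_double(value: int) -> int:
    """Simulate storing an integer in a double column and reading it back."""
    return int(float(value))


def roundtrip_through_string(value: int) -> int:
    """Simulate storing an integer as a decimal string and reading it back."""
    return int(str(value))


# A double has a 53-bit significand, so integers above 2**53 are not
# all representable. Sample amount: 10**18 + 1 (hypothetical transfer).
amount = 10**18 + 1

lost = roundtrip_through_double(amount)   # the trailing +1 is rounded away
kept = roundtrip_through_string(amount)   # exact

max_lost = roundtrip_through_double(MAX_UINT256)  # rounds up to 2**256
max_kept = roundtrip_through_string(MAX_UINT256)  # exact
```

With a string column, consumers can parse the value into an arbitrary-precision integer type (such as Python's `int`) without any rounding.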