TrueBlocks / trueblocks-core

The main repository for the TrueBlocks system
https://trueblocks.io
GNU General Public License v3.0
1.05k stars 198 forks source link

Strange behaviour with staged/unripe blocks. #3855

Open walrusmustgo opened 2 months ago

walrusmustgo commented 2 months ago

I've found an issue -- i reckon it's most definitely an issue, not misusage -- regarding querying staged and unripe blocks. It seems to me that the chifra isn't reading either of the two when querying. I'm going to explain my process, but keep in mind,this's very time sensitive, it's only concerning newer data on the blockchain, so for actually testing it out properly, i reckon we'll have to test it out in similar times.

So, i've got a service which's listening for approvals on tokens which aren't released for trading yet. I'm going to be using this address:

0x1a7cAC64314FB35e69a1045014EEa422D902fB17

At the time of writing, this contract is ~10mins old. When i run:

chifra export 0x1a7cAC64314FB35e69a1045014EEa422D902fB17

i get the following output:

2812a94825aa:/# chifra export 0x1a7cAC64314FB35e69a1045014EEa422D902fB17
EROR[14-08|13:55:05.157] no blocks found for the query

When i run:

chifra export --unripe 0x1a7cAC64314FB35e69a1045014EEa422D902fB17

i get the same output:

2812a94825aa:/# chifra export 0x1a7cAC64314FB35e69a1045014EEa422D902fB17 --unripe
EROR[14-08|14:03:52.839] no blocks found for the query

And using the curl debugger:

2812a94825aa:/# TB_DEBUG_CURL=1 chifra export 0x1a7cAC64314FB35e69a1045014EEa422D902fB17 --unripe
EROR[14-08|14:04:40.626] no blocks found for the query
INFO[14-08|14:04:40.626] Closing curl debugger

Now,this contract has activity on it, the scraper is running properly, the staging file is growing in size, and when i try

grep 0x1a7cAC64314FB35e69a1045014EEa422D902fB17 <filename>

in the staging folder, i don't get anything in the output. This's happening with all new tokens, and it's pretty inconsistent when i'll be able to query them. Sometimes it needs over 4h ( which further points me to thinking that chifra only reads finalized chunks, because it's very possible that a chunk was built during that time ) to actually output anything, and it misses last 1h of data, and this's with the --unripe flag. From my understanding, if i use --unripe, i should basically have access to the chain head, or a block less, until the scraper picks it up, but that doesn't seem to be the case.

Any help would be appreciated :)

PS. This's also true for --traces flag, and chifra list command as well. And checking the unripe and staging folders, the file name seems to be correct as well, 28 blocks behind the head for the staged, and chain head block for the unripe file.

I'm on trueblocks 3.1.2, docker version, and i'm using erigon 3.0 as a node.

walrusmustgo commented 2 months ago

Some more examples, but this time with WETH/USDT pair address instead of new tokens.

a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542500
EROR[16-08|17:19:58.219] no blocks found for the query
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542499
EROR[16-08|17:20:03.713] no blocks found for the query
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542489
blockNumber     transactionIndex        timestamp       date    from    to      value   gasPrice        gasUsed gasCost hash    isError encoding        type    ether
20542491        13      1723826975      2024-08-16 16:49:35 UTC 0x042a77a27d7677465d689fcf3161f685bd363f50      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       5589343442      255248  1426668734883616        0x48e3d94ff93e54b2e5a39d4e9fb72f65a29ad116464091e9d20ae22f99eb147e      false   0x3593564c      0x2     0
20542491        63      1723826975      2024-08-16 16:49:35 UTC 0xd51983c46e4c993dfe7063ded181da03d8cd2de8      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       4127695726      219151  904588646048626 0x58fb989e7b95914eb93c6af42b9ceef14d192127dea0dab7c0ea96eb4e89c670      false   0x3593564c      0x2     0
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542492
EROR[16-08|17:20:13.123] no blocks found for the query
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542491
blockNumber     transactionIndex        timestamp       date    from    to      value   gasPrice        gasUsed gasCost hash    isError encoding        type    ether
20542491        13      1723826975      2024-08-16 16:49:35 UTC 0x042a77a27d7677465d689fcf3161f685bd363f50      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       5589343442      255248  1426668734883616        0x48e3d94ff93e54b2e5a39d4e9fb72f65a29ad116464091e9d20ae22f99eb147e      false   0x3593564c      0x2     0
20542491        63      1723826975      2024-08-16 16:49:35 UTC 0xd51983c46e4c993dfe7063ded181da03d8cd2de8      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       4127695726      219151  904588646048626 0x58fb989e7b95914eb93c6af42b9ceef14d192127dea0dab7c0ea96eb4e89c670      false   0x3593564c      0x2     0
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542492
EROR[16-08|17:20:21.349] no blocks found for the query
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542492
EROR[16-08|17:21:35.761] no blocks found for the query
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542491
blockNumber     transactionIndex        timestamp       date    from    to      value   gasPrice        gasUsed gasCost hash    isError encoding        type    ether
20542491        13      1723826975      2024-08-16 16:49:35 UTC 0x042a77a27d7677465d689fcf3161f685bd363f50      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       5589343442      255248  1426668734883616        0x48e3d94ff93e54b2e5a39d4e9fb72f65a29ad116464091e9d20ae22f99eb147e      false   0x3593564c      0x2     0
20542491        63      1723826975      2024-08-16 16:49:35 UTC 0xd51983c46e4c993dfe7063ded181da03d8cd2de8      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       4127695726      219151  904588646048626 0x58fb989e7b95914eb93c6af42b9ceef14d192127dea0dab7c0ea96eb4e89c670      false   0x3593564c      0x2     0

Here,i've been pinpointing exactly where the no blocks found error occurs. At the time of writing, the current block number is 20542666, and it fails after 20542491.

Here are some status logs if they can help:

a900bdce7c5a:/# chifra status index
{ROG[16-08|17:23:19.698]                                            
  "data": [
    {
      "cachePath": "/cache/mainnet/",
      "caches": [
        {
          "items": [],
          "lastCached": "2024-08-16 17:23:19",
          "nFiles": 4851,
          "nFolders": 1,
          "path": "/unchained/mainnet/finalized/",
          "sizeInBytes": 128196093084,
          "type": "indexCache"
        }
      ],
      "chainConfig": "/root/.local/share/trueblocks/config/mainnet/",
      "clientVersion": "erigon/3.00.0/linux-amd64/go1.21.5",
      "hasEsKey": true,
      "hasPinKey": false,
      "indexPath": "/unchained/mainnet/",
      "isApi": false,
      "isArchive": true,
      "isTesting": false,
      "isTracing": true,
      "rootConfig": "/root/.local/share/trueblocks/",
      "rpcProvider": "http://erigon:8547",
      "trueblocksVersion": "GHC-TrueBlocks//3.1.2-release"
    }
  ]
}
a900bdce7c5a:/# chifra status blocks
{ROG[16-08|17:23:54.556]                                            
  "data": [],
  "errors": [
    "no files were found in the [blocks] caches"
  ]
}
a900bdce7c5a:/# chifra status unripe
{ROG[16-08|17:24:16.126]                                            
  "data": [
    {
      "cachePath": "/cache/mainnet/",
      "caches": [
        {
          "items": [],
          "lastCached": "2024-08-16 17:24:16",
          "nFiles": 28,
          "nFolders": 1,
          "path": "/unchained/mainnet/unripe/",
          "sizeInBytes": 1537127,
          "type": "unripeCache"
        }
      ],
      "chainConfig": "/root/.local/share/trueblocks/config/mainnet/",
      "clientVersion": "erigon/3.00.0/linux-amd64/go1.21.5",
      "hasEsKey": true,
      "hasPinKey": false,
      "indexPath": "/unchained/mainnet/",
      "isApi": false,
      "isArchive": true,
      "isTesting": false,
      "isTracing": true,
      "rootConfig": "/root/.local/share/trueblocks/",
      "rpcProvider": "http://erigon:8547",
      "trueblocksVersion": "GHC-TrueBlocks//3.1.2-release"
    }
  ]
}
a900bdce7c5a:/# chifra status staging
{ROG[16-08|17:24:29.377]                                            
  "data": [
    {
      "cachePath": "/cache/mainnet/",
      "caches": [
        {
          "items": [],
          "lastCached": "2024-08-16 17:24:29",
          "nFiles": 1,
          "nFolders": 1,
          "path": "/unchained/mainnet/staging/",
          "sizeInBytes": 7378599,
          "type": "stagingCache"
        }
      ],
      "chainConfig": "/root/.local/share/trueblocks/config/mainnet/",
      "clientVersion": "erigon/3.00.0/linux-amd64/go1.21.5",
      "hasEsKey": true,
      "hasPinKey": false,
      "indexPath": "/unchained/mainnet/",
      "isApi": false,
      "isArchive": true,
      "isTesting": false,
      "isTracing": true,
      "rootConfig": "/root/.local/share/trueblocks/",
      "rpcProvider": "http://erigon:8547",
      "trueblocksVersion": "GHC-TrueBlocks//3.1.2-release"
    }
  ]
}

And here's some ripe-vs-unripe tests:

a900bdce7c5a:/# chifra export 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542491
blockNumber     transactionIndex        timestamp       date    from    to      value   gasPrice        gasUsed gasCost hash    isError encoding        type    ether
20542491        13      1723826975      2024-08-16 16:49:35 UTC 0x042a77a27d7677465d689fcf3161f685bd363f50      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       5589343442      255248  1426668734883616        0x48e3d94ff93e54b2e5a39d4e9fb72f65a29ad116464091e9d20ae22f99eb147e      false   0x3593564c      0x2     0
20542491        63      1723826975      2024-08-16 16:49:35 UTC 0xd51983c46e4c993dfe7063ded181da03d8cd2de8      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       4127695726      219151  904588646048626 0x58fb989e7b95914eb93c6af42b9ceef14d192127dea0dab7c0ea96eb4e89c670      false   0x3593564c      0x2     0
a900bdce7c5a:/# chifra export --unripe 0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852 --first_block 20542491
blockNumber     transactionIndex        timestamp       date    from    to      value   gasPrice        gasUsed gasCost hash    isError encoding        type    ether
20542491        13      1723826975      2024-08-16 16:49:35 UTC 0x042a77a27d7677465d689fcf3161f685bd363f50      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       5589343442      255248  1426668734883616        0x48e3d94ff93e54b2e5a39d4e9fb72f65a29ad116464091e9d20ae22f99eb147e      false   0x3593564c      0x2     0
20542491        63      1723826975      2024-08-16 16:49:35 UTC 0xd51983c46e4c993dfe7063ded181da03d8cd2de8      0x3fc91a3afd70395cd496c647d5a6cc9d4b2b7fad      0       4127695726      219151  904588646048626 0x58fb989e7b95914eb93c6af42b9ceef14d192127dea0dab7c0ea96eb4e89c670      false   0x3593564c      0x2     0

I reckon that from this benchmark, tests can be ran with any address, as long as the block count is closer to the chain head, they'll always produce these results ( on my end at least ).

Btw, if i call chifra blocks latest, i get the latest block and get the correct output, which leads me to believe that the index is actually being built, but some of the commands aren't using the data properly.

I've checked out the repo, i'm going to start going through the code to see if i'll manage to find an issue -- i'm a noob with Go though, it might take some time for me to get deep enough --.

tjayrush commented 2 months ago

This is just for my notes...

I looked at chifra transactions --cache 20567924.\* --uniq | grep 0x08 and found two appearances.

0x08f104c70126354b3755777d72436e983ba46439  20567924    51      input
0x08f104c70126354b3755777d72436e983ba46439  20567924    68      log_0_topic_2

I then ran chifra list 0x08f104c70126354b3755777d72436e983ba46439 which return zero results. This made sense since my index was not updated, so I started a watch on the address:

watch -n .2 "chifra list 0x08f104c70126354b3755777d72436e983ba46439 --unripe"

which should have show records as soon as the scraper got to block 20567924. I then started my scraper. I was about 3000 blocks behind. And it caught up, but the watch did not start showing the transactions until the block of interest was consolidated into a chunk.

It should have shown up as soon as the scraper got to block 20567924 but it took about 90 blocks later until the consolidation.

This is incorrect behaviour.

This is VERY hard to test.

tjayrush commented 2 months ago

I've found an issue -- i reckon it's most definitely an issue, not misusage -- regarding querying staged and unripe blocks. It seems to me that the chifra isn't reading either of the two when querying.

I think this may be correct...I'm seeing the same thing...

tjayrush commented 2 months ago

it's only concerning newer data on the blockchain, so for actually testing it out properly, i reckon we'll have to test it out in similar times.

It's very time dependant, so it's super hard to test...

walrusmustgo commented 2 months ago

Yeah,it's quite related to time. One of the ways i've found easiest to test is by looking into some deploys channels on telegram, and using newer addresses. That way, you always have a fresh supply of newely deployed tokens, you just need to check etherscan to be sure there's some transactions on the contract, for more data. :)

Imma shamelessly put the one i'm working on lol: https://t.me/Snipereum_ETH

tjayrush commented 2 months ago

Now,this contract has activity on it, the scraper is running properly, the staging file is growing in size, and when i try grep 0x1a7cAC64314FB35e69a1045014EEa422D902fB17 in the staging folder,

This is an excellent test. If this address is not in the files in the folders, then it will not be found. So that makes sense.

walrusmustgo commented 2 months ago

Now,this contract has activity on it, the scraper is running properly, the staging file is growing in size, and when i try grep 0x1a7cAC64314FB35e69a1045014EEa422D902fB17 in the staging folder,

This is an excellent test. If this address is not in the files in the folders, then it will not be found. So that makes sense.

Yeah,but from the looks of everything, there should be something related to the address, if we're to trust etherscan. Also, i've grep the entire file ,there was a bunch of 0xfffffff.... addresses in there, don't know if that should be the case? Unfortuantely, i've exited the shell without copying the staged file to showcase, but from my memory, it was literally ~50% of 0xfffff addresses.

tjayrush commented 2 months ago

trueblocks 3.1.2

correct

docker version

Okay, although I prefer the core directly running. The docker is okay, but may be under tested.

erigon 3.0 as a node.

Very interesting. I heard of some issue on Erigon 3.0 and logs. I have no idea if this is important, but it may be.

tjayrush commented 2 months ago

I've checked out the repo, i'm going to start going through the code to see if i'll manage to find an issue -- i'm a noob with Go though, it might take some time for me to get deep enough --.

Please, please, please let me help you here. If you have any questions about anything, ask in the discord.

tjayrush commented 2 months ago

there was a bunch of 0xfffffff.... addresses in there, don't know if that should be the case?

We do something we call being purposefully sloppy. We err on the side of being overly inclusive instead of missing addresses. It's explained in excruciating detail in the "Building the Index and Bloom Filters" section of this white paper: https://trueblocks.io/papers/2023/specification-for-the-unchained-index-v2.0.0-release.pdf

tjayrush commented 2 months ago

literally ~50% of 0xfffff addresses.

Probably not that many...You can do

chifra chunks addresses 20566100 | sort -r -n -k4,4 | head
chifra chunks addresses 20566100 | sort -r -n -k4,4 | grep 0xfffff | head

to see some of them in that block range.

walrusmustgo commented 2 months ago

erigon 3.0 as a node.

Very interesting. I heard of some issue on Erigon 3.0 and logs. I have no idea if this is important, but it may be.

Yeah,there's a bug with alpha1 version ( which i'm currently using ), the method handler tends to crash on getLogs, however!, that's not the case with TB, i've been checking the erigon logs, and it runs just fine, no errors pop out. For reference, i've tested out the graph node out of curiosity, and it can't use the alpha1 version, the getLogs error basically halts it immediately ,but somehow, TB works just fine :D

I've checked out the repo, i'm going to start going through the code to see if i'll manage to find an issue -- i'm a noob with Go though, it might take some time for me to get deep enough --.

Please, please, please let me help you here. If you have any questions about anything, ask in the discord.

Will do, i've been poking around the past few days, we can work on this for sure. I'll most probably return to TB from tomorrow ( had some unavoidable hurdles at work i've had to take care of ASAP -- which's done now ), and i'll most definitely ping you for help!

there was a bunch of 0xfffffff.... addresses in there, don't know if that should be the case?

We do something we call being purposefully sloppy. We err on the side of being overly inclusive instead of missing addresses. It's explained in excruciating detail in the "Building the Index and Bloom Filters" section of this white paper: https://trueblocks.io/papers/2023/specification-for-the-unchained-index-v2.0.0-release.pdf

Ahhhhh, yes, i remember now, i've read the whitepaper, and i was actually intrigued with this approach. :D Forgot about it, my bad. in that case, it makes sense :) However, the missed address still doesn't...

literally ~50% of 0xfffff addresses.

Probably not that many...You can do

chifra chunks addresses 20566100 | sort -r -n -k4,4 | head
chifra chunks addresses 20566100 | sort -r -n -k4,4 | grep 0xfffff | head

to see some of them in that block range.

Yeah yeah, could be less, i was eyeballing it. There was a lot, but there were a lot of "normal" addresses as well, i've probably just went overboard :D

tjayrush commented 2 months ago

How do you find these addresses that a being newly created? The trouble with testing is that either an address shows up rarely (in which case you have to wait for a new tx) or it shows up in every block (in which case TrueBlocks will choke).

Are you watching Uniswap or something? Or watching for contract deployments?

tjayrush commented 2 months ago

Cheers. Out for the evening, now.

walrusmustgo commented 2 months ago

Yeah,it's quite related to time. One of the ways i've found easiest to test is by looking into some deploys channels on telegram, and using newer addresses. That way, you always have a fresh supply of newely deployed tokens, you just need to check etherscan to be sure there's some transactions on the contract, for more data. :)

Imma shamelessly put the one i'm working on lol: https://t.me/Snipereum_ETH

I'm looking into deployments. Now, the deployments are a bit tricky, because you need to go into each one of them,and make sure there's more transactions than just contract creation ( i reckon it's much easier to find the issue with more activity ).

One more channel is this : https://t.me/SnipereumSniperScout

This one actually tracks approvals on new tokens, so basically everything newer in that channel will do, but it's possible that the token was released much earlier than the actual approvals started, so it's a trade-off.

Imo ,probably better to just track the deploys channel, it's much easier to be sure when the contract was deployed that way.

Cheers. Out for the evening, now.

Cheerio! ^^ we'll keep in touch! :)

walrusmustgo commented 2 months ago

I reckon that from this benchmark, tests can be ran with any address, as long as the block count is closer to the chain head, they'll always produce these results ( on my end at least ).

BTW, i think this's also a good approach for testing ( second message i've sent to this issue ). I've took a really popular address -- WETH/USDT Pair address in this case -- ,and tried chifra export --unripe on it, but with --first_block close to the chain head, and got basically the same behaviour as with new tokens.

tjayrush commented 2 months ago

Yeah. I was thinking of this. Sounds good.

Did some further testing. I'm on a Mac. I opened a separate finder window on each of the three folders (./unripe, ./ripe, and ./staging).

When the scraper is way behind the chain tip, there is never any items in the unripe folder. That makes perfect sense. While the scraper is "completing a pass" (that is, it's scanning blocks), the staging file stays frozen and the ./ripe folder is being filled up with individual files, one for each block. This also makes perfect sense. One the "pass" is completed, all the ./ripe folder are grouped into the stage. This also makes sense. When there's enough records on the stage (2,000,000) then a chunk is created. This also makes sense.

This is how it works when it's not at the chain tip. I'll report next what I see at the chain tip once it catches up.

tjayrush commented 2 months ago

Here's a video of what happens at the front of the chain:

https://ipfs.unchainedindex.io/ipfs/QmeHajNek31sfjueb8sjRgCpv5RctKgc2tKdXCVV7VPDVb

This appears to be working exactly as I would expect, so there must be something wrong somewhere else. The files are there. They are where I would expect them to be. I do notice that they periodically disappear and re-appear which may cause some problem, but this is no different than it's worked for six years, so I'm not going to call that a problem yet.

The problem must be somewhere in the querying code. More later...

walrusmustgo commented 2 months ago

I've got exactly the same behaviour as you, the folders are being populated in exactly the same way. I've managed to grep some unripe blocks as well, and everything seems in order...

Yeah,i've noticed that the files are consantly being deleted/added, i even found it in code while i was looking through, but from what i can tell, everything is setup nicely, so that shouldn't cause any race conditions -- however, it's not impossible, if querying has changed recently, perhaps there's a race condition, and querying code can't find the files when it needs them? --.

My bet is that it's the querying code too, since everything before the query itself seems to behave just as it should. Can you point me in the right direction, where's the querying code? I reckon it's on a per-command basis?

tjayrush commented 2 months ago

On this line ( https://github.com/TrueBlocks/trueblocks-core/blob/develop/src/apps/chifra/internal/export/output.go#L58) chifra export scans the unchained index...

with this code: https://github.com/TrueBlocks/trueblocks-core/blob/develop/src/apps/chifra/pkg/monitor/monitor_freshen.go#L84

It's a bit complicated (performance/concurrency), but this might be a good place to start.

On Wed, Aug 21, 2024 at 7:39 AM walrusmustgo @.***> wrote:

I've got exactly the same behaviour as you, the folders are being populated in exactly the same way. I've managed to grep some unripe blocks as well, and everything seems in order...

Yeah,i've noticed that the files are consantly being deleted/added, i even found it in code while i was looking through, but from what i can tell, everything is setup nicely, so that shouldn't cause any race conditions -- however, it's not impossible, if querying has changed recently, perhaps there's a race condition, and querying code can't find the files when it needs them? --.

My bet is that it's the querying code too, since everything before the query itself seems to behave just as it should. Can you point me in the right direction, where's the querying code? I reckon it's on a per-command basis?

— Reply to this email directly, view it on GitHub https://github.com/TrueBlocks/trueblocks-core/issues/3855#issuecomment-2301840771, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJKXPVAUKAXDACG5PLTSZLZSR35PAVCNFSM6AAAAABMQNHVGSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBRHA2DANZXGE . You are receiving this because you commented.Message ID: @.***>

-- Thomas Jay Rush http://t http://quickblocks.iorueblocks.io @.***

walrusmustgo commented 2 months ago

Ok,i've been messing around a bit with the code, but i'm yet to find something useful.. I'll be deploying a new index today/tomorrow, and start from scratch, and let you know!