[X] I checked the documentation and found no answer to my problem.
[X] I checked the existing issues to make sure my problem has not already been reported.
[X] I have read the code of conduct before creating this issue.
Expected Behavior
CI should be more deterministic than they currently are. More often than not, I find myself having to re-run tests, sometimes more than once.
Current Behavior
Cumulative E2E test behavior
| Deterministic | Chaotic |
| <----------------------------------|------------*---------------------> |
| | ^ |
getting uncomfortably far
from "Determinstic" and
trending in the wrong direction
We've observed flickering in the end-to-end suite which covers this repo, as is evident by CI re-run going back some time. I've classified these issues given recent examples but this classification may not be exhaustive:
Schema related errors
These have been around the longest as far as I recall. The specific entity and relation vary from run to run. Our working hypothesis has been that it is caused by the graphql-engine service coming up before the subquery-node has finished setting up the DB schema. On boot, graphql-engine will scan the DB schema and use that to determine how to define it's GQL resolvers, which if done before all tables have finished generating, would likely result in this kind of error. It's clearly a failure of integration at runtime as the schema and resulting generated code is all present and correct. This hypothesis is corroborated by the fact that when this phenomenon is experienced locally, restarting the graphql-engine consistently fixes the issue (whereas restarting the docker composition does not necessarily).
======================================================================
ERROR: test_blocks_query (entities.test_authz_exec.TestAuthzExec)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_authz_exec.py", line 179, in test_blocks_query
result = self.gql_client.execute(query)
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 388, in execute
data = loop.run_until_complete(
File "/opt/hostedtoolcache/Python/3.9.[16](https://github.com/fetchai/ledger-subquery/actions/runs/3749061679/jobs/6367112249#step:13:17)/x64/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 285, in execute_async
return await session.execute(
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 1220, in execute
result = await self._execute(
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 1111, in _execute
self.client.validate(document)
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 149, in validate
raise validation_errors[0]
graphql.error.graphql_error.GraphQLError: Cannot query field 'subMessages' on type 'AuthzExec'. Did you mean 'message' or 'messageId'?
GraphQL request:12:25
11 | grantee
12 | subMessages {
| ^
13 | nodes {
======================================================================
ERROR: test_primitives_query (entities.test_native_primitives.TestNativePrimitives) [primitive id by regex]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_native_primitives.py", line 338, in test_primitives_query
result = self.gql_client.execute(query)
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 388, in execute
data = loop.run_until_complete(
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 285, in execute_async
return await session.execute(
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 12[20](https://github.com/fetchai/ledger-subquery/actions/runs/3749061679/jobs/6367112249#step:13:21), in execute
result = await self._execute(
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 1111, in _execute
self.client.validate(document)
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/gql/client.py", line 149, in validate
raise validation_errors[0]
graphql.error.graphql_error.GraphQLError: Cannot query field 'attributes' on type 'Event'. Did you mean 'eventAttributes'?
GraphQL request:12:29
11 | }
12 | attributes {
| ^
13 | nodes {
----------------------------------------------------------------------
Ran 44 tests in 339.[23](https://github.com/fetchai/ledger-subquery/actions/runs/3749061679/jobs/6367112249#step:13:24)0s
FAILED (errors=2)
Contract upload (and/or download) related errors
This one is pretty new as far as I can recall but is effecting old tests. I suspect it may have been introduced in or around #187. Perhaps, specifically, commit 6ad556bdfa0b3bdfca5145ec54e0b625d08cdba4.
======================================================================
ERROR: setUpClass (entities.test_legacy_bridge_swap.TestContractSwap)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_legacy_bridge_swap.py", line 25, in setUpClass
cls._contract._store()
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/helpers/contracts.py", line 161, in _store
return self.store(self.admin, self.gas_limit)
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/cosmpy/aerial/contract/__init__.py", line 169, in store
submitted_tx = prepare_and_broadcast_basic_transaction(
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/cosmpy/aerial/tx_helpers.py", line 178, in wait_to_complete
self._response.ensure_successful()
File "/home/runner/.local/share/virtualenvs/ledger-subquery-lKQxoaNQ/lib/python3.9/site-packages/cosmpy/aerial/tx_helpers.py", line [96](https://github.com/fetchai/ledger-subquery/actions/runs/3758060514/jobs/6385954111#step:15:97), in ensure_successful
raise BroadcastError(self.hash, self.raw_log)
cosmpy.aerial.exceptions.BroadcastError: failed to execute message; message index: 0: Null/Nil argument: wasm: create wasm contract failed
----------------------------------------------------------------------
Ran 36 tests in 211.813s
FAILED (errors=5)
Order related errors
I recall these first appearing shortly after the order by features where already merged. As far as I can recall, this one has only ever looked exactly like this, no variation as the other classes (i.e. always '9' not less than or equal to '10' ... on the Cw20BalanceChange entity test). Annecdotally (see: #198, specifically b357a9), deleting the respective actions cache may have an effect in this case.
======================================================================
FAIL: test_retrieve_balance_change (entities.test_CW20_balance_change.TestCw20BalanceChange) [order by block height ascending]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_CW20_balance_change.py", line 246, in test_retrieve_balance_change
orderAssert(
AssertionError: '10' not greater than or equal to '9' : OrderAssertError: order of objects is incorrect
======================================================================
FAIL: test_retrieve_balance_change (entities.test_CW[20](https://github.com/fetchai/ledger-subquery/actions/runs/3758060514/jobs/6387093094#step:15:21)_balance_change.TestCw20BalanceChange) [order by block height descending]
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_CW20_balance_change.py", line 246, in test_retrieve_balance_change
orderAssert(
AssertionError: '9' not less than or equal to '10' : OrderAssertError: order of objects is incorrect
----------------------------------------------------------------------
Ran 48 tests in 337.486s
FAILED (failures=2)
Empty table errors
This is another older one.
======================================================================
FAIL: test_claim_rewards (entities.test_delegation_reward_claim.TestDelegation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_delegation_reward_claim.py", line 47, in test_claim_rewards
self.assertIsNotNone(
AssertionError: unexpectedly None :
DBError: table is empty - maybe indexer did not find an entry?
======================================================================
FAIL: test_contract_swap (entities.test_legacy_bridge_swap.TestContractSwap)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_legacy_bridge_swap.py", line 43, in test_contract_swap
self.assertIsNotNone(
AssertionError: unexpectedly None :
DBError: table is empty - maybe indexer did not find an entry?
======================================================================
FAIL: test_native_transfer (entities.test_native_transfer.TestNativeTransfer)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_native_transfer.py", line 39, in test_native_transfer
self.assertIsNotNone(
AssertionError: unexpectedly None :
DBError: table is empty - maybe indexer did not find an entry?
----------------------------------------------------------------------
Ran 42 tests in [19](https://github.com/fetchai/ledger-subquery/actions/runs/3584098635/jobs/6030334598#step:14:20)0.070s
FAILED (failures=3)
Authz
I discovered this one when looking through old CI logs, it seems to have presented (multiple times) during work on the Authz executions handler (#147). I also spotted it in #173.
======================================================================
FAIL: test_exec_delegate (entities.test_authz_exec.TestAuthzExec)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/ledger-subquery/ledger-subquery/tests/e2e/entities/test_authz_exec.py", line 112, in test_exec_delegate
self.assertEqual(len(messages), self.expected_msgs_len)
AssertionError: 7 != 5
----------------------------------------------------------------------
Ran 35 tests in 178.[9](https://github.com/fetchai/ledger-subquery/actions/runs/3280709924/jobs/5401829244#step:11:10)53s
FAILED (failures=1)
Steps To Reproduce
Open a PR and wait for CI to fail. :sweat_smile:
If CI passes on any given run you can get it it to again by pushing an empty commit:
git commit --allow-empty -m 'poke CI'
git push
Context
This seems pretty consistent in CI in my experience but can also happen when running everything locally as well. I can't recall definitively if we've observed this happening in production (i.e. kubernetes as opposed to docker compose) but it is something we've been checking for in a sanity check which goes in between sync completion and switching the routing for the public DNS at the end of our deployment operation pipeline. In practice this entails running a GQL query which requests the total count of each entity. This at least ensures that the tables themselves were all present when the graphql-engine (labeled "api" in helm/kubernetes files) started. Because graphql type/entity fields map to DB table columns and column and table creation happen are in the same atomic operation, we don't have to check that all fields are included in the resulting graphql schema. However, this would be fairly trivial via the protocol introspection integration included in the playground UI.
Prerequisites
Expected Behavior
CI should be more deterministic than they currently are. More often than not, I find myself having to re-run tests, sometimes more than once.
Current Behavior
Cumulative E2E test behavior
We've observed flickering in the end-to-end suite which covers this repo, as is evident by CI re-run going back some time. I've classified these issues given recent examples but this classification may not be exhaustive:
Schema related errors
These have been around the longest as far as I recall. The specific entity and relation vary from run to run. Our working hypothesis has been that it is caused by the graphql-engine service coming up before the subquery-node has finished setting up the DB schema. On boot, graphql-engine will scan the DB schema and use that to determine how to define it's GQL resolvers, which if done before all tables have finished generating, would likely result in this kind of error. It's clearly a failure of integration at runtime as the schema and resulting generated code is all present and correct. This hypothesis is corroborated by the fact that when this phenomenon is experienced locally, restarting the graphql-engine consistently fixes the issue (whereas restarting the docker composition does not necessarily).
Contract upload (and/or download) related errors
This one is pretty new as far as I can recall but is effecting old tests. I suspect it may have been introduced in or around #187. Perhaps, specifically, commit 6ad556bdfa0b3bdfca5145ec54e0b625d08cdba4.
Order related errors
I recall these first appearing shortly after the order by features where already merged. As far as I can recall, this one has only ever looked exactly like this, no variation as the other classes (i.e. always
'9' not less than or equal to '10' ...
on theCw20BalanceChange
entity test). Annecdotally (see: #198, specificallyb357a9
), deleting the respective actions cache may have an effect in this case.Empty table errors
This is another older one.
Authz
I discovered this one when looking through old CI logs, it seems to have presented (multiple times) during work on the Authz executions handler (#147). I also spotted it in #173.
Steps To Reproduce
Open a PR and wait for CI to fail. :sweat_smile:
If CI passes on any given run you can get it it to again by pushing an empty commit:
Context
This seems pretty consistent in CI in my experience but can also happen when running everything locally as well. I can't recall definitively if we've observed this happening in production (i.e. kubernetes as opposed to docker compose) but it is something we've been checking for in a sanity check which goes in between sync completion and switching the routing for the public DNS at the end of our deployment operation pipeline. In practice this entails running a GQL query which requests the total count of each entity. This at least ensures that the tables themselves were all present when the graphql-engine (labeled "api" in helm/kubernetes files) started. Because graphql type/entity fields map to DB table columns and column and table creation happen are in the same atomic operation, we don't have to check that all fields are included in the resulting graphql schema. However, this would be fairly trivial via the protocol introspection integration included in the playground UI.
Failure Logs
Inlined with "Current Behavior"