Joystream / hydra

A Substrate indexing framework

Hydra v2 Progress Issue #10

Closed bedeho closed 3 years ago

bedeho commented 4 years ago

Mon, Aug 24th

Agenda

I think Arsen can start as soon as he is ready, even if these are not done, but let me know if you think that would be counter-productive.



## Present

- Metin
- Dmitrii
- Bedeho

## Topics covered
- What is Dmitrii currently working on?
- What Metin is working on, and details of how to address those bugs.
- Do we really need to fix the mappings for Kusama treasury now that we have submitted? It's not the highest priority.
- Should we continue to try to fix the out of memory issue now?
- Perhaps we need a better solution for handling naming conflicts, using a manifest or some other more explicit approach.

## Conclusions
- Dmitrii will focus on a+b, and mix in d for the next week or so.
- Metin will focus on writing mappings, with tests, for some part of our runtime, and will try to identify bugs and rough edges in the developer workflow. It's very important here to get to a place where we know how to give mapping authors confidence that they are doing things correctly.
- We will delay deciding what to do about c; hopefully we can settle it next meeting.
- We will delay any work on a manifest solution for now; Dmitrii will make an issue.
- The out of memory bug will either implicitly get resolved by Dmitrii's work, or it will pop up again in our own node, and then we will have a better shot at local reproduction.
bedeho commented 4 years ago

Tue, Sep 1st

Agenda

  1. Current status for Dmitrii
  2. Current status for Metin
  3. Review https://github.com/Joystream/hydra/issues/8, with special focus on the typing issue, which has received a reply from Leszek here: https://gist.github.com/bedeho/92fac100fa7f7762aba2c2423f27d58c#gistcomment-3430074
  4. Discuss how to approach calls with interested third parties that want to support Hydra.

Present

Topics covered

Conclusions

bedeho commented 4 years ago

Mon, Sep 7th

Agenda

1. Review Babylon network release plan, and resolve key questions

Present

Conclusions

  1. We will need input from Mokhtar on how to do effective integration testing of the query node. We were concerned that running it as part of the overall testing framework would be a hassle, for example in terms of running time. Having standalone integration tests just for the query node would allow us to have a prebuilt node and only run certain queries. But then it does feel like a lot to have both network and node integration tests.

  2. We decided against including any of the new advanced work, such as typed mappings or manifest files. First off, there is just the uncertainty around feasibility and timeline. Secondly, as emphasized by Dmitrii, we may need to refactor the entire Hydra framework to be more robust, with an architecture similar to Subscan/Polkascan. We will revisit this later.

  3. Metin will start working on the schemas and mappings, as Arsen has been delayed, and Dmitrii will try to work on Hydra itself.

  4. There was a polkadotjs version incompatibility issue which made it hard to smoothly populate the indexer database. From what Dmitrii and Metin can gather, the only easy way to resolve it is to have access to a newer version of polkadotjs. Dmitrii will try to experiment with a new version and syncing against the new Joystream node for the imminent Alexandria release, and we will seek advice from Mokhtar as well.

  5. We identified the need to introduce a status/progress API for the integration tests, at least for the generated node; one for the indexer could also be useful in the future. This API should provide events and reads that expose how many blocks have been processed or fetched, respectively.
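
To make item 5 more concrete, here is a minimal in-memory sketch of what such a status/progress API could look like. All names (`ProgressTracker`, `blocksFetched`, `blocksProcessed`, `onProgress`, `record`) are hypothetical and do not come from Hydra; this only illustrates the "events + reads" shape discussed, not an actual implementation.

```typescript
// Hypothetical sketch: none of these names are part of Hydra.
type ProgressKind = "fetched" | "processed";

interface ProgressEvent {
  kind: ProgressKind;
  blockHeight: number;
}

type ProgressListener = (event: ProgressEvent) => void;

class ProgressTracker {
  private fetched = 0;
  private processed = 0;
  private listeners: ProgressListener[] = [];

  // Reads: how many blocks have been fetched / processed so far.
  blocksFetched(): number {
    return this.fetched;
  }

  blocksProcessed(): number {
    return this.processed;
  }

  // Events: subscribe to progress updates; returns an unsubscribe function.
  onProgress(listener: ProgressListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Called by the indexer ("fetched") or the generated node ("processed").
  record(kind: ProgressKind, blockHeight: number): void {
    if (kind === "fetched") {
      this.fetched += 1;
    } else {
      this.processed += 1;
    }
    for (const listener of this.listeners) {
      listener({ kind, blockHeight });
    }
  }
}
```

An integration test could then wait until `blocksProcessed()` reaches the height it cares about before running its queries.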

bedeho commented 4 years ago

Tue, Sep 7th

Agenda

  1. What is required to make the query node part of network testing infrastructure?

  2. How can we turn off running irrelevant scenarios when working on PRs that are focused only on query node?

  3. How can we avoid rebuilding full node & runtime when working on PRs that are focused only on query node?

  4. What is the current status of the indexer issue with polkadotjs?

Present

Conclusions

  1. @mnaamani can solve all query node related requirements so long as we have query node in docker image, which we do. Ansible can be used to run with correct prebuilt assets and limit scenarios by looking at either labels or commit message. This can all be handled by Mokhtar

  2. @dzhelezov found that with polkadotjs 1.31 he was able to run the indexer against Kusama, and quickly at that. He believes this now works because of this new feature: https://github.com/polkadot-js/api/pull/2535. However, we need to confirm whether this version of polkadotjs is compatible with our current Substrate version. He will ask around to figure out that compatibility. If it is compatible, we can just keep using this version for Hydra/query node exclusively; no other part of the Joystream code base needs to upgrade. If it is not compatible, then we need to consider alternatives for the indexer that are not based on the polkadotjs high level APIs.

bedeho commented 4 years ago

Mon, Sep 14th

Agenda

  1. What is our status on the release so far?

  2. We appear to have had somewhat contradictory mental models of how exactly the processor & indexer architecture should look, as summarized in this question from Bedeho: https://github.com/Joystream/hydra/issues/24#issuecomment-691445871

Present

Covered

  1. The difference in the models was that Dmitrii was foreseeing lots of distinct processors within the Joystream API, while I was thinking there was just one. Distinct processors generating distinct APIs, which can then be stitched together at a top-level API, is sort of in the GraphQL spirit. However, it wasn't clear that the underlying Joystream runtime data model and final API could actually work with this sort of segmentation, and it appeared to cause many complexities. We do, however, need to look out for ways of making the schemas manageable to work with, even if there is one monolithic query state.

  2. The estimation went reasonably well for both Dmitrii & Metin, however, there was some significant uncertainty about some parts of the work for both.

  3. Can we run the indexers sooner rather than running everything later? There are still lurking bugs and issues showing up with the Kusama network, and we are better off trying to get a node running as soon as possible.

  4. How will Arsen (@iorveth) & Mokhtar (@mnaamani ) contribute to various Hydra & query node tasks in Babylon?

Conclusions

  1. We will only have a single monolithic processor and query stage for the query node, this impacts some of Dmitrii's tasks, which he will update.

  2. We will attempt to get an indexer running as soon as possible, and we will try to have a long-running indexer talking to a simple Babylon full node as soon as we can, just producing semi-empty blocks.

  3. Mokhtar will assist in updating the polkadotjs Joystream types library as soon as possible, as Leszek will not be back for at least a week. It's not clear how many of the query-node integration testing scenarios he will write, but he may contribute once he is done with his main task of integrating the query node in the testing infrastructure itself. To begin with, all tasks are assigned to Dmitrii.

  4. Dmitrii will add a separate task about writing integration tests just for the Hydra indexer.

  5. Arsen will work with Metin to write the query-node when he is done with a working runtime, as he is very knowledgeable about the content directory, but initially, all tasks are assigned to Metin.

bedeho commented 4 years ago

Mon, Sep 21st

Agenda

  1. Where are we, where can we be next week?
  2. Discuss: https://github.com/Joystream/joystream/issues/1409

Present

Covered

  1. Agenda
    • Arsen has gotten started and reviewed his first query node PRs.
    • Metin is close to finishing the input schema, and will share it in the next few days.
    • Dmitrii has been working on integrating the processor and indexer.
    • We all agreed with final remark on #1409
  2. How to proceed the next week.
  3. Performance issues with the indexer API, caused by the default GraphQL resolver behaviour in Warthog (which also powers this API) of not consolidating queries.
  4. Substrate builders program application timing: we discussed when to apply, and also what to do about funding further Hydra development in general
  5. Hydra future.

Conclusions

  1. Arsen and Metin will focus on writing mappings the next week, but it's unclear how much progress can be made.

  2. Dmitrii will focus on completing the integration of the indexer and processor, and then commence with the first Hydra integration test on the indexer with a template chain.

  3. We will postpone a proper fix of the API issue until after Babylon is out.

  4. We should do the builders program ASAP, Dmitrii will look into what is required to apply, and the timeline and commitments required of the program. We should also apply to other funding sources, but this requires harmonizing requests guided by an underlying plan and vision for what we want to do. This planning could take a little more time. Bedeho is happy to spend time on this, but needs some calendar space to get started.

  5. This is a complex topic, and we should schedule explicit time for it later. There does not appear to be a rush, as any plan depends on what we are doing anyway, which is to make Hydra better for anyone to use.

dzhelezov commented 4 years ago

Quick recap of Hydra as of 29.09

TBD Core Hydra features Priority: 1

TBD Pre-integration tests (w/o mappings logic) Priority: 2

TBD Infrastructure tasks: Priority: 3

bedeho commented 4 years ago

Wed, Sep 30th

Agenda

  1. Review https://github.com/Joystream/hydra/issues/10#issuecomment-700580784
  2. Dmitrii status on: integration tests & subscriptions
  3. Metin status on: schema progress
  4. Dmitrii: re-evaluation of monitoring & cost of introducing type safe mappings
  5. Can we get transaction handlers?

Present

Covered

  1. Dmitrii explained the role of the Redis message broker, which acts as a queue between the indexer node and the indexer API server. The API will provide a GraphQL subscription which alerts the client when a new block has been fetched, and the client would at this point fetch the block using a separate query.

  2. There is a limitation in Warthog preventing the completion of the approach described in the prior point. It is expected that the Warthog maintainer will address this with a small fix soon; if that were to fall through, we can rely on the already working polling-based approach in the processor.

  3. Metin is working on mappings, of which we expect 25 or so to be needed, and it takes about 2-3 hours per small group (2-3) of related events. Arsen has not started contributing on the mapper side yet.

  4. We discussed how type safe mappings would be implemented, and how to deal with the fact that the Polkadot/Joystream types are distinct from the Warthog data model types. We also discussed the fact that recovering extrinsic parameters when processing events can be quite hard, and it's not clear how to make type safe mappings in such cases. The conclusion here was that we need transaction handlers to sidestep this entirely.

  5. We briefly discussed when a working chain for Babylon with relevant transactions would be available.

  6. How should query deployment and hosting be handled?
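
The notify-then-fetch flow from item 1 can be sketched roughly as follows. This is an in-memory stand-in only: `BlockQueue` plays the role of the Redis broker and `IndexerApi` the role of the GraphQL API server, and both names (along with `subscribeNewBlock`, `queryBlock`, `consume` and `drainInto`) are hypothetical rather than anything taken from Hydra or Warthog.

```typescript
// Hypothetical in-memory stand-ins; the real setup uses Redis and a GraphQL API.
interface IndexedBlock {
  height: number;
  events: string[];
}

class IndexerApi {
  private blocks = new Map<number, IndexedBlock>();
  private subscribers: Array<(height: number) => void> = [];

  // Stand-in for the GraphQL subscription: only the block height is pushed.
  subscribeNewBlock(callback: (height: number) => void): void {
    this.subscribers.push(callback);
  }

  // Stand-in for the separate query the client uses to fetch the block itself.
  queryBlock(height: number): IndexedBlock | undefined {
    return this.blocks.get(height);
  }

  // Called when the API server consumes a message from the queue.
  consume(block: IndexedBlock): void {
    this.blocks.set(block.height, block);
    for (const callback of this.subscribers) {
      callback(block.height);
    }
  }
}

// Stand-in for the message broker: a FIFO queue between indexer and API server.
class BlockQueue {
  private items: IndexedBlock[] = [];

  publish(block: IndexedBlock): void {
    this.items.push(block);
  }

  drainInto(api: IndexerApi): void {
    let next = this.items.shift();
    while (next !== undefined) {
      api.consume(next);
      next = this.items.shift();
    }
  }
}
```

Keeping the notification small (just a height) and leaving the payload to a regular query is the design point here; the polling fallback mentioned in item 2 would simply call the query on a timer instead of waiting for the notification.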

Conclusions

bedeho commented 3 years ago

Mon, Oct 5th

Agenda

Present

Covered

  1. Dmitrii has sidestepped the performance issue which demanded the use of subscriptions to keep up with index status, so we no longer depend on the Warthog fix arriving in time, but we will make the switch whenever the fix is available.
  2. Dmitrii has made initial lower level integration tests to ensure that the indexer works, and they are passing.
  3. Dmitrii suggested we introduce deeper e2e logging infrastructure for all of our hosted infrastructure, and that this perhaps would make sense to bundle with the Kubernetes introduction.
  4. Metin has reworked input schemas to only have higher level types.
  5. Metin has written prehandlers for membership queries, and so far it looks clean.
  6. We discussed how to deal with ID fields in input schemas. We decided that all @entities should have an ID field with clear deterministic semantics, explained as inline comments, which mappers will enforce and app developers can read in the docs. Minor changes in the Warthog schema generation code would be required.
  7. We were all in sync on the Major directions issue.
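
As an illustration of what "deterministic semantics" for an ID field (item 6) could mean in practice, the sketch below derives an ID from a block height and the index of an event within that block. This particular scheme and the name `eventScopedId` are assumptions for the sake of example; the notes do not say which scheme was chosen for any entity.

```typescript
// Hypothetical ID scheme: block height plus the event's index within that block.
// Re-running the same mapping over the same chain data yields the same ID.
function eventScopedId(blockHeight: number, eventIndex: number): string {
  if (!Number.isInteger(blockHeight) || blockHeight < 0) {
    throw new RangeError(`Invalid block height: ${blockHeight}`);
  }
  if (!Number.isInteger(eventIndex) || eventIndex < 0) {
    throw new RangeError(`Invalid event index: ${eventIndex}`);
  }
  return `${blockHeight}-${eventIndex}`;
}
```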

Conclusions

bedeho commented 3 years ago

Mon, Oct 12th

Agenda

This was an impromptu meeting to discuss a single urgent issue of mismatching expectations between the testing and Hydra/query node teams. There was not sufficient time to properly do a team call today.

Present

Covered

  1. Mokhtar explained that he was working on refactoring the integration testing code, and there were still multiple steps remaining in order to just build and run the query node as part of the CI. He was not sure whether it was still on his task list to write the actual integration tests, and he needed help from Dmitrii to get the query node up and running.
  2. Dmitrii was mainly focusing on getting e2e tests with a chain to run, something which he was close to achieving.
  3. Metin & Arsen were still focusing on mappings, and in particular tackling what should be used for IDs in various entities.
  4. We also discussed how we could shift resources around to assist Mokhtar.

Conclusions

bedeho commented 3 years ago

Tue, Oct 13th

Agenda

This was an impromptu meeting to discuss a single urgent issue of mismatching expectations between the testing and Hydra/query node teams. There was not sufficient time to properly do a team call today.

Present

Covered

  1. The nature of the prior entity ID problem:
    • Warthog had conflicting id with input schema id, so the latter had to be ignored, this has been fixed.
    • Allowing mapping author to provide id field value required hydra CLI changes, this has been done.
  2. We discussed Babylon test plan proposal: https://github.com/Joystream/joystream/issues/1526
    • We can already create classes and schemas based on JSON using tooling.
    • OK, Arsen will provide feedback, or request assistance if there is any problem.
  3. How to handle property value update events
    • The only easy approach seems to be a static mapping from IDs to Warthog data model fields.
    • Leszek may offer extra information about how he does the opposite
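
The static-mapping idea under item 3 could look something like the sketch below. The property IDs, the field names and the helper `applyPropertyUpdate` are all made up for illustration; real IDs would come from the content directory schemas and real field names from the generated Warthog models.

```typescript
// Hypothetical property IDs and field names, for illustration only.
type FieldValue = string | number;
type EntityFields = Record<string, FieldValue>;

// Static lookup table: on-chain property ID -> data model field name.
const VIDEO_PROPERTY_FIELDS: ReadonlyMap<number, string> = new Map([
  [0, "title"],
  [1, "description"],
  [2, "duration"],
]);

// Apply a single property value update event to an entity record.
function applyPropertyUpdate(
  entity: EntityFields,
  propertyId: number,
  value: FieldValue
): EntityFields {
  const field = VIDEO_PROPERTY_FIELDS.get(propertyId);
  if (field === undefined) {
    throw new Error(`No data model field mapped for property id ${propertyId}`);
  }
  // Return a new record so the caller decides when to persist the change.
  return { ...entity, [field]: value };
}
```

The obvious cost of this approach is that the table has to be kept in sync by hand whenever a schema changes.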

Conclusions

bedeho commented 3 years ago

Tue, 27th Oct

Agenda

This was a weekly meeting to discuss progress on Hydra.

Present

Covered

Conclusions

bedeho commented 3 years ago

Mon, 2nd Nov

Agenda

This was a weekly meeting to discuss progress on Hydra.

Present

Covered

Conclusions

bedeho commented 3 years ago

Mon, 16th Nov

Agenda

This was a weekly meeting to discuss progress on Hydra.

Present

Covered

Conclusions

bedeho commented 3 years ago

Closing this now, as we are basically done with everything for Babylon and v2.