GraphQL Repository API - Githubissues

trieloff commented 5 years ago

Problem

With the existing GitHub API (+ raw) we have an easy, RESTful way of accessing repository contents, which is largely sufficient for delivery in Helix. When it comes to content management (think AEM site admin), we need something more advanced to support queries like this:

get all pages that have Helix in the title
get all pages that have been authored by @davidnuescheler
get all images in all pages that point to the DAM

This requires not just a way to express structured queries, but also a way to inspect the contents of a markdown "file" and treat it as a helix "document", i.e. expose sections, metadata, embeds, links, external object, etc.

When it comes to the query language, it seems like GraphQL is the way to go (and something we should explore). If we use GraphQL, we need a way to run a GraphQL gateway as a serverless application.

Note: GitHub does have a GraphQL API that allows us to run queries like "get all Markdown files authored by @davidnuescheler in the branch 2020-launch". It does not allow us to look into the structure of the Markdown files.

Approach 1

Until recently, the architecture I'd have used to approach this would look something like this:

install a web hook into the content repository that calls a function
which parses the markdown and dumps the parsed structure in some schemaless database like CosmosDB or Firebase
run a GraphQL gateway as a serverless function close to the data

The two problems with this approach:

step (3) is very much "???" because we don't have a ready-to-use service
we end up creating a mirror of every git repository we have access to
we need to install the webhook, so we cannot access arbitrary repositories

Approach 2

I've learned about AWS AppSync a while ago, but dismissed it because it seemed to be very mobile-focused, but after a few conversations with people that have been using it, I think it can be a viable alternative. AWS AppSync is a GraphQL gateway as a service (with a full serverless pricing model, so no paying for idle) and full extensibility (you can define custom resolvers as AWS Lambda functions).

As a result, the final architecture would look something like this:

the user (author) sends a GraphQL query as a POST request to AWS AppSync
AppSync has one custom resolver that proxies the GitHub GraphQL API, allowing queries against the repository structure
AppSync has a second custom resolver that fetches the Markdown file from raw, parses it and returns the key metadata structure (this could use a variation of the JSON pipeline)
AppSync creates a query plan and executes the query against the GitHub GraphQL API (retrieve all Markdown files) and then against the custom resolver (spin up one lambda function for each Markdown file found, parse it, etc) and then performs the join

Key aspects/challenges of this approach:

works with any repository (we don't need the bot)
does not mirror anything (we don't need a database, we don't pay for storage for inactive websites)
runs only when queries are run (and is free otherwise)
needs good caching (we don't want to re-fetch and re-parse the same version of the same file over and over again)
potentially runs with massive parallelism (tries to fetch and parse every markdown file in the repository)
potentially slower on initial queries (cold start for resolver lambda functions)

I suggest we try both approaches and compare which one works best for our use cases.

davidnuescheler commented 5 years ago

thanks for the write-up @trieloff

as for your Approach 1, Bullet point 3, i think there are a range of serverless implementations that could be used, here is an example for firebase: https://codeburst.io/graphql-on-cloud-functions-for-firebase-153fe7b02ea5

constanceyu commented 5 years ago

thanks for writing up the document! i will take a look once i am relieved from finals :> @trieloff @davidnuescheler

davidnuescheler commented 5 years ago

Also we could look into https://www.apollographql.com/

trieloff commented 5 years ago

Apollo is great, but you have to run it yourself, so we might have to find a way to build an Apollo-to-OpenWhisk adapter.

constanceyu commented 5 years ago

i was trying to set up a simple graphQL (apollo) + firebase service and it seems manageable with hardcoded data into the firebase. here is the github repo https://github.com/craeyu/firebase-apollo-demo and i will send the credentials to my firebase on slack with you two. (put it inside functions folder)

now i am thinking about looking into git hooks and bots for parsing briefly - feel free to let me know your concerns and suggestions. thanks you!

kptdobe commented 5 years ago

In the research phase, can we also take at https://developer.github.com/v4/ (GitHub GraphQL API) ... ? or maybe someone already knows the limitations...

trieloff commented 5 years ago

As far as I understand, is it possible to stack GraphQL APIs, so that we can have the Helix GraphQL API that calls the GitHub GraphQL API for anything file, branch, commit, or user-related and uses the Firebase backend for anything that looks inside the files.

The same would be possible with AppSync, we'd just have to write resolver functions that call the GitHub GraphQL API.

koraa commented 5 years ago

I am trying to understand a bit more about how graphql works, and I am afraid I am having a bit of a hard time finding graphql examples of more complex queries. Could someone provide examples for this?

E.g. what would the following query look like:

Generate a list of all markdown documents that where authored by "boris@example.com" OR whose title matches /^[Hh]ackaton:.*Brussels/?
Same query as above, but only those markdown documents that where edited in march 2019 (might serve as an example of a join with a Commits table?`

constanceyu commented 5 years ago

hi @koraa! in that case, you would need to write specific query and resolvers for them. here are some queries examples that are more complicated: https://graphql.org/learn/queries/ for example, if i only want the title of the markdowns that were authored by "boris@example.com" can be

{
  markdowns(author: "boris@example.com") {
    title
  }
}

and similarly if i want the same query items but also the markdowns' content i can do

{
  markdowns(author: "boris@example.com") {
    title
    content
  }
}

and for regex support we can leverage gatsby, here are some examples of gatsby + graphQL https://www.gatsbyjs.org/docs/graphql-reference/ for the second point, we would need to figure out the schema for our markdowns. what fields should one markdown contain, do we want to support timestamps for the last modifications, etc. once we have that field in db and schema we can run queries on it.

right now we do not have clear opinion on how the schema is going to look like yet - so depends on the need we might need to tweak things as we go. :)

koraa commented 5 years ago

Hiho @craeyu :)

Thanks for your explanation!

So if I understand you correctly, we would have to build custom functions for each query we run? So GraphQL is more like a very nice & fancy way of describing rest apis, not a replacement for SQL or something like the mongodb query language?

I was hoping there would be some generic/built in support for query operators (like in SQl, or the $regex, $gt, $lt, ... operators); this would give us the advantage of being able to support a lot of complex queries out of the box including those our customers come up with on their own. In the classical graphql development model we would have to manually write an endpoint/graphql function for each type of query we would want to support. In the code of the endpoint we would manually write our DB query in SQL (or whatever DB we would be using)? Did I get that right?

Maybe we could write a generic GraphQL –> DB query language translation layer with the full feature set. Is graphql powerful enough to build a complex querying grammar with it? I haven't found support for json or key/value mappings, so the only thing I can think of is using plain strings:

{
  markdowns(author: "{\"$regex\": \"^boris\"}") {
    title
    content
  }
}

Although in order to support more complex queries like "or" we would probably take the entire query as a string? (I've saved myself the trouble of quoting properly this time).

{
  markdowns(query: '{$or: [{author: {$regex: "^boris"}}, {modified_at: {$gt: "1 Jan 2019 0:00 CEST}]') {
    title
    content
  }
}

So this would basically end up being a "mongodb query lang over GraphQL" protocol. We would create a couple of graphql functions (find, findOne, ...) for each collection in our db; the functions would just convert the payload to bson and pass it along to mongodb. Of course we would miss a lot of functionality very important fo good performance (creating indexes, cursors for iterating over long query results, batch operations to name just a few), we would introduce some extra latency & complexity and we would have to think about how our middleware would handel authentication.

I've found something like this for postgres+graphql (https://github.com/graphile/postgraphile), but that one seems to implement quite a small subset of possible SQL queries…

@craeyu @trieloff What do you think about graphql as a db query lang? Is there a better way to do it :)

constanceyu commented 5 years ago

newest update - running graphQL with azure and postgres db. for simplicity reasons right now i am using an auto incrementing integer for primary key id but if we decided to use SQL instead of NoSQL database, i think we should use a more meaningful key. @stefan-guggisberg suggested we can use the url for md. i plan to look into cosmos db today since it is NoSQL.

constanceyu commented 5 years ago

elloo! @koraa yup REST API and GraphQL serve the same purpose, just GraphQL has some small changes that potentially make developer's experience better in terms of building and consuming APIs. here are some of the differences and similarities: https://blog.apollographql.com/graphql-vs-rest-5d425123e34b

if we were to manually set up everything from scratch then at the end of the day we would need to write to and read from db with queries. but we can leverage existing and well documented and supported gql libraries such as apollo and relay to avoid doing so ourselves.

we can build the regex with gatsby-node.js and then feed the whole regex to a simple gql query like this:

query MarkdownByAuthor($authorRegex: String!) {
  allMarkdowns(filter: {frontmatter: {author: {regex: $authorRegex}}}) {
    edges {
      node {
        frontmatter {
          title
          author
        }
      }
    }
  }
}

for the or query you are talking about, i am thinking we can just build a Query object that resolves on the results we get from both input variables.

gql still has good pagination support if performance is a concern: https://graphql.org/learn/pagination/

trieloff commented 5 years ago

@koraa GraphQL is a ok query language, but that's not the point. It is popular and well-known and seems to be a good architectural fit [to be validated] for expressing selective reads over disparate data sets.

@craeyu this is looking good. A few suggestions:

[x] don't worry too much about the database, we can always switch from Postgres to CosmosDB later, if we want to
[x] don't worry too much about the schema either, we first want to validate the general approach, can fine-tune later
[x] that being said, markdowns sounds wrong to me, I'd prefer documents. Never heard someone use "Markdown" in plural

There are two things I'd like to see in this branch of the exploration before we look at the second approach (AppSync + Parsing on demand)

[x] federation of queries to the GitHub GraphQL API: "get all markdown files in the branch preview that have been last modified by user emmasmith, then get the markdown document corresponding to the file and check if the title has more than three words"
[x] population of the database through to Helix Bot

tripodsan commented 5 years ago

[ ] population of the database through to Helix Bot

@craeyu I spent some time exploring the bots, let me know if you need some pointers. To start with, I guess it's simpler for you to create your own github app, so it's easier to play around with. later we can integrate your code into to the helix-bot.

bdelacretaz commented 5 years ago

@craeyu I have another use case for GraphQL queries, that might be worth considering in your explorations: tag-based navigation of the content.

Taking https://sling.apache.org/tags/community.html as an example (because I built it, not because it's pretty ;-), the list (or cloud) of all tags found on the website is included in navigation pages, and each content page displays its tags as links that point to such navigation pages.

The tags are defined as Markdown front matter.

I suppose the navigation pages could be generated based on GraphQL queries. One potential problem is that with Helix Strains you can aggregate multiple sources of content in a single website, we'd need to take that into account.

https://github.com/adobe/helix-example-advanced/issues/5 has more details, if you think such a use case is useful for your project and demos.

constanceyu commented 5 years ago

i cooked up a simple github graphQL query that demonstrates some features github gql has but it doesn't have the level of granularity we desire yet (whether that is possible is still under investigation):

query getCommitFilesAndContent($owner: String!, $repoName: String!, $commitHash: String!, $targetBranch: String!, $targetFile: String!){
  repository(owner: $owner, name: $repoName) {
    # look up a specific commit based on the commit hash
    commit: object(expression: $commitHash) {
      ... on Commit {
        oid
        author {
          date
          email
          name
        }
      }
    }
    # look up all the files in a directory
    files: object(expression: $targetBranch) {
      ... on Tree {
        entries {
          name
        }
      }
    }
    # plain GQL does not support string and variable concatenation and interpolation
    content: object(expression: $targetFile) {
      ... on Blob {
        text
      }
    }
  }
}

together with these variables:

{
  "owner": "craeyu",
  "repoName": "craeyu.github.io",
  "commitHash": "58050cb996440859dfe9950a8a8f381124331745",
  "targetBranch": "constance/githubgql:",
  "targetFile": "constance/githubgql:index.htm"
}

returns the following result:

{
  "data": {
    "repository": {
      "commit": {
        "oid": "58050cb996440859dfe9950a8a8f381124331745",
        "author": {
          "date": "2019-07-10T17:14:42.000-07:00",
          "email": "faithwyu@gmail.com",
          "name": "constance yu"
        }
      },
      "files": {
        "entries": [
          {
            "name": "css"
          },
          {
            "name": "favicon.ico"
          },
          {
            "name": "fonts"
          },
          {
            "name": "images"
          },
          {
            "name": "index.htm"
          },
          {
            "name": "js"
          }
          ....
        ]
      },
      "content": {
        "text": "<!DOCTYPE html>\n<html lang=\"en\"><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n\n<meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge\">\n<title>Constance Yu</title>\n<link rel=\"stylesheet\" href=\"./css/magnific-popup.css\">\n<link rel=\"stylesheet\" href=\"./css/font-awesome.min.css\">\n\n<link href=\"./css/css\" rel=\"stylesheet\" type=\"text/css\">\n\n<!-- main css -->\n<link rel=\"stylesheet\" href=\"./css/style.css\">\n\n<!-- favicon -->\n<link rel=\"icon\" type=\"image/png\" href=\"./favicon.ico\">\n\n</head>
......
</html>"
      }
    }
  }
}

here is another query that queries all the branches i was playing with (saw ferrum haha @koraa)

a few things to notice:

the expression variable uses rev-parse to parse it so it supports patterns and range (https://git-scm.com/docs/git-rev-parse)
github graphQL itself does not support string and variable concatenation and interpolation directly, might want to consider Gatsby if needed
has a rate limit (https://developer.github.com/v4/guides/resource-limitations/)

constanceyu commented 5 years ago

after debriefing with @davidnuescheler, for now we are trying to use the azure + postgreSQL model to perform a simple task of scanning a github repository to extract all its markdown files, and parse it to get its title (currently defined as # TITLE), and populate the postgreSQL database. for now it is a simple node.js application that is named helix-scanner.

originally we were also interested in querying the files by last modification time, but the immediate issue is that git does not keep track of file modification date directly, only commit timestamps. (extra readings for those who are interested)

if you are interested in testing it locally you can ask me for my tokens and passwords please dont make it public yet since i started using environment variable halfway and people can see the sensitive info from commit, thx!

after that we think we are going to look into the idea @bdelacretaz mentioned, sounds really interesting!

constanceyu commented 5 years ago

newest progress update: the nodejs app can take in --owner repo/owner/name --repo repo/name --path path/name to get the titles of any repo's markdown files in the specified path. it does not recursively parses through the directory trees for now.

stefan-guggisberg commented 5 years ago

Great progress! A question though: The url column stores the url to the document I assume. Why did you store the GitHub API url rather than the raw protocol url?

constanceyu commented 5 years ago

hello @stefan-guggisberg, it is because when you want the content of github files and call the corresponding API with the file sha, the returned response only contains the api url rather than the html url. but i think it might be possible to use the html url, would need some tweaking works

stefan-guggisberg commented 5 years ago

@craeyu The GitHub API is rate limited, i.e. it allows 5000 authenticated requests / 60 unauthenticated requests per client per hour. I guess that sooner or later you'll run into those limits.

The GitHub raw protocol OTOH is not rate limited. It's not officially documented but widely used. The url's follow a simple structure. When you have the owner, repo name, path and reference (branch or tag name) you can easily build the url. See https://github.com/adobe/git-server#1-git-raw-protocol for examples.

constanceyu commented 5 years ago

@stefan-guggisberg ah i see - that makes things much easier. thank you! and yes already ran into the rate limit issue even as an authenticated user 😥

tripodsan commented 5 years ago

so I think the problem is to find a stable url for the file, the contents api returns:

$ curl -s https://api.github.com/repos/tripodsan/hlxtest/contents/index.md
{
  "name": "index.md",
  "path": "index.md",
  "sha": "e8ccf96a926b0e59e21626ccb8563c231e1f1e97",
  "size": 1276,
  "url": "https://api.github.com/repos/tripodsan/hlxtest/contents/index.md?ref=master",
  "html_url": "https://github.com/tripodsan/hlxtest/blob/master/index.md",
  "git_url": "https://api.github.com/repos/tripodsan/hlxtest/git/blobs/e8ccf96a926b0e59e21626ccb8563c231e1f1e97",
  "download_url": "https://raw.githubusercontent.com/tripodsan/hlxtest/master/index.md",
  "type": "file",
  "content": "...",
  "encoding": "base64",
  "_links": {
    "self": "https://api.github.com/repos/tripodsan/hlxtest/contents/index.md?ref=master",
    "git": "https://api.github.com/repos/tripodsan/hlxtest/git/blobs/e8ccf96a926b0e59e21626ccb8563c231e1f1e97",
    "html": "https://github.com/tripodsan/hlxtest/blob/master/index.md"
  }
}

so the blob-id is the SHA of the actual blob. there is IMO no way to retrieve the blob via the SHA from the raw api.

but you could use the SHA of the commit of the file. by using the git-resolve-ref action or similar.

eg:

$ curl "https://adobeioruntime.net/api/v1/web/helix/helix-services/resolve-git-ref@v1?owner=tripodsan&repo=hlxtest&ref=master"
{
  "fqRef": "refs/heads/master",
  "sha": "0eec03399af02e2418316114dfc35710c463e102"
}
$ curl -s https://raw.githubusercontent.com/tripodsan/hlxtest/0eec03399af02e2418316114dfc35710c463e102/index.md
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements.  See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
...

it would even be helpful to store the commit SHAs of the ref when the files were indexed.

davidnuescheler commented 5 years ago

i think that stability or other transaction-like behavior is neither desired nor required.

in my mind the index/table is an incomplete, best effort representation of a subset of the content in md.

it is acceptable that the index is wildly out of sync with the content (imagine a change in the index config which requires a reindex of a large repo) and i can't come up with any usecase that would use anything but the HEAD of a branch for a query.

tripodsan commented 5 years ago

@craeyu I spoke with @davidnuescheler today, and we had an idea on how to handle the index definitions.

we could use the pipeline to render the table entries for each index. for example, having a script like: json_idx.pre.js would kick in, when you request a resource like /readme.idx.json. since it runs the normal pipeline, you'd get the dom representation of the markdown for free. the response would be something like:

{
  "url": "https://raw.githubusercontent.com/tripodsan/hlxtest/master/index.md",
  "tables": [{
    "name": "basic",
    "entries": {
      "title": "Hello Project",
      "description": "This is my hello projects.",
    } 
  }, {
   "name": "authors",
   "entries": {
      "title": "Hello Project",
      "authors": ["toby", "david"],
    }
  }]
};

so you could use the response to update your database.

constanceyu commented 5 years ago

@tripodsan @davidnuescheler ah makes sense. good idea!

davidnuescheler commented 5 years ago

i actually think that the url is probably not really as it would be the base path (between the host to the first . ) of the request url. for more complex pipelines i think the connection to the original .md is probably less relevant...

so a request to:

https://helix-home-adobe.project-helix.page/hackathons/5-bsl.idx.json

would produce...

{
  "tables": [{
    "name": "basic",
    "entries": {
      "title": "Helix Hackathon Part V",
      "description": "May 20-24 2019 - Basel, Switzerland"
    } 
  }, {
   "name": "authors",
   "entries": {
      "title": "Helix Hackathon Part V",
      "authors": "@rofe @trieloff @stefan-guggisberg @koraa @davidnuescheler @anfibiacreativa @meryllblanchet @royfielding @tripodsan @simonwex @lkrapf @kgera @dominique-pfister @bdelacretaz @asanso @kamendola @acollign @kptdobe"
    }
  }]
};

and would in turn create tables/rows that look like this

basic

path	title	description
/hackathons/5-bsl	Helix Hackathon Part V	May 20-24 2019 - Basel, Switzerland

authors

path	title	authors
/hackathons/5-bsl	Helix Hackathon Part V	@rofe @trieloff @stefan-guggisberg @koraa @davidnuescheler @anfibiacreativa @meryllblanchet @royfielding @tripodsan @simonwex @lkrapf @kgera @dominique-pfister @bdelacretaz @asanso @kamendola @acollign @kptdobe

constanceyu commented 5 years ago

@davidnuescheler cool. yes if that is the case, i agree. we can use path as primary key. currently just finished making the request recursive and moved the app into local forked helix-home doing some adjustments right now

when using the app with the root as path

tripodsan commented 5 years ago

i actually think that the url is probably not really as it would be the base path (between the host to the first . ) of the request url.

ok, but having the path present in the JSON is IMO a good idea anyways. so you don't need to guess the id from the request, and you don't need to remember it, in case you transfer the response to another action.

also, the location of the original document is IMO important as well. in case, the backend wants to do some further processing.

if you look at the response from the github API, they include a bunch of back-references to the document itself and further relations. I think this is a good practice.

tripodsan commented 5 years ago

moved the app into local forked helix-home doing some adjustments right now

@davidnuescheler should we create a dedicated repo for this?

trieloff commented 5 years ago

With:

a selector like idx.json
a GitHub GraphQL API that allows us to get files by modification date
a good cache
a custom resolver that turns the GitHub API URL into a URL containing the .idx.json ending and the SHA (to keep it cachable)

we might be able to circumvent the database entirely, which can be useful in the case of schema updates (just add a field to idx.json and you are done, no re-indexing needed)

davidnuescheler commented 5 years ago

ok, but having the path present in the JSON is IMO a good idea anyways. so you don't need to guess the id from the request, and you don't need to remember it, in case you transfer the response to another action.

ok agreed.

also, the location of the original document is IMO important as well. in case, the backend wants to do some further processing.

i think that's semantically wrong, because we are not indexing the .md anymore but really the resulting DOM or HTML output, which may at that stage contain a lot of information from other sources.

@davidnuescheler should we create a dedicated repo for this?

at some point yes, but at this point helix-home just serves as a sample content repo, and the easiest way to add the .idx stuff is by putting this into the src to make development really easy...

constanceyu commented 5 years ago

latest update - able to issue request using helix-pipeline-like idx_html.pre.js to process markdowns recursively through the github repo. so far all these have been integrated with the node js app.

next steps are to make the db queries smarter (e.g. when the table given this name does not exist yet, create the table then populate data) and add capabilities to handle more complicated table schema entries. excited to make idx_html.pre.js more powerful! :>

constanceyu commented 5 years ago

update:

dynamic tables

scanner app can create new tables and adjust columns dynamically depending on the json input received. when a new table name shows up, a bare bone default table will be created. it would always have path as its primary key and then dynamically add columns as they new entries come

question: if client uses queryselectorAll to select multiple elements from the document, how should we represent the result in postgresQL database? use an Array for multivalued field type? when to determine what is appropriate for an array type? e.g. 1 title per file seems like an convention

PostgresQL JSON type

stumbled upon postgresql json (credit: @davidnuescheler) and JSONB type. it is pretty interesting. PostgresQL supports them and native query supports operators for JSON objects. ->> returns object field in JSON so that we can continue chaining them... the thing that is slightly inconvenient is the GraphQL layer since once we commit to JSON object it is a blob and we can no longer use the clean gql query to get subfields cleanly like this.

{
  documents {
    entries {
      title
      images {
        title
      }
    }
  }
}

i will look into the PostgresQL <=> GraphQL resolvers more to see if there is a workaround. for now we can either query all content and later on parse it ourselves or only select one field we are interested in by specifying path and changing it with . .

selecting all content

querying specific field

description field in `idx_html.pre.js`

description field is now working! it uses the selection criteria .title .header p. interesting that the header gave me back the hamburger 🍔

constanceyu commented 5 years ago

creating view for querying subfields

one workaround of querying JSON objects through subfields is to create a new VIEW that has the subfields you desire. for example:

CREATE VIEW title_listing AS
SELECT
  path, 
  entries->>'title' AS title
FROM
  simple
;

then you can write GraphQL query to query subfield title now:

use variable to specify search

another new feature: using keywords where and _contains can aid user to create queries that are tailored to get all documents containing desired information. e.g. all documents with the title Getting Started with Helix Screen Shot 2019-07-29 at 11 10 23 AM

constanceyu commented 5 years ago

marching towards botland! 🤖 (with officer @tripodsan sensei)

created a standalone bot called helix-scanner-bot that listens to git push events and consults helix rendered DOM for metadata. currently working on my personal repo craeyu/helix-home and next step is db integration. still blocked on azure provisioning so no access to database - not entirely sure if the db connection will work with the bot yet Screen Shot 2019-08-05 at 5 10 37 PM Screen Shot 2019-08-05 at 5 08 48 PM

constanceyu commented 5 years ago

update:

database issue has been solved (yay! 🎉) and confirmed that bot can use credentials to access it.

status:

currently translating node.js app language into bot language. finished the functionalities of parsing and storing data in JSONB type field with path as primary key.

next steps:

tomorrow will have a PR up for adobe/helix-home that contains my idx_html.pre.js that has the extractions of content. i will also refactor the code into async/await style and add a feature along the way supporting storing data for new content type img.

constanceyu commented 5 years ago

new JSON pipeline

after discussion with @davidnuescheler @trieloff @tripodsan , we created a new pipeline named idx_json that returns an json object instead of stringifies the DOM data and add to context.content.json.

bot features transportation

currently the two data storage features have been fully transported to helix-bot on my branch constance/scanner-bot and the file is called scan.js (one creates just one simple json column and the other one generates multiple columns based on the result received from pipeline). the next item in action is to test this bot with adobe/helix-home repo and refactor it completely using await/async instead of callbacks

constanceyu commented 5 years ago

bot living in `adobe i/o runtime` and working with `adobe/helix-home`

functionality wise it is fulfilled! still going through construction of async/await (goodbye "call back hell", it was fun having you faithfully serving me... 😿)

sequencing diagram (08/16/19)

here is a sequencing diagram of my project so far (thanks for the help @tripodsan) Automated Markdown Data Parser, Storage, and GQL query

rofe commented 4 years ago

What's the status here? Moving out of "in progress" column for now.

trieloff commented 4 years ago

I think we've found a viable alternative approach and can close this.

adobe / helix-home

GraphQL Repository API #24

Problem

Approach 1

Approach 2

basic

authors

dynamic tables

PostgresQL JSON type

description field in `idx_html.pre.js`

creating view for querying subfields

use variable to specify search

marching towards botland! 🤖 (with officer @tripodsan sensei)

update:

status:

next steps:

new JSON pipeline

bot features transportation

bot living in `adobe i/o runtime` and working with `adobe/helix-home`

sequencing diagram (08/16/19)

adobe / helix-home

GraphQL Repository API #24

Problem

Approach 1

Approach 2

basic

authors

dynamic tables

PostgresQL JSON type

description field in idx_html.pre.js

creating view for querying subfields

use variable to specify search

marching towards botland! 🤖 (with officer @tripodsan sensei)

update:

status:

next steps:

new JSON pipeline

bot features transportation

bot living in adobe i/o runtime and working with adobe/helix-home

sequencing diagram (08/16/19)

description field in `idx_html.pre.js`

bot living in `adobe i/o runtime` and working with `adobe/helix-home`