Improve the way we consume collection(s) to avoid sending all metadata in all pages

DavidWells commented 8 years ago

Related to trimming down the window.collection. #712

When you add new custom meta (like in this blog post example), all those extra fields are also added to the window.collection data. This could potentially make it HUGE.

Example:


---
layout: Post
title: 'Defining Serverless and Why It Matters to Developers'
date: 2016-09-01
description: "You’ve probably heard the term _serverless._ But what does it actually mean? And more importantly, as a developer, why should you care?"
author:
  name: Serverless
  url: http://twitter.com/goServerless
  avatar: https://avatars3.githubusercontent.com/u/13742415?v=3&s=60
tags:
- serverless

---

```json
// printed in window.collection, notice the custom meta values
{
    "layout": "Post",
    "comments": true,
    "title": "Defining Serverless and Why It Matters to Developers",
    "date": "2016-09-01T00:00:00.000Z",
    "description": "You’ve probably heard the term _serverless._ But what does it actually mean? And more importantly, as a developer, why should you care?",
    "author": {
        "name": "Serverless",
        "url": "http://twitter.com/goServerless",
        "avatar": "https://avatars3.githubusercontent.com/u/13742415?v=3&s=60"
    },
    "tags": ["serverless"],
    "__filename": "blog/defining-serverless-and-why-it-matters-to-developers.md",
    "__url": "/blog/defining-serverless-and-why-it-matters-to-developers/",
    "__resourceUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html",
    "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
}
```

All the data also exists in the index.html.bf1d5e0db467ef721b8508992749379b.json file.

So the question is, would it be possible/worth it to remove the additional custom fields from outputting into window.collection and just have the .json files for the individual pages handle that additional data?
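To make the question concrete, here is a rough sketch of what that trimming could look like. It assumes the double-underscore fields are the ones the client needs for navigation, and that list pages only need a small whitelist of fields. `trimCollectionItem` and the default whitelist are hypothetical names for illustration, not part of Phenomic.

```javascript
// Hypothetical helper (not a Phenomic API): keep the internal "__" fields
// plus a small whitelist, and drop every other custom meta field.
function trimCollectionItem(item, keep = ["title", "date"]) {
  const trimmed = {};
  for (const key of Object.keys(item)) {
    if (key.startsWith("__") || keep.includes(key)) {
      trimmed[key] = item[key];
    }
  }
  return trimmed;
}

// Example with a reduced version of the post above
const trimmedPost = trimCollectionItem({
  layout: "Post",
  title: "Defining Serverless and Why It Matters to Developers",
  date: "2016-09-01T00:00:00.000Z",
  author: { name: "Serverless" },
  __url: "/blog/defining-serverless-and-why-it-matters-to-developers/",
});
console.log(Object.keys(trimmedPost));
```

The dropped fields would then only be available from the per-page .json file.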

thangngoc89 commented 8 years ago

> would it be possible/worth it to remove the additional custom fields from outputting into window.collection

It depends on the use case. If the user doesn't need any of these fields in their codebase for generating a list of posts, it's possible to remove them. See the collections API: https://phenomic.io/docs/usage/collections/

Is it worth it? I don't know. On that huge site of yours, how many bytes do you save if you remove all of these custom fields?

MoOx commented 8 years ago

I am currently discussing with @bloodyowl how to improve the Phenomic collection API so that we stop sending almost all data into all pages. The idea is mainly to put in each page only the data it uses (minimal JSON) and to create JSON for each possible page.

For now I don't plan to add a custom way to retrieve only some fields, because if we choose the solution above, it won't be a big deal (tell me if I am wrong) to get all fields for, let's say, the 10 pages you are listing.

Again I repeat the idea we are working on: only put in the html (& json files for client nav) the data requested by a page. So no more extra unused json.

To achieve that while keeping the API simple (not by adding a graphql server - please @bloodyowl and others, take a look at this & tell me what you think; I personally think it's a bit crazy to go down this path, but I may be wrong), the idea is to provide a HoC.

Here is some pseudo code we have in mind (API subject to changes):

class YouPageThatListContent extends ...

export default Phenomic.createContainer(
  YouPageThatListContent,
  (store) => ({
     pages: store.get("pages", { sortBy: "date", order: "DESC", limit: 5})
   })
)

// alternative
export default Phenomic.createContainer(
  YouPageThatListContent,
  (state) => ({
     pages: Phenomic.queryCollection(state.pages, { sortBy: "date", order: "DESC" }).slice(0, 5)
   })
)

Now you are going to ask: what about pagination? Imo this should be in core, not in a plugin...

Ok then here is an idea:

<Route
  path="/tag/:tag(/:page??)" component={ YouPageComponent }
  collection="posts" pageSize={20} sortBy="date" order="DESC"
  filter={ (item, routeParams) => item.tags.indexOf(routeParams.tag) > -1 }
/>

// ...

// YouPageComponent
export default Phenomic.createContainer(
  YouPageThatListContent,
  (state, page) => ({
     taggedPosts: page.items,
     // you can send any kind of data, injected as props into YouPageThatListContent
     totalPages: page.numberOfItems,

     // while pagination is allowed for one resource at a time, you might get other data as well
     authors: Phenomic.queryCollection(state.authors, { sortBy: "commits", order: "DESC" }).slice(0, 5),
     someRandomPosts: Phenomic.queryCollection(state.posts).randomMethodToImplement(5)
   })
)

Note that the code above assumes that we will introduce a new way to register collectionS (yes, multiple collections, instead of having to filter via layout or something else).

By doing that, we will be able to statically retrieve fractions of collections and inject only those into the html (and also create json fragments for client navigation).

Any thoughts on this approach?

thangngoc89 commented 8 years ago

Totally agree with this approach. Filtering at runtime in JS can be slow, especially if you have a lot of pages.

bloodyowl commented 8 years ago

Added a few changes to my proposal:

Content definition

import Phenomic from "phenomic"

// I think that we should accept:
// type Data = { [key: string]: any | Promise<any> }
// Data | Promise<Data>
module.exports = {
  // See: https://gist.github.com/bloodyowl/27e159aa9e02c5ac40fd6ff5c2bb93e8
  posts: Phenomic.createCollection(
    requireAll(require.context("markdown!./posts", true, /\.md/)),
    { indexes: ["id", "url"] } // will create JS Maps to improve query time on build & dev server
  ),
  authors: requireAll(require.context("json!./authors", true, /\.json/)),
  // accept promises
  someExternalData: require("isomorphic-fetch")(someURL),
}

Consuming the data

Indexed queries

import React from "react"
import Phenomic from "phenomic"

const PostRoute = (props) => (
  <div>
    <h1>{props.post.title}</h1>
    <p>{props.post.content}</p>
  </div>
)

export default Phenomic.createContainer(PostRoute, {
  queries: (state, params) => ({
    // O(1) if indexed, O(N) otherwise 
    post: state.posts.getBy("id", params.id),
  })
})
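To illustrate the "O(1) if indexed, O(N) otherwise" comment, here is a minimal sketch of how such a collection could work. `createIndexedCollection` is a hypothetical stand-in, not the proposed `Phenomic.createCollection` itself, and it assumes indexed fields hold unique values.

```javascript
// Hypothetical sketch: one Map per indexed field (field value -> item).
function createIndexedCollection(items, { indexes = [] } = {}) {
  const maps = new Map(
    indexes.map((field) => [
      field,
      new Map(items.map((item) => [item[field], item])),
    ])
  );
  return {
    getBy(field, value) {
      const index = maps.get(field);
      // Map lookup when the field is indexed, linear scan otherwise
      return index
        ? index.get(value)
        : items.find((item) => item[field] === value);
    },
  };
}

const posts = createIndexedCollection(
  [
    { id: "a", url: "/a/", title: "First" },
    { id: "b", url: "/b/", title: "Second" },
  ],
  { indexes: ["id", "url"] }
);
console.log(posts.getBy("id", "b").title); // indexed lookup
console.log(posts.getBy("title", "First").url); // falls back to a scan
```

The Maps are built once, so the cost is paid at build time (or dev server start) rather than on each query.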

"Special" queries

import React from "react"
import Phenomic from "phenomic"

const HomepageRoute = (props) => (
  <div>
    <ul>
      {props.posts.map((post) =>
        <li>{post.title}</li>
      )}
    </ul>
    <ul>
      {props.authors.map((author) =>
        <li>{author.username}</li>
      )}
    </ul>
  </div>
)

export default Phenomic.createContainer(HomepageRoute, {
  queries: (state, params) => ({
    posts: state.posts.queryCollection({ sortBy: "date", order: "DESC" }).slice(0, 5),
    authors: state.authors.slice(0, 5),
  })
})
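As a sketch of what the sort query could do under the hood, assuming items are plain objects and the `sortBy` field is directly comparable (ISO date strings, numbers). This standalone `queryCollection` is only an illustration of the option names used in the thread, not an existing Phenomic function.

```javascript
// Hypothetical sketch of queryCollection(items, { sortBy, order }).
function queryCollection(items, { sortBy, order = "ASC" } = {}) {
  const direction = order === "DESC" ? -1 : 1;
  // Copy first so the source collection is not mutated by sort()
  return [...items].sort((a, b) => {
    if (a[sortBy] < b[sortBy]) return -1 * direction;
    if (a[sortBy] > b[sortBy]) return 1 * direction;
    return 0;
  });
}

const allPosts = [
  { title: "Old", date: "2016-01-10" },
  { title: "Newest", date: "2016-09-01" },
  { title: "Middle", date: "2016-05-20" },
];
const latestTwo = queryCollection(allPosts, { sortBy: "date", order: "DESC" }).slice(0, 2);
console.log(latestTwo.map((post) => post.title));
```

Because this runs at build time, only the resulting slice would need to be serialized for the page.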

Pagination

import React from "react"
import { Router, Route } from "react-router"

import PostRoute from "./PostRoute"
import HomepageRoute from "./HomepageRoute"
// assumed to live next to the other route components
import PostList from "./PostList"

export default (
  <Router>
    <Route path="/" component={HomepageRoute} />
    <Route path="/post/:id" component={PostRoute} collection="posts"/>
    <Route path="/posts/page/:page" component={PostList} collection="posts" pageSize={20} />
  </Router>
)

This configuration leaves us enough information to just generate Math.ceil(collection.length / props.pageSize) pages at build time.

Phenomic.createContainer(Component, {
  queries: (state, routeParams, page) => ({
    hasNextPage: page.hasNextPage,
    posts: page.items,
  }),
})
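The page count and the `page` object could be derived roughly like this. A sketch only: `paginate` and the `number` field are hypothetical, while `items` and `hasNextPage` follow the names used above.

```javascript
// Hypothetical sketch: split a collection into pages at build time.
function paginate(items, pageSize) {
  const totalPages = Math.ceil(items.length / pageSize);
  const pages = [];
  for (let index = 0; index < totalPages; index++) {
    pages.push({
      number: index + 1,
      items: items.slice(index * pageSize, (index + 1) * pageSize),
      hasNextPage: index + 1 < totalPages,
    });
  }
  return pages;
}

// 45 posts with pageSize 20 give Math.ceil(45 / 20) = 3 pages
const examplePosts = Array.from({ length: 45 }, (_, i) => ({ id: i }));
const pages = paginate(examplePosts, 20);
console.log(pages.length, pages[2].items.length, pages[2].hasNextPage);
```

Each entry would correspond to one generated HTML file plus one JSON fragment for client navigation.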

Configuration

You basically provide your content and an instance of ReactRouter, we do the rest.

module.exports = {
  content: require("./content"),
  router: require("./web/routes/Router"),
}

Build configuration

I think that in order to prevent colliding stuff and forcing us to provide stuff for all configurations, the webpack.config.js should remain in user-land.

If you don't use the development server, webpack shouldn't even be mandatory (e.g. you get the data from an external API).

var webpack = require("webpack")
var path = require("path")

module.exports = {
  // maybe autofill entry & output, not quite sure about this yet 
  entry: {
    bundle: "phenomic/lib/entry",
  },
  output: {
    path: path.join(__dirname, "./.phenomic"),
    filename: "[name].js",
  },
  module: {
    loaders: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        loader: "babel",
        query: {
          presets: ["es2015", "react"],
        },
      },
    ],
  },
  plugins: [
    new webpack.DefinePlugin({
      "process.env": {
        NODE_ENV: JSON.stringify(process.env.NODE_ENV),
      },
    })
  ],
}

thangngoc89 commented 8 years ago

What can I do to help with this? Do you have any POC?

bloodyowl commented 8 years ago

I'm working on a POC, just need to add the PhenomicCollection and the pagination mechanism and I'll share it, that should be a good starting point 😊

thangngoc89 commented 8 years ago

Nice !

DavidWells commented 8 years ago

I wanted to follow up on this thread.

What do you guys think about normalizing the collection by perhaps URL?

This way we can have a constant lookup time

{
    "url/xyz/lolz": {
        "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
    },
    "url/two": {
        "__dataUrl": "/blog/defining-serverless-and-why-it-matters-to-developers/index.html.bf1d5e0db467ef721b8508992749379b.json"
    }
}

You could still map over the data with Object.keys if you want =)

Maybe even going a step further with https://github.com/paularmstrong/normalizr

I can probably just do this in user land but I wanted to float the idea around here as well.
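For reference, the URL-keyed shape suggested above can be built in user land with a few lines. This assumes the collection is an array of items carrying the `__url` and `__dataUrl` fields shown earlier; `normalizeByUrl` is a hypothetical name.

```javascript
// Hypothetical user-land helper: key the collection by URL for constant-time lookup.
function normalizeByUrl(collection) {
  const byUrl = {};
  for (const item of collection) {
    byUrl[item.__url] = { __dataUrl: item.__dataUrl };
  }
  return byUrl;
}

const normalized = normalizeByUrl([
  { title: "One", __url: "/url/one/", __dataUrl: "/url/one/index.html.abc.json" },
  { title: "Two", __url: "/url/two/", __dataUrl: "/url/two/index.html.def.json" },
]);

// Direct lookup by URL
console.log(normalized["/url/two/"].__dataUrl);
// Still iterable through Object.keys
console.log(Object.keys(normalized));
```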

bloodyowl commented 8 years ago

With the idea we have, there's not really a need for this. But if you want to create a "static API" from your contents, that should be totally possible in user-space 😃

thangngoc89 commented 8 years ago

@bloodyowl any progress on this?

MoOx commented 8 years ago

I am thinking about the fact that we currently use the entire collection to decide whether a click should be handled with a history push + preventDefault :/

DavidWells commented 8 years ago

@MoOx I was thinking about this too.

Perhaps it could be solved with a 'smarter' link component.

Each link component would get additional data attributes added to it at build time. Then the link listener wouldn't need to check the collection; it could just use the inline data attributes on the a tag.

example:

<!-- on click use router to go to /url/xyz -->
<a href='/url/xyz' data-phenomic-path='/url/xyz' data-phenomic-data='/path/index.html.bf1d5e0db467ef721b8508992749379b.json'>Local link</a>

This might even let us remove the need for the entire collection to be placed on the window?
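A sketch of the decision the link listener would then make, using only the attributes from the example above. `resolveLinkClick` and the returned shape are hypothetical; the real listener would also call preventDefault and push to the router for the "router" case.

```javascript
// Hypothetical: decide how to handle a click from the anchor's data attributes alone.
function resolveLinkClick(anchor) {
  const path = anchor.getAttribute("data-phenomic-path");
  const dataUrl = anchor.getAttribute("data-phenomic-data");
  if (!path || !dataUrl) {
    // Not a known local page: let the browser navigate normally
    return { type: "browser" };
  }
  // Known local page: client-side navigation, fetching the page's JSON
  return { type: "router", path, dataUrl };
}

// Minimal stand-in for a DOM element, so the sketch also runs outside a browser
const fakeAnchor = (attrs) => ({
  getAttribute: (name) => (name in attrs ? attrs[name] : null),
});

console.log(
  resolveLinkClick(
    fakeAnchor({
      href: "/url/xyz",
      "data-phenomic-path": "/url/xyz",
      "data-phenomic-data": "/path/index.html.bf1d5e0db467ef721b8508992749379b.json",
    })
  )
);
console.log(resolveLinkClick(fakeAnchor({ href: "https://example.com" })));
```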

lapidus commented 8 years ago

Newcomer to Phenomic ... Apologies for interjecting but I hope this can help others too.

Should one expect major changes to how collections are handled within the next month?
If not, it would be super helpful to have another practical example in the docs on how to use the current context-based system. For example, what are the steps required to list a bunch of 'recipes' on a recipes page ... :)

MoOx commented 8 years ago

@lapidus you can expect a major change in the coming weeks! ;)

bloodyowl commented 7 years ago

Going to be fixed with #925

DavidWells commented 7 years ago

@bloodyowl awesome!

How is it being approached? I didn't see it mentioned in https://github.com/MoOx/phenomic/issues/925

bloodyowl commented 7 years ago

in 1.0.0, parsers output partial (to be used in lists) & data (used when fetching the item itself) + the window.collection disappears completely

DavidWells commented 7 years ago

@bloodyowl Cool.

A couple questions:

1. Is all the site data still output into the DOM? Or referenced via a script tag?
2. Are .json files still nested in their respective /url/path/blah/index.json folders, or centralized in a single location?

My main concerns are making phenomic a viable option for larger website implementations =). There are certain things in the current setup that make it not an option for sites with 1000+ pages.

MoOx commented 7 years ago

1. No, the HTML will only contain the data relevant to its own page.
2. That's not really a problem. For now we are creating a sort of "static" API, so files are not just hashes, but I guess we could improve that in the future.

The goal of 1.0 is really to make something scalable by default. We are having the same concerns as you do :)

@bloodyowl tell me if it's incorrect

bloodyowl commented 7 years ago

yeah, basically the JSON files are put in dist/phenomic and organised with the same shape they have in the dev server API
