bbc / simorgh

The BBC's Open Source Web Application. Contributions welcome! Used on some of our biggest websites, e.g.
https://www.bbc.com/pidgin

Simorgh


BBC World Service News websites are rendered using Simorgh, a ReactJS-based application. Simorgh also renders AMP news article pages for World Service, Public Service News and BBC Sport.

Simorgh provides a fast and accessible web experience used by millions of people around the world each month (see list of websites using Simorgh). It is regularly maintained and well documented, and we welcome open source contributors.

Simorgh is primarily maintained by the BBC News Web Engineering teams. It delivers highly trusted news to readers all over the world, currently in 41 languages. We support a wide range of devices and care deeply about scale, performance, and accessibility. We work in agile, flexible teams, and have an exciting roadmap for future development.

Documentation index

Please familiarise yourself with our:

NB there is further documentation colocated with relevant code. The above list is an index of the top-level documentation of our repo.

Simorgh Overview

A High Level User Journey

The initial page load - Server Side Render (SSR)

A request to a BBC article (https://www.bbc.co.uk/news/articles/clldg965yzjo) is passed on to the Simorgh application from a proprietary routing and caching service (called Mozart).

The request matches a route in our Express server using a regex match (articleRegexPath || frontPageRegexPath). If the URL matches the pre-defined regex pattern for an article or a front page, we fetch some params from the route using the getRouteProps function. This returns the service, isAmp, route and match properties. route is a react-router route that defines the React container in which to render the page (e.g. ArticleContainer) and a method, typically called getInitialData, that fetches the initial JSON used to render it.
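The route-matching step can be sketched as follows. This is a minimal illustration rather than Simorgh's actual code: the regular expression, the routes array, the extra id property and the body of getRouteProps are simplified assumptions. Only the returned property names service, isAmp, route and match come from the description above.

```javascript
// Assumed, simplified pattern: /:service/articles/:id with an optional .amp suffix.
const articleRegexPath = /^\/([a-z]+)\/articles\/([a-z0-9]+)(\.amp)?$/;

// Each route pairs a path pattern with a page type and a data fetcher.
const routes = [
  {
    path: articleRegexPath,
    pageType: 'article',
    // Stand-in for the real fetcher: resolves to a canned payload.
    getInitialData: async ({ service, id }) => ({
      status: 200,
      pageData: { service, id, blocks: [] },
    }),
  },
];

// Returns the properties named in the text, plus the article id (hypothetical).
function getRouteProps(urlPath) {
  for (const route of routes) {
    const match = urlPath.match(route.path);
    if (match) {
      const [, service, id, ampSuffix] = match;
      return { service, id, isAmp: Boolean(ampSuffix), route, match };
    }
  }
  // No route matched: the caller can respond with a 404.
  return { service: undefined, id: undefined, isAmp: false, route: undefined, match: null };
}
```

Calling getRouteProps('/news/articles/clldg965yzjo.amp') on this sketch yields service 'news' and isAmp true, and the caller would then invoke route.getInitialData to fetch the JSON.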

Once data is returned we pull the status code and pass all of this data as props to our main document using renderDocument.

The Document passes the URL, JSON data, BBC Origin, isAmp and the service to the main App container, and the result is rendered to a string using React's own renderToString method. This string is then passed to DocumentComponent as the main app, along with the assets array, style tags (the output from styled components) and any scripts/links that need to be added to the head. This is then rendered to static HTML markup using React's own renderToStaticMarkup and sent back to the user as static HTML. Included in this response are links to our JS bundles, which a user's device will download to bootstrap the single page application (SPA) for subsequent journeys.

Now that the raw HTML has been downloaded, the client-side JS file kicks in and hydrates the initial response with the client-side application. During this process React uses the initial JSON payload (available on the global window object as SIMORGH_DATA) to hydrate the original markup returned by ReactDOMServer. React expects the rendered content to be identical between the server and the client. This is why we send the initial JSON payload with the SSR page: the hydration phase then runs with the same data that the server render used.
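To illustrate why the JSON payload travels with the server-rendered page, here is a minimal sketch of a document builder that inlines the data next to the markup. It deliberately avoids React so that it runs on its own. The function names, the root element id and the script handling are assumptions for illustration, not Simorgh's implementation; only the SIMORGH_DATA global comes from the text above.

```javascript
// Serialise data for inlining in a <script> tag. Escaping "<" stops a
// "</script>" sequence inside the payload from closing the tag early.
function serialiseForScript(data) {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// Builds the full HTML response from already-rendered app markup.
// Script URLs are assumed to be trusted and are not escaped here.
function renderDocumentHtml({ appHtml, data, scripts }) {
  const scriptTags = scripts
    .map(src => `<script src="${src}" defer></script>`)
    .join('');
  return [
    '<!doctype html><html><head>',
    scriptTags,
    '</head><body>',
    `<div id="root">${appHtml}</div>`,
    // The client reads this global during hydration, so it renders with
    // exactly the data the server used.
    `<script>window.SIMORGH_DATA=${serialiseForScript(data)}</script>`,
    '</body></html>',
  ].join('');
}
```

On the client, the bundle would read window.SIMORGH_DATA and pass it to the same App container before hydrating, which keeps the first client render identical to the server output.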

Rendering a Page

The JSON payload for an article consists of a number of Blocks. Each block is an object which represents an element on the page, such as a Heading, an Image or a Paragraph. Each block has a block type, and each block type matches a specific container in Simorgh, e.g. blockType: image matches the Image container.

The ArticleMain container will iterate over each JSON block, match it against its corresponding React container and pass the data via props. These containers are where the logic for rendering each block type sits. It is at this point that we use the installed frontend components from the Psammead component library. For example, the Image container will import the Figure container, and Figure will import and use the psammead-image and psammead-image-placeholder components. An image on an article will generally have a caption, so the Figure container will import the caption container, which may include more frontend components from Psammead to render a caption on top of the image.

This process is repeated for each block within an article, ultimately rendering the main body of a news article using a combination of React containers for the business logic and React components for the frontend markup.
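The block-to-container matching described above can be sketched like this. Plain functions returning HTML strings stand in for React containers so the example runs without dependencies. The block shape ({ type, model }) and the container names are assumptions for illustration.

```javascript
// Hypothetical registry mapping a block type to the function that renders it.
const containersByBlockType = new Map([
  ['headline', ({ text }) => `<h1>${text}</h1>`],
  ['paragraph', ({ text }) => `<p>${text}</p>`],
  [
    'image',
    ({ src, alt, caption }) =>
      `<figure><img src="${src}" alt="${alt}">` +
      (caption ? `<figcaption>${caption}</figcaption>` : '') +
      '</figure>',
  ],
]);

// Iterates over the blocks, looks up the matching container and passes the
// block's data to it, mirroring what the text describes for ArticleMain.
function renderBlocks(blocks) {
  return blocks
    .map(({ type, model }) => {
      const container = containersByBlockType.get(type);
      // Unknown block types are skipped rather than failing the whole page.
      return container ? container(model) : '';
    })
    .join('');
}
```

Supporting a new block type in this sketch means adding one entry to the registry; the iteration code does not change.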

A Page Render Lifecycle

Each render is passed through a set of HOCs (Higher Order Components) to enhance the page. These HOCs are described below.

A selection of page types is also passed through withOptimizelyProvider, which enables usage of Optimizely on those page types.

withVariant

The variant HOC ensures that services that have variants (e.g. simp, lat) always redirect to a URL that renders the appropriate variant.

If a user navigates to a URL without providing the variant, and a variant is set in the cookie, the cookie variant page is rendered. Otherwise, the default variant page is rendered.

If a user navigates to a URL with a variant, and a variant is set in the cookie, the cookie variant page is rendered. Otherwise, the requested variant page is rendered.
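The two rules above reduce to a simple precedence order: cookie variant first, then the variant in the URL, then the service default. A minimal sketch, with hypothetical function and parameter names:

```javascript
// Decides which variant to render. All three inputs are optional strings;
// the names are illustrative and not taken from the Simorgh codebase.
function resolveVariant({ urlVariant, cookieVariant, defaultVariant }) {
  // A variant stored in the cookie wins in both cases described above.
  if (cookieVariant) return cookieVariant;
  // Otherwise use the variant in the URL, falling back to the service default.
  return urlVariant || defaultVariant;
}
```

For a service whose variants are 'lat' and 'cyr' (example values), a request with no variant in the URL and no cookie resolves to the default, while a cookie set to 'cyr' overrides a URL that asks for 'lat'.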

withContexts

The withContexts HOC is a wrapper that provides access to the different context providers available in the application. Any child component inside of these context providers has access to the context data via the useContexts hook.

withPageWrapper

The page wrapper HOC simply wraps the Article or FrontPage containers with a layout. At present we only have a single page layout. This layout includes the header, footer and context providers, rendering the main body as a child between the header and the footer.

withError

The error HOC checks the error prop passed in. If error is set to null, the Article or FrontPage container is simply returned.

If error is set to true the Error component is returned, giving the user a visual indication of the error e.g. a 500 error page.

withData

Assuming the other HOCs have returned the original Article or FrontPage container, the data HOC will run some validation checks on the JSON data passed in via the data prop. If all of the checks are satisfied, the ArticleContainer will be returned with a single pageData prop. This pageData prop houses the JSON data to be rendered, e.g. the Optimo blocks for a given article.
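The error and data HOCs can be sketched together as below. Plain functions (props in, markup out) stand in for React components so the example is self-contained. The validation rule (a payload with a blocks array) and the ErrorPage markup are assumptions; the source does not specify which checks the data HOC runs.

```javascript
// Stand-in for the Error component.
const ErrorPage = ({ status }) => `<h1>Error ${status}</h1>`;

// Returns the wrapped container untouched unless the error prop is set.
const withError = Component => props =>
  props.error ? ErrorPage({ status: props.status || 500 }) : Component(props);

// Assumed validation: the payload must be an object with a blocks array.
const isValidPageData = data =>
  Boolean(data) && typeof data === 'object' && Array.isArray(data.blocks);

// On success the wrapped container receives a single pageData prop.
const withData = Component => ({ data }) =>
  isValidPageData(data)
    ? Component({ pageData: data })
    : ErrorPage({ status: 500 });

// Usage: the data HOC is applied first, so it sits innermost.
const Article = ({ pageData }) => `<article>${pageData.blocks.length} blocks</article>`;
const ArticlePage = withError(withData(Article));
```

In this sketch, ArticlePage({ error: null, data: { blocks: [] } }) renders the article, while a set error prop or an invalid payload renders the error page instead.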

withHashChangeHandler

The withHashChangeHandler HOC is a wrapper applied to all pages that checks for changes to the URL hash value. Pages include accessibility controls that let users skip content; these use the URL hash to move users to specific areas of the page. Because of the client-side routing, changes to the URL result in a re-render, which causes unsightly UI flickering in some components, specifically media and social embeds. This HOC checks the URL to see whether a re-render is necessary and, if it is not, prevents the re-render using React.memo.
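React.memo accepts an optional comparison function and skips the re-render when that function returns true. A minimal sketch of a comparison that treats two URLs as equal when they differ only by hash is shown below. The function names and the href prop are assumptions, and a real comparison would also need to account for any other props.

```javascript
// Returns true when two absolute URLs are identical once the hash fragment
// is ignored. Uses the standard URL class available in browsers and Node.
function sameIgnoringHash(prevHref, nextHref) {
  const prev = new URL(prevHref);
  const next = new URL(nextHref);
  prev.hash = '';
  next.hash = '';
  return prev.href === next.href;
}

// Shape expected by React.memo(Component, arePropsEqual): returning true
// tells React the props are equal, so the component is not re-rendered.
const arePropsEqual = (prevProps, nextProps) =>
  sameIgnoringHash(prevProps.href, nextProps.href);
```

With this comparison, following a skip link that only appends a hash such as #content leaves embeds untouched, while navigating to a different path still triggers a render.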

withOptimizelyProvider

The withOptimizelyProvider HOC returns components that have been enhanced with access to an Optimizely client, which is used to run our A/B testing. It is applied per page type to limit bundle sizes: because we separate some of our bundles by page type, running A/B tests only on certain page types keeps the weight of the Optimizely SDK out of the other page type bundles.

withOptimizelyProvider should be added as the value of the handlerBeforeContexts object key within applyBasicPageHandlers.js. The ckns_mvt is set within the UserContext, so the withOptimizelyProvider HOC needs to be applied in the correct order alongside the withContexts HOC. This makes the ckns_mvt available on first-time visits to pass into the OptimizelyProvider, along with attributes such as service, which is used for determining when Optimizely should enable an experiment.

Example for Article page:

import withOptimizelyProvider from '#app/legacy/containers/PageHandlers/withOptimizelyProvider';
import ArticlePage from './ArticlePage';
import applyBasicPageHandlers from '../utils/applyBasicPageHandlers';

export default applyBasicPageHandlers(ArticlePage, {
  handlerBeforeContexts: withOptimizelyProvider,
});
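To show why the ordering matters, here is a sketch of how a helper like applyBasicPageHandlers could compose the HOCs. It is an illustration only: the pipe helper, the stub HOCs and the exact position of each HOC in the chain are assumptions, not the contents of applyBasicPageHandlers.js. The stubs simply record their names so the nesting is visible.

```javascript
// Left-to-right composition: the first HOC wraps the component first and
// therefore ends up innermost.
const pipe = (...fns) => input => fns.reduce((acc, fn) => fn(acc), input);

// Stub HOC factory: wraps the rendered output in the HOC's name.
const tag = name => Component => props => `${name}(${Component(props)})`;

const withData = tag('withData');
const withError = tag('withError');
const withPageWrapper = tag('withPageWrapper');
const withContexts = tag('withContexts');
const withVariant = tag('withVariant');
const withOptimizelyProvider = tag('withOptimizelyProvider');

const identity = Component => Component;

// handlerBeforeContexts is applied just before withContexts, so the handler
// is rendered inside the context providers and can read from them.
const applyBasicPageHandlers = (Component, { handlerBeforeContexts = identity } = {}) =>
  pipe(
    withData,
    withError,
    withPageWrapper,
    handlerBeforeContexts,
    withContexts,
    withVariant,
  )(Component);

const Page = () => 'Page';
const Enhanced = applyBasicPageHandlers(Page, {
  handlerBeforeContexts: withOptimizelyProvider,
});
// Enhanced() returns:
// withVariant(withContexts(withOptimizelyProvider(withPageWrapper(withError(withData(Page))))))
```

Because withOptimizelyProvider sits directly inside withContexts in this arrangement, anything the contexts provide is available to it on the first render.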

Adding a new Page type

When adding a new page type there are several parts required.

1) Fixture data should be added to /data/{{service}}/{{pageType}}/

2) Serving the fixture data on local development

3) Create a new container for the page type

4) Add new pre-processing rules if required.

5) Add a new route to the react router config

6) Add Cypress E2E tests for the new page type

NB: With this many steps, we suggest splitting the work across multiple PRs rather than raising a single huge PR. However, the Cypress tests (#6) should ideally be added in the same PR as the page routing (#5); if they are not, they should follow immediately after the page routing PR.

Before Installation

Please read: CONTRIBUTING.md

Installation

Install Node (https://nodejs.org/en/). We use the version specified in .nvmrc. If you have a Node version manager (nvm), you can run the following command to switch automatically to the project's supported version.

nvm use

Install Yarn

The Simorgh project uses Yarn for package management. It is recommended to install Yarn through the npm package manager, which comes bundled with Node.js when you install it on your system. To install Yarn, run this command:

npm install --global yarn

Then you can run the following commands to install Simorgh

git clone git@github.com:bbc/simorgh.git
cd simorgh
yarn install

Local Development

To run this application locally, with hot-reloading, run

yarn dev

The application will start on http://localhost:7080.

Article pages are served at routes of the format /news/articles/:id where id is the asset ID generated by the Content Management System.

FYI: Article explaining the BBC's use of ids in URL

These two News articles are available on the Test environment of our CMS, as well as locally, so are often used for testing:

We also serve AMP HTML (https://www.ampproject.org) pages at the route /news/articles/:id.amp

Services with variants can't be accessed using the format above; instead, the variant must be provided in the URL.

Front pages

World Service front pages are served in the format /:service where service represents a World Service site:

The World Service front pages follow the article format for AMP too, being available at /:service.amp:

Services with variants can't be accessed using the format above; instead, the variant must be provided in the URL.

Topic Pages

Topic pages use internal BBC APIs that are not publicly accessible. This can cause the following warnings to appear when developing locally:

No BFF_PATH set as environment variable, you will not have access to topics

Internal developers who need to work on topic pages locally should contact the team for access.

Recommendations

Recommendations in story pages also use internal BBC data labs APIs. For them to appear locally, the relevant key/value pair must be added to the envConfig/secret.env file.

Internal developers who need to work on article pages locally should contact the team for access.

Other page types

You can find other page types by looking through our routes and their associated regexes, but we suggest you start with the above, then have a look at the core of the application to understand and find the other routes.

Storybook (UI Development Environment/Style Guide)

We use Storybook for developing components in isolation from the Simorgh Application. You can access this at https://bbc.github.io/simorgh/

To run it locally, use yarn storybook. It will then be available at http://localhost:9001/. An introduction to and documentation for Storybook are here: https://storybook.js.org/basics/introduction/.

When viewing Video stories locally, make sure to use a BBC domain, as outlined in the changing request location section. Video will not work in the hosted version of Storybook linked above for this reason.

We also use Chromatic QA to run cross-browser testing on our stories.

Please also note that if you would like to see the components rendered with our fonts, you will need to force a repaint of the canvas. This is because our fonts all have the font-display property of optional or swap, in accordance with the respective loading strategies here: https://ws-downloads.files.bbci.co.uk/fonts/index.html. The easiest way to force a repaint is to move the divider between the preview window and the Knobs section, or to resize the browser window.

Configuring the application to run on a local network

If you want to host the application to be accessible through your local network, follow the instructions here.

Production build locally

To run this application locally with a production build, run: yarn build && yarn start.

We use yarn build locally which bundles the application pointing at localhost for data and static assets.

Using environment builds locally

This is mainly used for debugging latest using the TEST and LIVE environment bundles. Ensure that the bundles exist in the static asset location for the correct environment before starting to debug.

To run TEST bundles on localhost:

To run LIVE bundles on localhost:

Changing request location

Some features perform differently dependent on whether a user is located within the UK or internationally. You can explicitly request a specific version by accessing Simorgh via a specific localhost BBC domain:

If these URLs do not work, you may need to add a hosts file entry (/etc/hosts or C:\Windows\System32\drivers\etc\hosts):

127.0.0.1 localhost.bbc.co.uk
127.0.0.1 localhost.bbc.com

Production build on CI

On deployment, make buildCi is run in the CI environment, which creates bundles for both the test and live environments. On the two environments, the .env.test or .env.live file overwrites the .env file, which is used to run the application with the correct bundles.

Bundle analysis reports

Every run of yarn build will update the bundle analysis files in the repo. To view a breakdown of the bundle size, open the generated HTML report in a browser: ./reports/webpackBundleReport.html. This is generated via webpack-bundle-analyzer. The data is also available as JSON: ./reports/webpackBundleReport.json.
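The JSON report can also be inspected programmatically. The sketch below lists the largest bundles from an already-parsed report. It assumes the report is an array of entries with label, parsedSize and gzipSize fields measured in bytes, which is how webpack-bundle-analyzer typically structures its JSON output; check the generated file before relying on these field names. The function name is hypothetical.

```javascript
// Returns the `limit` largest entries by parsed size, with sizes in KiB.
function largestBundles(report, limit = 5) {
  return [...report] // copy so the caller's array is not reordered
    .sort((a, b) => b.parsedSize - a.parsedSize)
    .slice(0, limit)
    .map(({ label, parsedSize, gzipSize }) => ({
      label,
      parsedKb: Math.round(parsedSize / 1024),
      gzipKb: Math.round(gzipSize / 1024),
    }));
}
```

In a Node script, the input would come from reading and JSON-parsing ./reports/webpackBundleReport.json before passing it to largestBundles.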

Tests

Linting and unit tests

We have linting with the Airbnb styleguide and we use Prettier as a code formatter. They can be run with yarn test:lint.

We have Jest unit tests that can be run with yarn test:unit.

yarn test runs both sets of these.

End-to-end tests

Main application

We use Cypress for our end-to-end tests. To run the smoke tests locally, run this single command:

yarn test:e2e

It will spin up a production server on port 7080 and run the Cypress tests against that. To run the smoke tests interactively, run:

yarn test:e2e:interactive

This loads a user interface which easily allows for individual tests to be run alongside a visual stream of the browser, as the tests run.

Environment variables

There are several environment variables you can use with our test suite, which are:

| Environment variable | Effect | Possible values |
| --- | --- | --- |
| CYPRESS_ONLY_SERVICE | Restricts to running only the specified service | A single service, e.g. CYPRESS_ONLY_SERVICE=urdu |
| CYPRESS_APP_ENV | Runs the tests in a specific environment | test, local, live |
| CYPRESS_SMOKE | Runs only smoke tests if true | true, false |
| CYPRESS_UK | See running e2es in the UK against Live | true, false |
| CYPRESS_SKIP_EU | See running e2es outside EU | true, false |

These environment variables can be used in combination.
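As an illustration of how these variables combine, the sketch below turns an environment object into a typed configuration. The function name, the returned property names and the default for CYPRESS_APP_ENV are assumptions; the README only states that CYPRESS_SMOKE is true by default. This is not how the test suite itself reads its configuration.

```javascript
// Interprets "true"/"false" strings, using the fallback when unset.
const parseBool = (value, fallback) =>
  value === undefined ? fallback : value === 'true';

// Builds a configuration object from environment variables.
function readCypressEnv(env) {
  return {
    onlyService: env.CYPRESS_ONLY_SERVICE || null, // null means all services
    appEnv: env.CYPRESS_APP_ENV || 'local', // default is an assumption
    smoke: parseBool(env.CYPRESS_SMOKE, true), // smoke tests only, by default
    uk: parseBool(env.CYPRESS_UK, false),
    skipEu: parseBool(env.CYPRESS_SKIP_EU, false),
  };
}
```

For example, passing { CYPRESS_ONLY_SERVICE: 'urdu', CYPRESS_SMOKE: 'false' } describes a full-suite run restricted to the urdu service. In Node, process.env can be passed directly.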

Full suite of tests

The default way to run the e2e suite (yarn test:e2e or yarn test:e2e:interactive) runs a subset of our tests, otherwise known as smoke tests. To run the full suite:

CYPRESS_SMOKE=false yarn test:e2e

Limiting scope of runs

Tests can be restricted to only run for a single service by specifying it using the CYPRESS_ONLY_SERVICE environment variable. For example:

CYPRESS_ONLY_SERVICE=urdu yarn test:e2e

To run only a particular spec it is necessary to invoke Cypress directly. First ensure Simorgh is already running in another tab and then run (for example, to only run article tests):

npx cypress run --spec cypress/integration/pages/articles/index.js

Further details on using the Cypress CLI can be found at https://docs.cypress.io/guides/guides/command-line.html

Running e2e in the UK against LIVE

This affects developers based in the UK only (but may affect you if you're using a VPN routing through the UK)

Cypress's .visit() function is locked to visiting a single domain per test. This becomes problematic when you launch the e2e tests from within the UK, due to redirects from .com to .co.uk. By default, Cypress tests will run as if they were run outside the UK. To run these tests from the UK, you have to pass the UK Cypress environment variable to the tests. This will replace the URL endings with .co.uk, which will allow you to run these tests successfully.

Here is an example command:

CYPRESS_APP_ENV=test CYPRESS_UK=true CYPRESS_SMOKE=true yarn cypress
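The URL replacement described above amounts to swapping the top-level domain before visiting a page. A minimal sketch, with a hypothetical function name and no claim to match the helper used in the test suite:

```javascript
// Rewrites a .com URL to its .co.uk equivalent when the UK flag is set.
function toUkUrl(href, uk) {
  if (!uk) return href;
  const url = new URL(href);
  // Swap only a trailing ".com" on the hostname, leaving the path untouched.
  url.hostname = url.hostname.replace(/\.com$/, '.co.uk');
  return url.href;
}
```

With the flag set, https://www.bbc.com/pidgin becomes https://www.bbc.co.uk/pidgin, so the test starts on the domain the UK redirect would otherwise send it to.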

Running e2e outside EU

This affects developers based outside the EU (but may affect you if you're using a VPN routing through a country not in the EU)

Running Cypress tests outside the EU will not show the EU consent banners on AMP, and this may cause some tests to fail. Set CYPRESS_SKIP_EU=true to prevent these tests from running when outside the EU.

Here is an example command:

CYPRESS_SKIP_EU=true yarn cypress:interactive

The following command runs both simorgh and cypress:

CYPRESS_APP_ENV=local CYPRESS_UK=true CYPRESS_SMOKE=true yarn test:e2e

CYPRESS_APP_ENV can also be set to test or live. CYPRESS_SMOKE can be true or false. It is true by default and runs a specific subset of tests.

Lighthouse Best Practice tests

We use Lighthouse to test the performance of our pages. However, these tests have been moved out of Simorgh into our own internal CD processes. This allows us to run them against a more accurate depiction of Simorgh. You are free to run Lighthouse on your own from your Chrome browser or use the Node Lighthouse CLI.

Why is it called Simorgh?!

Simorgh is named after the Persian mythological bird. The Simorgh is the amalgam of many birds (and in some accounts other animals) into one.

Happily, a metaphor which seemed apt for offering all BBC articles in one solution is perhaps now even more appropriate as the application evolves to support more content types. It is also a clear reference to the international nature of our teams, and to the desire to ensure articles (and everything which has followed) work for users in all languages the BBC supports.

It is also a unique name which is practical and, more superficially, the bird is very pretty.