QuantumBox / blackgamedevs

A list of Black game developers.
http://www.blackgamedevs.com/
MIT License

Site V2: Gatsby, MDX, and a new way to make content #215

Closed MaximumCrash closed 4 years ago

MaximumCrash commented 4 years ago

CC @cattsmall @annarankin @edibletoaster

Hello friends, this PR is meant to serve as documentation of my execution of the site's new architecture.

Preface

This PR fixes Issues: #122, #129, #128, #130, #159, #180, and #190

@cattsmall made me aware of #122 via Twitter, and I took some time to audit the site for where speed issues may be creeping in. My initial conclusions were addressed by Catt (and in #126). However, there was still a growing issue: the more people added to the site, the slower loading its data would become. This is a good problem to have, but I believed we could take it to the next level. I tend to try to tackle two problems with one solution, so I had a few architectural pillars:

  1. Make it easier for folks to add themselves. (No more merge conflicts, everyone gets their own file, and writing in Markdown is the same as writing a GitHub issue or PR. If they can write a comment here, they can create their own file with ease.)
  2. Make the data format scalable and move away from one giant json. (MDX is great for organizing our data, but also means that if we wanted folks to have their own pages in the future, that's totally possible without having to change a person's entry)
  3. Leverage a static site generator that is built with content delivery in mind to speed up the visitor experience.
  4. Improve the usability of the site by introducing improvements to search.

The Stack

- Gatsby.js as a React framework. (Solves Pillars 1 and 3)
- MDX as a Markdown rendering engine that allows us to interweave React components. (Solves Pillars 1, 2, and 3)
- Lunr.js as a static-site search indexer. (Solves Pillar 4)
- Framer Motion for the sauce and feel of the site.

How does the directory work?

There is now a folder in the project called directory. This folder contains individual entries in the form of MDX files.

Previously there were 2 files: companies.json and people.json. I decided to instead have a one-file-per-entry rule, where each file is treated as one entry but can be filtered/designated by extending the frontmatter YAML of the MDX file. For example, isCompany is my basic approach to this, but if the types of entries need to grow and shrink, it's entirely possible to do so by following this pattern.
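To make that concrete, here is a hedged sketch of what an individual entry file could look like. The exact frontmatter fields and fragment components are illustrative assumptions based on the descriptions in this PR, not a definitive spec:

```mdx
---
name: Jane Example
isCompany: false
---

<Location>

Atlanta, GA, USA

</Location>

<Skills>

Game Design
Programming

</Skills>
```

Each line inside a fragment like Skills is treated as its own tag, which is what the filter generation described below keys off of.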

The reason for this is to solve the merge conflicts that come from using one large object to add/edit/remove people from the directory. It also makes it easier for people to find themselves and edit their own data: they can essentially go into the directory folder, find their name, and edit the file here on GitHub (since the file is just Markdown!). The benefits of this approach should make it easier for non-technical folks to add themselves.

NOTE: I've also added new .md issue templates to give folks an easy way to add themselves. They just have to fill out everything between the fragments, and you should be able to copy and paste the data into a directory file (or just create a new file here on GitHub).

How do we get our old data into the MDX files?

I wrote a one-time transformation script that leverages json2md to transform all the objects and their keys from the companies/people JSONs into individual MDX files. You can run it by executing yarn transform. This script is meant to be used only once! It will generate MDX files with a _v1 at the end so you know it's a file with data from the previous version of the site.
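For reference, here is a minimal sketch of the transformation idea, assuming a simplified entry shape. The field names and output format are illustrative; see mdxConverter.js for the actual logic:

```js
// Illustrative one-off transform, not the real mdxConverter.js.
const fs = require("fs");
const json2md = require("json2md");

// Hypothetical shape of an old people.json entry.
const person = {
  name: "Jane Example",
  location: "Atlanta, GA, USA",
  skills: ["Game Design", "Programming"],
};

const frontmatter = `---\nname: ${person.name}\nisCompany: false\n---`;

// json2md turns plain objects into Markdown strings; the fragment
// components wrap each section so the filter generation can find them.
const body = [
  "<Location>",
  json2md([{ p: person.location }]),
  "</Location>",
  "<Skills>",
  json2md([{ p: person.skills.join("\n") }]),
  "</Skills>",
].join("\n\n");

// The _v1 suffix marks files generated from the previous site's data.
const slug = person.name.replace(/\s+/g, "-");
fs.writeFileSync(`directory/${slug}_v1.mdx`, `${frontmatter}\n\n${body}\n`);
```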

NOTE: The transformer does its best to get everyone's information in order. For folks who used default values in certain places, we skip over that bit of data. You can see how the data gets transformed in the mdxConverter.js file.

Another note: previously, images were being served via URL. This is still possible with the new system, but for the sake of consistency (and Lighthouse scores) I added functionality for the transformer to download entry images from the URLs folks provided. In the case a URL leads to a 404, their image will be skipped and not included in the static/directory_images folder.

How do filters get populated?

Previously a lot of this heavy lifting was done in a script on client load. This was one of the obstacles to scalability and why the site started lagging behind (#122). This time around, filters are automatically generated based on data that exists in the tags we want to filter. We leverage the different React component fragments (like Games, Location, and Skills) to act as anchors for when we read the raw file to pull out the specifically typed tags. This makes it easier for folks to write in whatever skills they want without being locked into the previous art, game design, etc. To alleviate duplicates, I have a few algorithms that strip the text down to camelCase and check that no duplicates share the same key.

i.e. game design, Game Design, game DESIGN, GAME DESIGN, and gameDesign all share the same camelCase key gameDesign.

However, gamedesign, gam3d3sign, and any other off variations will be treated as individual filters. You can find my algorithms in the utils.js file. You can also find my execution of conglomerating filter data in SiteContext.js, lines 45 through 81.
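A hedged sketch of the keying idea described above; the function names here are illustrative, not the actual utils.js exports:

```js
// Collapse a raw tag to a canonical camelCase key. Splitting existing
// camelCase first is what makes "gameDesign" and "game design" collide,
// while "gamedesign" (no word boundary) stays a separate key.
const toCamelKey = (tag) =>
  tag
    .replace(/([a-z])([A-Z])/g, "$1 $2") // "gameDesign" -> "game Design"
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .map((word, i) => (i === 0 ? word : word[0].toUpperCase() + word.slice(1)))
    .join("");

// Deduplicate raw tags by their camelCase key, keeping the first spelling.
const dedupeFilters = (tags) => {
  const seen = new Map();
  for (const tag of tags) {
    const key = toCamelKey(tag);
    if (!seen.has(key)) seen.set(key, tag);
  }
  return [...seen.values()];
};

// toCamelKey("game DESIGN") === toCamelKey("Game Design") === "gameDesign"
// toCamelKey("gamedesign") === "gamedesign" (treated as its own filter)
```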

NOTE: The filterFragments array MUST match the React components of the same names in the shortcodes.js file, or pulling the data won't happen for the fragments defined.

Another note: the filterFragment method is designed to treat individual newline tags as individual filters. It won't solve #107, where we just want to see if an entry has games. But you can add that kind of granularity to the SiteContext or the index.js page, where the data is properly filtered.

How does search work?

We leverage Lunr.js and build our search index by leveraging Gatsby's built-in GraphQL (Sift) querying. If you look at the gatsby-node.js file, you can find where we're fetching the data and building the lunrIndex for use on the front end. I referenced this article for executing it this way.
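As a rough illustration, building a Lunr index at build time in gatsby-node.js could look something like the sketch below. The query shape and field names are assumptions for illustration, not the PR's exact implementation:

```js
// gatsby-node.js: hedged sketch of building the index at build time.
const lunr = require("lunr");

exports.createPages = async ({ graphql }) => {
  const { data } = await graphql(`
    {
      allMdx {
        nodes {
          id
          frontmatter {
            name
          }
          rawBody
        }
      }
    }
  `);

  // Build the index once here so the client only has to load it,
  // instead of indexing 200+ entries on every page load.
  const lunrIndex = lunr(function () {
    this.ref("id");
    this.field("name");
    this.field("body");
    data.allMdx.nodes.forEach((node) =>
      this.add({
        id: node.id,
        name: node.frontmatter.name,
        body: node.rawBody,
      })
    );
  });

  // ...then persist JSON.stringify(lunrIndex) somewhere the front
  // end can reach it (e.g. pageContext or a static JSON file).
};
```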

NOTE: While gatsby-plugin-lunr exists, it doesn't give us the frictionless flexibility we need for fuzzy search and improved tokenization so we can improve the search experience. It also includes support for index localization, which is out of scope for this project's needs.

In the project itself you'll find a module called search. Inside the SearchInput.js file is where Lunr is being leveraged to run our search. I've documented in the code exactly what each line is doing, and because Lunr's documentation kind of sucks, I recommend referencing these for API usage:
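For a flavor of the front-end side, here is a minimal hedged sketch of loading and querying a serialized Lunr index. The variable names are illustrative, and the query syntax is standard Lunr rather than necessarily the exact clauses SearchInput.js uses:

```js
import lunr from "lunr";

// Rehydrate the index that was serialized at build time.
const index = lunr.Index.load(JSON.parse(serializedIndex));

// Combining a wildcard clause with an edit-distance clause approximates
// fuzzy search: "desi*" matches prefixes, "design~1" tolerates one typo.
const search = (query) => index.search(`${query}* ${query}~1`);
```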

How did you speed up the site?

All this magic is more so to provide a launchpad for future development and lower the bar for adding/editing/deleting data entries. However, none of this actually speeds up the site. Instead, I made some design modifications to improve the site experience (search, scroll to top, etc.), but the specific fix was adding in opt-in pagination.

Loading 200+ Black game dev entries into the DOM at once is expensive for initial load time. It also means that if there are more people than there are companies, the companies may never be seen. I attempted to introduce infinite scroll, but react-window for virtualization is a larger headache to include than I would have liked. The blocker was that any entry can have any number of nodes, meaning we wouldn't have a defined grid layout, but something more like masonry. For the sake of keeping the site's design and experience consistent, I believed that opt-in pagination would work for the case where someone wants to keep scrolling. By introducing search, we lower the friction for users utilizing the site as a tool to find potential hiring candidates, while also proving that there are a lot of wonderfully talented and hardworking Black game developers around the world.

Besides me wanting to "push it to the Maximum," site speed is improved directly because instead of the initial load pushing 200+ DOM elements, we only push 32. The user then opts into a smidge (milliseconds) of loading the next couple of DOM elements by choice.
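In React terms, the opt-in pagination idea boils down to something like the sketch below; PAGE_SIZE, the state names, and the assumed Entry card component are illustrative, not the PR's exact code:

```jsx
import React, { useState } from "react";

const PAGE_SIZE = 32; // initial batch, matching the 32 entries noted above

// Assumes an Entry card component exists for rendering one directory entry.
const Directory = ({ entries }) => {
  const [visibleCount, setVisibleCount] = useState(PAGE_SIZE);

  return (
    <>
      {entries.slice(0, visibleCount).map((entry) => (
        <Entry key={entry.id} {...entry} />
      ))}
      {visibleCount < entries.length && (
        <button onClick={() => setVisibleCount((count) => count + PAGE_SIZE)}>
          Load more
        </button>
      )}
    </>
  );
};
```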

What's next?

It is my hope that this new site architecture opens up some new paths for what this site can offer. I think we should also be able to move the filter functionality into the gatsby-node script for each entry, so that filters are built even more so at build time.

Sky's the limit. Let's push it to the Maximum 😎 👉 👉

With love, Réjon (@Maximum_Crash)

cattsmall commented 4 years ago

@MaximumCrash Thank you so much for this! @zelgadis' comments are great—I just checked the preview site and it's looking a lot better. I feel good about this!

MaximumCrash commented 4 years ago

@cattsmall @annarankin @edibletoaster I've updated the initial comment for the PR with my documentation of the site. I hope it meets your expectations. I've also updated this PR with the latest people/companies data, but I imagine that with the PRs that exist for adding folks you may be continuing to update the old json.

With that in mind you can update the people.json and companies.json in the DEPRECATED directory in this project once it's merged in somewhere. I wasn't sure which branch to point this at, but I'll leave that up to your discretion.

cattsmall commented 4 years ago

Thanks again for all your hard work @MaximumCrash! Since this can't be rebased due to conflicts, I am going to work with @annarankin to figure out how to get in all these changes without imploding anything.

annarankin commented 4 years ago

Hi @MaximumCrash! @cattsmall and I took a spin through the DEPRECATED/people.json file and added in the folks from the PRs referenced above. We re-ran the transform script and saw a bunch of new faces in the results, but noticed that some files weren't showing up in the search (ex: Lual Mayen, Junub Games). I think this is perhaps because we ran into broken image links for them - any thoughts on how we could fix up their entries so they get pulled in?

Thanks again for your excellent work! 💯

cattsmall commented 4 years ago

YOLOOOOO

MaximumCrash commented 4 years ago

@annarankin Hmm, my hope is that the transformer should still be creating their entries even if their images 404. However, there is the case where the directory was already populated and the transformer was run, which led to overlapping or duplicate files. An easy fix would be to delete all the files in the directory folder and then re-run the transformer.

If folks aren't being included in the search index, an edge case may have been missed in the design of the algorithm. I can take a look.

@cattsmall Just checked out the site, looks great!

annarankin commented 4 years ago

Sweeet - also after looking at the live site and not seeing the issue, I think the problem was between my chair and keyboard 🤦