Site storage - GitHub Issues

bgrgicak commented 1 month ago

This task is a part of the Web app redesign project.

We need to create a way to store metadata of sites available to a user.

- [ ] Site list: each site in OPFS has a metadata.json file
  - [ ] Site title
  - [ ] Site slug
  - [ ] Favicon (ideally a data URL)
  - [ ] Blueprint
  - [ ] Date created
  - [ ] Date last active (used to determine the current site)
  - [ ] Logs
  - [ ] Storage (browser | none | device)
- [ ] Add site with default settings
- [ ] Update site
- [ ] Delete site
- [ ] Reset site
- [ ] Get site
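
As a rough illustration of the checklist above, the metadata.json contents could be typed along these lines. This is only a sketch: every field name, the ISO timestamp format, the empty default Blueprint, and the helper functions are assumptions for illustration, not a format agreed on in this thread.

```typescript
// Hypothetical shape of a per-site metadata.json file.
// All names here are assumptions derived from the checklist.
type SiteStorage = "browser" | "none" | "device";

interface SiteMetadata {
    title: string;
    slug: string;
    favicon?: string;        // ideally a data URL
    blueprint: unknown;      // Blueprint schema is out of scope here
    dateCreated: string;     // ISO 8601 timestamp
    dateLastActive: string;  // ISO 8601 timestamp
    logs: string[];
    storage: SiteStorage;
}

// "Add site with default settings": build metadata from a title alone.
function createDefaultSiteMetadata(title: string, now: Date = new Date()): SiteMetadata {
    const timestamp = now.toISOString();
    return {
        title,
        // Lowercase, collapse non-alphanumeric runs into "-", trim stray dashes.
        slug: title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, ""),
        blueprint: {},
        dateCreated: timestamp,
        dateLastActive: timestamp,
        logs: [],
        storage: "none",
    };
}

// "Date last active" decides the current site: pick the most recent one.
function getCurrentSite(sites: SiteMetadata[]): SiteMetadata | undefined {
    return [...sites].sort((a, b) => b.dateLastActive.localeCompare(a.dateLastActive))[0];
}
```

ISO 8601 strings in UTC sort lexicographically in chronological order, which is why a plain string comparison is enough here.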

adamziel commented 1 month ago

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

How many JSON schemas will we need? There's site metadata, runtime setup options, Blueprints. Is it fine to have three files? Would it be useful to fold it into one?
Should the metadata file be exported with the site and easily reusable across different Playground apps?
Is there something we can reuse from the JSON format used by Studio?
Should it contain the PHP and WordPress version? That would duplicate what's inside the Blueprint. Or should it not contain it? But then, we'd have to update the stored copy of the Blueprint used to create a site whenever the user changes the PHP or WordPress version.

cc @brandonpayton

brandonpayton commented 1 month ago

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

This sounds like a great idea. I haven't had time to look at this today but plan to begin tomorrow morning.

brandonpayton commented 1 month ago

First task:

Establish a simple interface for listing OPFS sites for the site manager being added in #1661.

brandonpayton commented 1 month ago

I considered making a simple PR starting with @bgrgicak's OPFS-reading function from the site manager view PR. But we will outgrow that interface almost immediately, and it wouldn't be better than the local function that is there now.

I started thinking about the interface we'd actually want for loading sites from a variety of sources. Here are my thoughts.

Today, we save and load sites from:

OPFS
Local FileSystem

But we would also like to be able to load sites from other sources like:

Git repo containing multiple sites on a single branch
Git repo with a site version per branch
List of External ZIP file URLs
Third-party HTTP API

Each source would define its own Load/Save operations, and since both Load and Save operations can be slow, the interface would need to support progress updates that can be reflected to users.

For Load, each source query would yield an event stream. We could read each source's stream individually or optionally compose them into a single loading event stream. (Perhaps we could use the Streams API. I don't know enough yet to know whether it can be used to yield streams of objects, but it looks like it might.)

If using the Streams API, each load event might look something like:

interface SiteListingEvent {
    value: {
        source: string; // slug identifying the site source
        // Site schema TBD
        sites: Site[];
        // This is the projected total site count
        // and could be used as the basis for progress UI.
        // This number could change as a source is being read
        // and more sites are being discovered.
        expectedTotal: number;
    }
    done: boolean;
}

Ideally, the loading streams would be cancelable.
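
On the open question above: a default ReadableStream from the Streams API accepts arbitrary JavaScript values as chunks, so it can carry objects, and a reader can cancel it. The sketch below assumes a placeholder Site type and a hypothetical createListingStream helper fed from in-memory batches; a real source would read OPFS or a remote API instead.

```typescript
interface Site { slug: string }  // placeholder; the real schema is TBD

interface SiteListing {
    source: string;
    sites: Site[];
    expectedTotal: number;
}

// Hypothetical source that emits one listing per batch of discovered sites.
function createListingStream(source: string, batches: Site[][]): ReadableStream<SiteListing> {
    const expectedTotal = batches.reduce((sum, batch) => sum + batch.length, 0);
    let index = 0;
    return new ReadableStream<SiteListing>({
        pull(controller) {
            if (index >= batches.length) {
                controller.close();
                return;
            }
            controller.enqueue({ source, sites: batches[index++], expectedTotal });
        },
    });
}

// Read listings until the stream ends or `limit` sites were seen, then cancel.
async function readSites(stream: ReadableStream<SiteListing>, limit = Infinity): Promise<Site[]> {
    const reader = stream.getReader();
    const sites: Site[] = [];
    while (sites.length < limit) {
        const result = await reader.read();
        if (result.done) break;
        sites.push(...result.value.sites);
    }
    await reader.cancel(); // resolves immediately if the stream already closed
    return sites;
}
```

Note that reader.read() already resolves to a { value, done } pair, so a stream of plain listing objects may be enough without a separate event wrapper.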

Before loading, Playground sources would be configured via APIs like query string, Blueprints, etc.

Some possible lenses for the site sourcing and listing:

Combine Site Sources into one group
Read multiple site sources and group by source
Single source, a source equivalent to the "seamless" UI option

These are my thoughts, and I hope to start prototyping tomorrow.

bgrgicak commented 1 month ago

@brandonpayton This sounds like a good solution to the load/save problem we discussed a few times and I like that it would enable Playground to use more sources.

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

How do temporary sites fit into this interface? Today we need to show the current temporary site in the sidebar. For example, if you open playground.wordpress.net it should show up as a temporary site. Long term, we would like to allow people to save temporary sites as templates. For example, I test a plugin frequently and then have a template that spins up the setup in a temporary site. Would templates even be a good fit for the site storage?

brandonpayton commented 1 month ago

Let's take this opportunity to design a solid metadata format. Some questions to ponder:

How many JSON schemas will we need? There's site metadata, runtime setup options, Blueprints. Is it fine to have three files? Would it be useful to fold it into one?

Should it contain the PHP and WordPress version? That would duplicate what's inside the Blueprint. Or should it not contain it? But then, we'd have to update the stored copy of the Blueprint used to create a site whenever the user changes the PHP or WordPress version.

My initial thought is that site state is site state and is separate from the idea of Blueprints. Current Playground site state can be initialized with a Blueprint and subsequently changed through manual interactions with the user. At that point, we can have a site state that has diverged far from its initial state.

It might be useful metadata to remember the Blueprint and other ingredients that went into creating a site, but unless we can represent every subsequent state change in terms of a Blueprint, expressing and sharing overall site state seems more like export and site-transfer-protocol territory.

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple. Site metadata could include the initial Blueprint and WP settings but do so as historical information rather than a representation of current site state and config.

Should the metadata file be exported with the site and easily reusable across different Playground apps?

This seems like a good idea. Providing Playground site information with an export opens them to unforeseen possibilities. Is there a downside?

Is there something we can reuse from the JSON format used by Studio?

For reference, here is a sample of the per-site format used by the Studio app (the adminPassword is fine to share as this is a throwaway local site):

    {
      "id": "89d73cb6-154f-41bb-9590-9a270edceedd",
      "name": "My Noble Website 2",
      "path": "/Users/brandon/Studio/my-noble-website-2",
      "adminPassword": "STclSmd5bm5zc0lIZHRqUyFuZGslRUM4",
      "port": 8882,
      "phpVersion": "8.1",
      "themeDetails": {
        "name": "Twenty Twenty-Four",
        "path": "/var/www/html/wp-content/themes/twentytwentyfour",
        "slug": "twentytwentyfour",
        "isBlockTheme": true,
        "supportsWidgets": false,
        "supportsMenus": false
      }
    }

So far, I don't see much here that seems good or necessary to include in our site metadata. Maybe:

    {
      "id": "89d73cb6-154f-41bb-9590-9a270edceedd",
      "name": "My Noble Website 2",
      "phpVersion": "8.1"
    }

For now, let's sculpt site metadata that makes sense for Playground and then consider how it might be useful outside of Playground.

brandonpayton commented 1 month ago

@brandonpayton This sounds like a good solution to the load/save problem we discussed a few times and I like that it would enable Playground to use more sources.

Glad to hear it. ☺️

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

You're right. I started considering an interface for retrieving a list of sites along with some ideas @adamziel had mentioned about importing sets of sites from sources like Git repos, and I lost a bit of focus on the purpose of this issue.

How do temporary sites fit into this interface? Today we need to show the current temporary site in the sidebar.

An in-memory site wouldn't be retrieved from anywhere but would be treated as part of the current site list.

Probably the UI should work with sites through the redux store. There will likely be source-specific interfaces under the covers, but we can abstract those operations with redux actions and state updates.
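
A minimal sketch of that idea, written as a plain Redux-style reducer with no library dependency. The state shape, the action names, and the SiteInfo fields are all hypothetical; the point is only that temporary and persisted sites can share one list while source-specific I/O stays outside the reducer.

```typescript
// Hypothetical site list state and actions; names are illustrative only.
interface SiteInfo {
    slug: string;
    title: string;
    storage: "browser" | "none" | "device";
}

type SiteAction =
    | { type: "sites/added"; site: SiteInfo }
    | { type: "sites/updated"; slug: string; changes: Partial<SiteInfo> }
    | { type: "sites/removed"; slug: string };

interface SiteListState { sites: SiteInfo[] }

const initialState: SiteListState = { sites: [] };

// Pure reducer: takes the previous state and an action, returns a new state.
function siteListReducer(state: SiteListState = initialState, action: SiteAction): SiteListState {
    switch (action.type) {
        case "sites/added":
            return { sites: [...state.sites, action.site] };
        case "sites/updated":
            return {
                sites: state.sites.map((site) =>
                    site.slug === action.slug ? { ...site, ...action.changes } : site
                ),
            };
        case "sites/removed":
            return { sites: state.sites.filter((site) => site.slug !== action.slug) };
        default:
            return state;
    }
}
```

An in-memory site would simply be added with storage "none", so the UI never needs to know where a site came from.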

For example, if you open playground.wordpress.net it should show up as a temporary site. Long term, we would like to allow people to save temporary sites as templates. For example, I test a plugin frequently and then have a template that spins up the setup in a temporary site. Would templates even be a good fit for the site storage?

A template just seems like a different category of persisted site. Is there something that makes them fundamentally different?

Perhaps templates could be edited directly or perhaps not. But at least a template could be used to create a new temporary site, and that temporary site could be modified and used to create yet another template or saved as a regular, persisted site.

brandonpayton commented 1 month ago

I'm unsure if this project is the right time to do it as it seems unrelated and we could add it later, but prototyping it now sounds like a good next step.

You're right. I started considering an interface for retrieving a list of sites along with some ideas @adamziel had mentioned about importing sets of sites from sources like Git repos, and I lost a bit of focus on the purpose of this issue.

I hope to play with the streaming idea as we go, but next, I plan to focus on providing a redux-based interaction with the site list, sculpting the site metadata format, and considering a simple interface for I/O operations that can support different site sources.

Aiming for a Draft PR for this work tomorrow.

bgrgicak commented 1 month ago

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple. Site metadata could include the initial Blueprint and WP settings but do so as historical information rather than a representation of current site state and config.

Starting with a simple flat format sounds good to me.

My initial idea behind including blueprints was that they already could include all settings data so we wouldn't duplicate it. Browser and device storage don't need the blueprint after the first run, but temporary storage will need it every time it loads.

bgrgicak commented 1 month ago

Random thought: For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens. But we couldn't use it today because the new UI doesn't allow users to edit blueprints, so it's better if we open / than open a blueprint URL every time. Editing blueprints and figuring out how they are applied to browser and device storage could be a good thing to work on in the future.

bgrgicak commented 1 month ago

What if we reused the query API format for this? All settings are in that format, it supports blueprints and we need all that data to reconstruct a site.

brandonpayton commented 1 month ago

@bgrgicak thanks for all your thoughts!

My initial idea behind including blueprints was that they already could include all settings data so we wouldn't duplicate it. Browser and device storage don't need the blueprint after the first run, but temporary storage will need it every time it loads.

By "temporary storage will need it every time it loads", do you mean that we naturally have to start with a Blueprint every time we create a temporary site?

Random thought: For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens. But we couldn't use it today because the new UI doesn't allow users to edit blueprints, so it's better if we open / than open a blueprint URL every time. Editing blueprints and figuring out how they are applied to browser and device storage could be a good thing to work on in the future.

I wonder if this is naturally heading in the direction where some things belong to site metadata and other things belong to Blueprints. There can be some overlap. Blueprints currently describe initial platform decisions and configuration and setup preferences, but once a persistent site is initialized, some of those things belong to site metadata which may be changed.

So we render an initial site with a Blueprint and then maintain separate site metadata after that.

I guess an alternative might be to just use a Blueprint as site metadata and always update it as changes are made to the site configuration. Intuitively, I'm uncomfortable with that because I think it might be conflating things that should be separate concepts, but it's something to sleep on.

What if we reused the query API format for this? All settings are in that format, it supports blueprints and we need all that data to reconstruct a site.

This is an interesting thought! My first reaction is that the query API is a kind of user interface and offers different, mutually exclusive options that may not translate well to making a clearly defined data format. But I could be mistaken. This seems like a good one to sleep on as well.

brandonpayton commented 1 month ago

Today, I started a rough draft of site storage APIs and redux plumbing for working with stored sites, #1679.

It's basic and incomplete. It only considers OPFS, and we'll eventually need to support Local FS sites and remote site sources. But it's something to work with and sculpt into something better.

adamziel commented 1 month ago

Each source would define its own Load/Save operations, and since both Load and Save operations can be slow, the interface would need to support progress updates that can be reflected to users.

@brandonpayton good thinking. Also, some sources might only save a diff, a zip, or require streaming data in a re-entrant way for a few days. Also, we'll want to eventually support the same data sources for downloading plugins, themes etc. in the PHP Blueprints library, likely using the WIP StreamChain API. We might eventually have a PHP<->JS stream interop layer. Let's not actually implement any of that today, but let's keep it in mind for the interface design.

For Load, each source query would yield an event stream. We could read each source's stream individually or optionally compose them into a single loading event stream.

@dmsnell Thinking about WordPress core, that's a nice use case for the StreamChain API.

If using the Streams API, each load event might look something like:

Does done: boolean mark the last SiteListingEvent in the stream? If so, it looks a lot like the generator.next() data structure described by TypeScript's IteratorYieldResult type:

interface IteratorYieldResult<TYield> {
  done?: false;
  value: TYield;
}

We could lean on that and immediately make it interoperable with generators and iterators:

// IterableIterator (rather than Iterator) so that for...of also works.
type SiteListingSource = IterableIterator<{
    source: string; // slug identifying the site source
    // Site schema TBD
    sites: Site[];
    // This is the projected total site count
    // and could be used as the basis for progress UI.
    // This number could change as a source is being read
    // and more sites are being discovered.
    expectedTotal: number;
}>;
// Placeholder: a real source would construct this iterator.
declare const listingSource: SiteListingSource;
const { done, value } = listingSource.next();
for (const listing of listingSource) {
    console.log(listing.sites);
}
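
To make the generator interop concrete, here is a small sketch in which each source is a generator function and several sources are chained into one sequence. The Site placeholder, the in-memory batches, and the listSites and combineSources helpers are assumptions for illustration; a real source would likely be asynchronous.

```typescript
interface Site { slug: string }  // placeholder; the real schema is TBD

interface SiteListing {
    source: string;
    sites: Site[];
    expectedTotal: number;
}

// Hypothetical in-memory source; a real one would read OPFS, Git, etc.
function* listSites(source: string, batches: Site[][]): Generator<SiteListing> {
    const expectedTotal = batches.reduce((sum, batch) => sum + batch.length, 0);
    for (const sites of batches) {
        yield { source, sites, expectedTotal };
    }
}

// Chain several sources into one listing sequence, one source after another.
function* combineSources(...sources: Iterable<SiteListing>[]): Generator<SiteListing> {
    for (const source of sources) {
        yield* source;
    }
}

for (const listing of combineSources(
    listSites("opfs", [[{ slug: "a" }], [{ slug: "b" }]]),
    listSites("local-fs", [[{ slug: "c" }]])
)) {
    console.log(listing.source, listing.sites.length, "of", listing.expectedTotal);
}
```

Because a generator object is both an iterator and an iterable, the same value works with .next() and with for...of.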

adamziel commented 1 month ago

My initial thought is that site state is site state and is separate from the idea of Blueprints.

+1, Blueprints are just the initial site recipes. An incomplete site-to-Blueprint export is possible, but Blueprints are still a fundamentally separate concept. Like a Dockerfile and a Docker image.

It would be great if we can explore some of that space as part of this effort, but for the first iteration of the site metadata format, let's start with something flat and simple.

+1

This seems like a good idea. Providing Playground site information with an export opens them to unforeseen possibilities. Is there a downside?

As long as that format is designed with interop in mind and is not super specific to in-browser Playground, I don't see any downsides today.

For now, let's sculpt site metadata that makes sense for Playground and then consider how it might be useful outside of Playground.

Good call and much agreed. Let me also CC @wojtekn and @sejas.

An in-memory site wouldn't be retrieved from anywhere but would be treated as part of the current site list.

There could be a TemporaryListingSource to keep it all within a single interface and reduce special casing.
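
A sketch of what that might look like. Only the TemporaryListingSource name comes from the comment above; the SiteListingSource interface, its list() method, and the Site fields are assumptions made for illustration.

```typescript
interface Site { slug: string; title: string }  // placeholder schema

interface SiteListing {
    source: string;
    sites: Site[];
    expectedTotal: number;
}

// Hypothetical common interface shared by all site sources.
interface SiteListingSource {
    readonly slug: string;
    list(): Iterable<SiteListing>;
}

// Holds in-memory sites so callers can treat them like any other source.
class TemporaryListingSource implements SiteListingSource {
    readonly slug = "temporary";
    private sites: Site[] = [];

    add(site: Site): void {
        this.sites.push(site);
    }

    list(): Iterable<SiteListing> {
        // Everything is already in memory, so a single listing covers all sites.
        return [{ source: this.slug, sites: [...this.sites], expectedTotal: this.sites.length }];
    }
}

// Callers iterate over every source the same way, with no special casing.
function listAll(sources: SiteListingSource[]): Site[] {
    const all: Site[] = [];
    for (const source of sources) {
        for (const listing of source.list()) {
            all.push(...listing.sites);
        }
    }
    return all;
}
```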

A template just seems like a different category of persisted site. Is there something that makes them fundamentally different?

A template could be either a ZIP snapshot or a Blueprint.

Starting a new site from the template would mean cloning the template and creating another site from that initial state. Working with that site would not alter the site template. However, you could explicitly choose to save the current site state as a ZIP snapshot template.
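
The cloning behavior could be sketched as follows, assuming a hypothetical SiteTemplate union for the two template kinds and a TemporarySite record. The Blueprint is copied with a JSON round trip, which relies on Blueprints being plain JSON.

```typescript
// Hypothetical template shapes, following the two options named above.
type SiteTemplate =
    | { kind: "blueprint"; blueprint: Record<string, unknown> }
    | { kind: "zip"; snapshot: Uint8Array };

interface TemporarySite {
    slug: string;
    template: SiteTemplate;  // the site's own copy of its starting state
}

// Copy the template data so the new site never shares memory with it.
function cloneTemplate(template: SiteTemplate): SiteTemplate {
    return template.kind === "zip"
        ? { kind: "zip", snapshot: template.snapshot.slice() }
        : { kind: "blueprint", blueprint: JSON.parse(JSON.stringify(template.blueprint)) };
}

// Starting a site from a template works on the copy, not the original.
function createSiteFromTemplate(slug: string, template: SiteTemplate): TemporarySite {
    return { slug, template: cloneTemplate(template) };
}
```

Saving the current site state back as a new ZIP snapshot template would be a separate, explicit operation and is not shown here.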

For browser and device storage, it would be nice to access the URL from the blueprint so that it goes to that page when the site opens.

Ideally I'd like to use the slug instead of the scope so that you could have stable Playground URLs, e.g. https://playground.wordpress.net/my-site-23/wp-admin/
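
For illustration, a slug-based URL could be built as below. The buildSiteUrl helper is hypothetical, and this thread does not show how Playground routes scoped URLs today, so treat this only as a sketch of the URL shape.

```typescript
// Hypothetical helper that builds a stable URL from a site slug.
function buildSiteUrl(slug: string, path: string = "/"): string {
    const base = "https://playground.wordpress.net";
    const normalizedPath = path.startsWith("/") ? path : "/" + path;
    // Encode the slug so unusual characters cannot break out of its path segment.
    return new URL(`/${encodeURIComponent(slug)}${normalizedPath}`, base).toString();
}

console.log(buildSiteUrl("my-site-23", "/wp-admin/"));
```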

bgrgicak commented 1 month ago

By "temporary storage will need it every time it loads", do you mean that we naturally have to start with a Blueprint every time we create a temporary site?

Yes, this is the same as today, except that the blueprint is stored in the URL and not in site storage.

So we render an initial site with a Blueprint and then maintain separate site metadata after that.

I like this. We would still need to keep the blueprint to support the reset site feature and temporary sites, but the blueprint could be immutable.
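
One way to picture this split: the stored Blueprint is frozen at creation time, current settings live next to it and may diverge, and a reset rebuilds the settings from the frozen Blueprint. The SiteRecord shape, the default versions, and both helpers are assumptions for illustration; only a small slice of the Blueprint format (preferredVersions) is modeled.

```typescript
// Only the part of a Blueprint this sketch needs.
interface Blueprint {
    preferredVersions?: { php?: string; wp?: string };
}

interface SiteRecord {
    slug: string;
    readonly initialBlueprint: Readonly<Blueprint>;  // historical, never edited
    phpVersion: string;  // current settings, free to diverge
    wpVersion: string;
}

// Assumed defaults, for illustration only.
const DEFAULTS = { php: "8.1", wp: "latest" };

function createSite(slug: string, blueprint: Blueprint): SiteRecord {
    // Shallow freeze: enough to signal intent, not a deep guarantee.
    const initialBlueprint = Object.freeze({ ...blueprint });
    return {
        slug,
        initialBlueprint,
        phpVersion: blueprint.preferredVersions?.php ?? DEFAULTS.php,
        wpVersion: blueprint.preferredVersions?.wp ?? DEFAULTS.wp,
    };
}

// "Reset site": discard current settings and start again from the Blueprint.
function resetSite(site: SiteRecord): SiteRecord {
    return createSite(site.slug, site.initialBlueprint);
}
```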

My first reaction is that the query API is a kind of user interface and offers different, mutually exclusive options that may not translate well to making a clearly defined data format. But I could be mistaken.

If you look at the features of the query API and the new UI they are mostly the same.

WordPress / wordpress-playground

Site storage #1659