MarkBind / markbind

MarkBind is a tool for generating content-heavy websites from source files in Markdown format
https://markbind.org/
MIT License

Support site versioning #1009

Open ang-zeyu opened 4 years ago

ang-zeyu commented 4 years ago

Is your request related to a problem?

Site versioning is key for documentation use, and education websites may want to keep past versions for archival purposes as well.

Describe the solution you'd like

As discussed, we can support some form of markbind archive <version name> command that builds and archives the site in, say, an '/archive' directory.

The links in here would then be accessed with ${baseUrl}/archive/<version name>/<old link>.
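A minimal sketch of how such a link could be composed, assuming baseUrl is a path such as /markbind (the helper name archivedUrl is hypothetical and not part of MarkBind):

```typescript
// Hypothetical helper: builds ${baseUrl}/archive/<version name>/<old link>.
// Assumes baseUrl is a path (possibly empty), not a full URL with a scheme.
function archivedUrl(baseUrl: string, versionName: string, oldLink: string): string {
  // Drop leading and trailing slashes from the base so segments join cleanly.
  const base = baseUrl.replace(/^\/+|\/+$/g, "");
  // Only strip leading slashes from the link, so a trailing slash survives.
  const link = oldLink.replace(/^\/+/, "");
  const segments = [base, "archive", versionName, link].filter((s) => s.length > 0);
  return "/" + segments.join("/");
}

// archivedUrl("/markbind/", "v1.2.0", "/userGuide/index.html")
//   returns "/markbind/archive/v1.2.0/userGuide/index.html"
```

An empty baseUrl simply yields a link rooted at /archive.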

In addition, to link it easily to the front end, we could automatically insert a version navigation dropdown in the sitenav (or navbar) through an option in site.json.

damithc commented 4 years ago

In the same vein, it would be nice if we could support PR previews at ${baseUrl}/pulls/<PR number>/<old link>. On a related note, if we can do all this using GitHub Actions + gh-pages, we can shed our dependency on Travis and Netlify at the same time.

ang-zeyu commented 3 years ago

A more comprehensive list of requirements for anyone interested:

Additional notes:

tlylt commented 2 years ago

Saw a somewhat related project that might give some ideas/inspiration: mike (Manage multiple versions of your MkDocs-powered documentation via Git)

kaixin-hc commented 2 years ago

I'll be taking a crack at this issue in the coming weeks!

Initial thoughts:

Some points from a verbal discussion with Prof

One concern was that repository size would balloon quickly, as once the version is added it would be saved in .git even if it is later deleted.

A potential workaround would be saving each version in a separate branch, where the name is the version name. Then the deploy command would keep track of the commit hashes where each version was saved, and use that branch's files to generate html files to place in gh-pages. To lessen the time for the build command, deploy could be made more intelligent so that versioned files are only replaced if the commit hash has been changed since the last time. An additional feature that would need to be implemented is some command to 'build past version files' in order to produce the past version files in the current local branch.
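The "only replace if the commit hash changed" idea could be sketched roughly as below. This is an illustration only: the VersionHashes shape and the function name are assumptions, and how the hashes are recorded and read from git is left out.

```typescript
// Hypothetical shape: version name (also the branch name in this proposal) -> commit hash.
type VersionHashes = Record<string, string>;

// Returns the versions whose branch head differs from the hash recorded at the
// last deploy, including versions that were never deployed before.
function versionsToRebuild(lastDeployed: VersionHashes, currentHeads: VersionHashes): string[] {
  return Object.keys(currentHeads)
    .filter((version) => lastDeployed[version] !== currentHeads[version])
    .sort();
}

// versionsToRebuild({ v1: "a1", v2: "b2" }, { v1: "a1", v2: "c3", v3: "d4" })
//   returns ["v2", "v3"]
```

Versions that are unchanged since the last deploy are skipped, which is where the build time saving would come from.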

jonahtanjz commented 2 years ago

Here are some of my thoughts on this.

Would we want to allow editing the past versions?

I think it would be nice to support editing past versions as this would be useful for updating the documentation of past releases (e.g bug fixes).

Should we be saving versions as markdown files, or .html files?

If we were to allow editing of past versions, then I think saving the past versions as .md files will make more sense as it will be easier for the users to edit and be able to use MarkBind's syntax.

To lessen the time for the build command, deploy could be made more intelligent so that versioned files are only replaced if the commit hash has been changed since the last time. An additional feature that would need to be implemented is some command to 'build past version files' in order to produce the past version files in the current local branch.

This will be useful for editing and deploying of past versions.

damithc commented 2 years ago

This could be one possible approach:

What authors need to do to deploy an 'archived' version of a website:

  1. create a branch to represent the version e.g., branch-v-1-2-0
  2. add the branch name and a version name (e.g., v1.2.0) to the site.json, under a list of archivedVersions
  3. run the build/deploy commands as usual --> the version v1.2.0 will be available in a versions drop-down of the deployed website

A possible implementation:

For each branch_name, version_name in the archivedVersions

  1. check if archive/version_name exists (if it exists, the version has been generated already in a previous build -- no need to generate again)
  2. if it does not exist, switch to branch_name and generate the version of the website into archive/version_name folder
  3. copy over the generated version of the website into a sub folder under the _site folder for subsequent deployment

But the above is just speculation. There could be other better ways of doing this.
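To make the three implementation steps concrete, here is a rough planning sketch. The field names branchName and versionName and the function planArchiveBuild are hypothetical; the actual branch switching, building and copying are deliberately left out.

```typescript
// Hypothetical entry of the archivedVersions list in site.json.
interface ArchivedVersion {
  branchName: string;  // e.g. "branch-v-1-2-0"
  versionName: string; // e.g. "v1.2.0"
}

interface BuildPlan {
  toGenerate: ArchivedVersion[]; // no archive/<versionName> yet: switch branch and build
  toCopy: string[];              // archive folders to copy into the output folder
}

// existingArchives lists the version names that already have an archive/<name> folder.
function planArchiveBuild(archivedVersions: ArchivedVersion[], existingArchives: string[]): BuildPlan {
  // Steps 1 and 2: only versions without an existing archive folder are generated.
  const toGenerate = archivedVersions.filter(
    (v) => existingArchives.indexOf(v.versionName) === -1,
  );
  // Step 3: every listed version is copied into the deployed site.
  const toCopy = archivedVersions.map((v) => "archive/" + v.versionName);
  return { toGenerate, toCopy };
}
```

Separating the plan from the file system work keeps the "already generated" check easy to test on its own.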

ang-zeyu commented 2 years ago
  • All the files in archive can use a special layout file, which changes maybe the top level theming to make it clear which version it is. There also ought to be a way to change back to the latest version or change to other versions

I think this ought to be handled by the author, being a layout file, and the wide range of messages they might want to show (difficult to construct a one-size fits all message/formatting for even common use cases). E.g.,

Implementation wise we currently don't have "recursive" layouts as well unfortunately, which may be rather difficult to implement and not quite worth it.

a simple frontend versioning component should be provided, that allows readers to navigate to any version from any version (can be done separately)

Would this serve the same purpose? (e.g. a version dropdown)

  • When transferred into this archive, all the internal links need to link to the current version of the files and not the latest version. Perhaps within the <version_name> folder we can change the {{ base_url }} ? Otherwise, all internal links would need to be changed appropriately

Definitely, it is necessary to have differing base urls/routes since the same file/filename is likely present elsewhere.

  • I don't think we need a single meta-data file to track current and later base_urls for this solution, as there would only be a single repository and branch. If the base_url changes later, the older versions will also have a changed base_url

    • I think this is okay as it is unlikely you would need a totally independent older version of a current website.

readers should be able to navigate to newer version from older versions - hence a single source of truth for versions is needed for all versions. (e.g. a versions.json output metadata file at the output (_site) and source folder root)

^

The metadata file isn't for tracking current / later base urls, that's a difficult problem in any case.
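As a rough illustration of what the metadata would be used for, a version dropdown could be derived from one shared list like this. The VersionEntry and DropdownItem shapes are assumptions made for the sketch, not an agreed format.

```typescript
// Hypothetical entry in the shared versions list.
interface VersionEntry {
  name: string; // e.g. "v1.2.0"
  url: string;  // root URL of that version of the site
}

interface DropdownItem {
  label: string;
  href: string;
  isCurrent: boolean; // lets the component highlight the version being viewed
}

// Every version renders the same list, so readers can jump from any version to any other.
function dropdownItems(versions: VersionEntry[], currentName: string): DropdownItem[] {
  return versions.map((v) => ({
    label: v.name,
    href: v.url,
    isCurrent: v.name === currentName,
  }));
}
```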

Another solution to the above would be to have multiple, constantly updated (for navigating to all versions) metadata files (e.g. could reuse site.json). I suggested a single source as it seems simpler (implementation wise) if we don't want to support the author changing base urls (retrieving every source and updating a "versions" array inside that also contains base url mappings). (do we? 👀)

A very simple example: https://docusaurus.io/docs/versioning#deleting-an-existing-version

Some points from a verbal discussion with Prof

  • Would we want to allow editing the past versions?

the data file structure / design used to store versioning information should be extensible, so we can add more versioning information in the future as needed without breaking past versions

Another issue with this is how to handle updates (especially breaking changes) from MarkBind itself.

Build versions could be kept (and warnings/errors thrown if it's not compatible)

  • Should we be saving versions as markdown files, or .html files?

Isn't "saving" the versions as html files somewhere necessary in any case? 👀 (otherwise, there wouldn't be a deployed site)

If the intention is to "rebuild all versions every time markbind build is called", that introduces a lot of performance issues + compatibility (e.g. MarkBind version changes).

I'll drop my thoughts on .md further down in another comment.

  • Would we need a feature to revert to a previous version (stretch goal)

agreed (stretch goal)

  • How will this work with sub-sites? (can consider for later)

This should come naturally, as sub-sites are only means for content reuse. (only the outer site's pages are deployed - which can, but not necessarily include the inner site's pages).

One concern was that repository size would balloon quickly, as once the version is added it would be saved in .git even if it is later deleted.

I think this is going to be costly in any case =X

A lovely big warning in the user guide should be shown to "version your documentation only when needed"

https://docusaurus.io/docs/versioning

ang-zeyu commented 2 years ago

A potential workaround would be saving each version in a separate branch, where the name is the version name. Then the deploy command would keep track of the commit hashes where each version was saved, and use that branch's files to generate html files to place in gh-pages. To lessen the time for the build command, deploy could be made more intelligent so that versioned files are only replaced if the commit hash has been changed since the last time. An additional feature that would need to be implemented is some command to 'build past version files' in order to produce the past version files in the current local branch.

This could work, though I think for simplicity we could try storing these on the local fs for a start and see how things turn out.

This could be one possible approach:

What authors need to do to deploy an 'archived' version of a website:

  1. create a branch to represent the version e.g., branch-v-1-2-0
  2. add the branch name and a version name (e.g., v1.2.0) to the site.json, under a list of archivedVersions
  3. run the build/deploy commands as usual --> the version v1.2.0 will be available in a versions drop-down of the deployed website

A possible implementation:

For each branch_name, version_name in the archivedVersions

  1. check if archive/version_name exists (if it exists, the version has been generated already in a previous build -- no need to generate again)
  2. if it does not exist, switch to branch_name and generate the version of the website into archive/version_name folder
  3. copy over the generated version of the website into a sub folder under the _site folder for subsequent deployment

But the above is just speculation. There could be other better ways of doing this.

I suggest going for an MVP for a start, and figuring out later on whether we want any git coupling and "improvements".

Looks something like this perhaps:

An extension to facilitate the case of changing base urls (or even, entirely different domains) if we also want it for a start:

ang-zeyu commented 2 years ago

A potential workaround would be saving each version in a separate branch, where the name is the version name. Then the deploy command would keep track of the commit hashes where each version was saved, and use that branch's files to generate html files to place in gh-pages. To lessen the time for the build command, deploy could be made more intelligent so that versioned files are only replaced if the commit hash has been changed since the last time. An additional feature that would need to be implemented is some command to 'build past version files' in order to produce the past version files in the current local branch.

This could be one possible approach:

What authors need to do to deploy an 'archived' version of a website:

  1. create a branch to represent the version e.g., branch-v-1-2-0
  2. add the branch name and a version name (e.g., v1.2.0) to the site.json, under a list of archivedVersions
  3. run the build/deploy commands as usual --> the version v1.2.0 will be available in a versions drop-down of the deployed website

A possible implementation:

For each branch_name, version_name in the archivedVersions

  1. check if archive/version_name exists (if it exists, the version has been generated already in a previous build -- no need to generate again)
  2. if it does not exist, switch to branch_name and generate the version of the website into archive/version_name folder
  3. copy over the generated version of the website into a sub folder under the _site folder for subsequent deployment

But the above is just speculation. There could be other better ways of doing this.

If it does turn out to be an issue, to generalize these, we could also support somewhere down the line:

source_url can be either:

ang-zeyu commented 2 years ago

An extension to facilitate the case of changing base urls (across versions)

@damithc @kaixin-hc To raise this question again, as it would affect the initial core implementation and discussion from @kaixin-hc rather heavily, and the (permanent) possibility of future features.

It would enable (some things I can think of currently):

It would ~not~ (or rather, it is fine in any case) enable:

Implementation wise, it's the difference between having:


*Retrieving versioning metadata should be only necessary for the versioning component (dropdown/header/etc.) to work, i.e. something like this: 1

If we want to leave the author to manually "link things up" then either way should work.

damithc commented 2 years ago

A few more things:

  1. Just so that we don't have false expectations, I'm unlikely to use the versioning feature for CS2103 website simply because the site is already quite heavy and I don't want to make it heavier. But the feature could be useful for projects such as RepoSense (and MarkBind itself!) so that we can maintain user documentation for different versions of the product.
  2. It is good if we provide the ability to update past versions (e.g., fix a bug in the documentation of a past version of a product). This favors having branches corresponding to each version.
  3. One other factor we need to consider is that MarkBind itself will keep evolving and most new major versions will have breaking changes. When a user is moving to a new major version of MarkBind that has breaking changes, the user might wish to generate and store the HTML snapshot of the past versions of the website as s/he will not be able to generate them anymore in future. Does that make sense?

ang-zeyu commented 2 years ago

A few more things:

  1. Just so that we don't have false expectations, I'm unlikely to use the versioning feature for CS2103 website simply because the site is already quite heavy and I don't want to make it heavier. But the feature could be useful for projects such as RepoSense (and MarkBind itself!) so that we can maintain user documentation for different versions of the product.

cross origin versioning (e.g. have different github pages from different repos point to each other)

  • (also due to potential cross origin issues with retrieving a single source of truth from a different domain)
  • e.g. use case is the case for 2103 sites across years right now (across different repos)

Would highlight this part again though, it is possible to completely avoid file bloat with this.

Would this be an appealing feature from standpoint of larger education sites? (although, I can't say exactly for sure how usable it'll turn out eventually)

  1. It is good if we provide the ability to update past versions (e.g., fix a bug in the documentation of a past version of a product). This favors having branches corresponding to each version.

Agreed. (I think) this can be done separately from the groundwork. This is from a reviewer's standpoint however; if you feel that including this from the get-go would make things easier, please go for it 💪 @kaixin-hc

One other factor we need to consider is that MarkBind itself will keep evolving and most new major versions will have breaking changes. When a user is moving to a new major version of MarkBind that has breaking changes, the user might wish to generate and store the HTML snapshot of the past versions of the website as s/he will not be able to generate them anymore in future. Does that make sense?

yup, https://github.com/MarkBind/markbind/issues/1009#issuecomment-1074241848 -- we should store the build versions in the project folder and abort / warn if the version is incompatible.
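A small sketch of what such an abort / warn check might look like. The policy is an assumption made purely for illustration: a differing major version aborts (since most new major versions are expected to have breaking changes), any other difference only warns. The function and type names are hypothetical.

```typescript
type Compatibility = "ok" | "warn" | "abort";

// Strips an optional leading "v", e.g. "v3.1.1" -> "3.1.1".
function normalizeVersion(version: string): string {
  return version.replace(/^v/, "");
}

// Compares the MarkBind version stored with an archived version against the running one.
function checkBuildVersion(archivedWith: string, current: string): Compatibility {
  const archived = normalizeVersion(archivedWith);
  const running = normalizeVersion(current);
  // parseInt reads the leading digits, i.e. the major version number.
  const archivedMajor = parseInt(archived, 10);
  const runningMajor = parseInt(running, 10);
  if (isNaN(archivedMajor) || isNaN(runningMajor)) {
    return "abort"; // unreadable version string: refuse to rebuild
  }
  if (archivedMajor !== runningMajor) {
    return "abort"; // assumed policy: a major version change may break the old source
  }
  return archived === running ? "ok" : "warn";
}
```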

@MarkBind/active-devs Feel free to chime in as well. This issue is rather large and the decisions will have retroactive consequences as opposed to most other features, we could use all the 🧠 we can get here.

tlylt commented 2 years ago

Just thought of one point about storing the original .md / source files vs storing the generated HTML files:

damithc commented 2 years ago

If the project is version controlled via git ...

This reminded me that MarkBind does not mandate the use of git. I kind of forgot that fact :-p So, we should not make git a compulsory part of the feature (although optional optimizations for git users should be OK).

tlylt commented 2 years ago

If the project is version controlled via git ...

This reminded me that MarkBind does not mandate the use of git. I kind of forgot that fact :-p So, we should not make git a compulsory part of the feature (although optional optimizations for git users should be OK).

I suppose if there is any kind of version control going on (git or not), the person will have a way to access the previous state of the project (hence the docs). Else, perhaps it could be less meaningful for versioned docs anyway? i.e. no one is going to have access to the state of the project at v1 when it's now v2, then not much point for a v1 docs to exist.

Edit: Hmm ok unless people produce a "build" of their project for one version and then just edit the current project files until the next version (without git or its equivalent) .... I guess in this case then there's a problem...

damithc commented 2 years ago

This reminded me that MarkBind does not mandate the use of git. I kind of forgot that fact :-p So, we should not make git a compulsory part of the feature (although optional optimizations for git users should be OK).

I suppose if there is any kind of version control going on (git or not), the person will have a way to access the previous state of the project (hence the docs). Else, perhaps it could be less meaningful for versioned docs anyway? i.e. no one is going to have access to the state of the project at v1 when it's now v2, then not much point for a v1 docs to exist.

What I meant is, we should not tie the feature specifically to a git feature. But it is fine to assume there is some version control in use. At the same time, given how prevalent Git is, tying this feature to git (if that makes the feature easier to implement/use) is not out of the question either.

kaixin-hc commented 2 years ago

Thanks for the lively discussion guys! Some responses to thoughts here

It is good if we provide the ability to update past versions (e.g., fix a bug in the documentation of a past version of a product). This favors having branches corresponding to each version. What I meant is, we should not tie the feature specifically to a git feature. But it is fine to assume there is some version control in use. At the same time, given how prevalent Git is, tying this feature to git (if that makes the feature easier to implement/use) is not out of the question either. One other factor we need to consider is that MarkBind itself will keep evolving and most new major versions will have breaking changes. When a user is moving to a new major version of MarkBind that has breaking changes, the user might wish to generate and store the HTML snapshot of the past versions of the website as s/he will not be able to generate them anymore in future.

My original idea was moving the contents of a certain branch every time you archive with markbind archive <versionName> [versionFolder] to a created/existing branch called versionFolder-versionName. Then, from that branch, you can make your changes to the .md files, using the version of markbind you specify in the site.json(?), and run another markbind command markbind updateArchive <branchToUpdateIn> <versionName> [versionFolder] that will move the newly built files back into the original branch. (Might need to switch between versions of markbind when editing old versions - the old one to edit and generate the new site files, the newer/more recent one to move the version in).

I believe there shouldn't be compatibility issues if each branch can be treated as its own environment which may be run with the version of markbind that was used to build it. The proposed implementation currently entails storing the html snapshot of past versions of the website so that they do not need to be rebuilt.

If we want to leave the author to manually "link things up" then either way should work. From my discussion with Prof Damith, this is what he's leaning towards at present. But I agree that it's good to think ahead in case we want to implement some kind of auto-frontend feature in the future.

I'm sorry @ang-zeyu, I don't really understand how to implement the cross-origin versions array. From your description of the pros and cons, it sounds like the cross-origin versions array will be more versatile and extensible (and could probably be turned into a versions.json file quite easily)? But I don't understand how we can use it to version across repositories, so I'm not sure about the complexity of implementation.

ang-zeyu commented 2 years ago

I'm sorry @ang-zeyu, I don't really understand how to implement the cross-origin versions array. From your description of the pros and cons, it sounds like the cross-origin versions array will be more versatile and extensible (and could probably be turned into a versions.json file quite easily)? But I don't understand how we can use it to version across repositories, so I'm not sure about the complexity of implementation.

Another solution to the above would be to have multiple, constantly updated (for navigating to all versions) metadata files (e.g. could reuse site.json). I suggested a single source as it seems simpler (implementation wise) if we don't want to support the author changing base urls (retrieving every source and updating a "versions" array inside that also contains base url mappings). (do we? 👀)

@kaixin-hc I'm thinking it's just a natural "side effect / benefit" if we want to head down this path ^

Suppose you have every archived site's frontend component retrieving from a single source of truth (say it was hosted at abc.com/initialbaseurl/versions.json), if abb.com tries to fetch this:

If we instead store separate ones,

abc.com/.../versions.json
abb.com/.../v1/_site/versions.json
abd.com/.../v2/_site/versions.json
...

There would be some work in updating the various versions.jsons in archived sites to ensure the latest updated version -- but no cross-origin issues.
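The updating work could look roughly like the sketch below: the lists from every known versions.json copy are merged into one, and the merged list is what each archived site's copy would then be overwritten with. The SiteVersion shape is an assumption, and fetching and writing the files are not shown.

```typescript
// Hypothetical entry stored in each site's versions.json copy.
interface SiteVersion {
  name: string;
  url: string;
}

// Merges the lists found in every copy. When two lists describe the same
// version name, the entry from the later list wins.
function mergeVersionLists(lists: SiteVersion[][]): SiteVersion[] {
  let merged: SiteVersion[] = [];
  lists.forEach((list) => {
    list.forEach((entry) => {
      // Drop any earlier entry with the same name, then append the newer one.
      merged = merged.filter((known) => known.name !== entry.name).concat([entry]);
    });
  });
  return merged;
}
```

Because each site only ever reads its own copy, nothing is fetched across origins at view time.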

*This is all assuming we don't rebuild any of the archived sites though, which certainly fixes the first case, but introduces compatibility (markbind version change) concerns.

If we want to leave the author to manually "link things up" then either way should work. From my discussion with Prof Damith, this is what he's leaning towards at present. But I agree that it's good to think ahead in case we want to implement some kind of auto-frontend feature in the future.

👍

kaixin-hc commented 2 years ago

Thanks for the clarification @ang-zeyu! (And sorry, I think I totally missed this reply 🙈 )

*This is all assuming we don't rebuild any of the archived sites though, which certainly fixes the first case, but introduces compatibility (markbind version change) concerns.

Could you explain more about the compatibility concerns? If we only keep the built archive files, which are in HTML format, would there be MarkBind issues in serving the HTML files...?

There would be some work in updating the various versions.jsons in archived sites to ensure the latest updated version -- but no cross-origin issues.

I see, that sounds quite doable. It's the same idea as updating the changes in the main site when the archived sites are updated, somehow 'pushing' the changes back to the main site and from there to the versioned sites.

I still think that changing the URL of the original site is probably not a common use case though, so I'm not sure if it's worth the work to support. And I think there are probably some edge cases that I'm not thinking of at the moment.

Do you all think it's okay if I don't implement this version tracking file into my current PR? Since it works if the author plans to manually link things up, and I think it makes sense for whoever is implementing the front end component to figure out what kind of backend source of truth would be best.

ang-zeyu commented 2 years ago

Thanks for the clarification @ang-zeyu! (And sorry, I think I totally missed this reply 🙈 )

*This is all assuming we don't rebuild any of the archived sites though, which certainly fixes the first case, but introduces compatibility (markbind version change) concerns.

Could you explain more about the compatibility concerns? If we only keep the built archive files, which are in HTML format, would there be MarkBind issues in serving the HTML files...?

Nope, it's only if we keep source files for rebuilding. Same point here https://github.com/MarkBind/markbind/issues/1009#issuecomment-1074241848 and here https://github.com/MarkBind/markbind/issues/1009#issuecomment-1086793309

There would be some work in updating the various versions.jsons in archived sites to ensure the latest updated version -- but no cross-origin issues.

I see, that sounds quite doable. It's the same idea as updating the changes in the main site when the archived sites are updated, somehow 'pushing' the changes back to the main site and from there to the versioned sites.

I still think that changing the URL of the original site is probably not a common use case though, so I'm not sure if it's worth the work to support. And I think there are probably some edge cases that I'm not thinking of at the moment.

Agreed, it's not such a common use case to change the baseurl for a single site.

Multiple sites (cross origin) might be a different story though (the 2103 sites spread across the various repos is one example), especially if we aim to support education sites which may tend to be larger.

Do you all think it's okay if I don't implement this version tracking file into my current PR? Since it works if the author plans to manually link things up, and I think it makes sense for whoever is implementing the front end component to figure out what kind of backend source of truth would be best.

Keeping in mind our discussed goals so far (I'm assuming a little from your conversation with prof @damithc too), I'll try to sum up all the things mentioned here:

Now here comes a slightly more tricky part (I believe) which also makes the above features more complex

Ergo my answer would be no (it should be ok). This is just my run through, do have a thorough look through these points and see if they make sense to you guys as well @kaixin-hc @jonahtanjz

If we're the slightest bit unsure, I would suggest we at least implement a versions.json in the project (not _site) (source) folder for now, it can be something extremely simple like:

[
  { "version_name": "<version name>", "build_ver": "<markbind build version>", "output": "where is the archived site stored?" }
]

It should be fairly straightforward implementation wise and harmless to the user/author (if we find we don't need this file in the future, we can just put a changelog note to remove it). I believe this should pad the risk in missing something out here significantly but fingers crossed. =P

kaixin-hc commented 2 years ago

Thank you for the summary @ang-zeyu ! Yes, I think that's accurate.

I'll go with the basic versions.json option for now then, I think it's a good balance of considerations (and should also be useful in making sure that archived sites are not re-archived with the archive command, which is the problem I'm wrestling with right now :))
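For the re-archiving check, a minimal sketch could look like this, assuming the versions.json entries use the version_name, build_ver and output fields suggested above. The function name and the exact clash rule are hypothetical.

```typescript
// One entry of the basic versions.json, using the field names suggested in this thread.
interface ArchivedSite {
  version_name: string;
  build_ver: string; // MarkBind version used for the build
  output: string;    // where the archived site is stored
}

// Returns a new list with the entry added, or throws if the version name or
// the output folder is already taken by an earlier archive.
function addArchivedSite(existing: ArchivedSite[], next: ArchivedSite): ArchivedSite[] {
  const clash = existing.some(
    (site) => site.version_name === next.version_name || site.output === next.output,
  );
  if (clash) {
    throw new Error(`"${next.version_name}" (${next.output}) is already archived`);
  }
  return existing.concat([next]);
}
```

The archive command could run this check before building, so an existing archive is never silently overwritten.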