Open at055612 opened 7 years ago
As content packs will be the most diverse from a contribution standpoint, can we get some documentation at some point on how and what a contributor should do?
I'm guessing all this should be contained within (or linked from) the stroom-content/README.md file.
This change may not be needed as I have now added links to the current releases for each pack to the root readme, saving people from having to trawl through the releases page to find the latest release of a pack.
After a chat with @stroomdev66 we agreed that the current manual process is prone to error and should be removed. At present, when we have tagged a single pack at a new version, we manually run the gradle build, then manually create a release for that tag in github and manually add the build zips for the pack to it.
It should be possible for a travis build to detect that it is a tagged commit, extract the pack key from the git tag (assuming we follow a convention like pack-name-vx.y.z), then run the gradle build and finally release the zip(s) for that pack to github.
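As a rough illustration of the tag convention, the pack key and version could be split out of a tag name like this. This is only a sketch: the exact pattern (lower-case pack names, a strict three-part version) and the names `TAG_PATTERN`, `PackTag` and `parse_pack_tag` are assumptions, not anything that exists in the repo.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention from the discussion: <pack-name>-v<x>.<y>.<z>
# Pack names are assumed to be lower-case words separated by hyphens.
TAG_PATTERN = re.compile(
    r"^(?P<pack>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<version>v\d+\.\d+\.\d+)$"
)


class PackTag(NamedTuple):
    pack: str
    version: str


def parse_pack_tag(tag: str) -> Optional[PackTag]:
    """Split a git tag into pack key and version, or return None if the
    tag does not follow the assumed convention."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return PackTag(match.group("pack"), match.group("version"))


if __name__ == "__main__":
    print(parse_pack_tag("stroom-101-v1.0.0"))
    # PackTag(pack='stroom-101', version='v1.0.0')
```

On Travis the tag name is available in the TRAVIS_TAG environment variable, so a build script could pass that value in and skip the release step when the result is None. Note that a two-part tag such as stroom-101-v1.0 would not match this strict pattern.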
Discussed the fact that a tag applies to the whole repo, but really applies to a single pack. While a bit odd, it still ensures we can point to the source for a version of a pack.
Discussed having a single manifest file in the root of the repo (probably json) that defines all the packs, their versions, the download urls for the versions and compatibility with stroom versions. e.g.
{
  "packs": [
    {
      "name": "stroom-101",
      "description": "some wordy stuff",
      "versions": [
        {
          "version": "v2.0.0",
          "releaseDate": "20180228",
          "compatibleStroomVersions": [ "v6.0" ],
          "zipUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0.zip",
          "zipWithDepsUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0-all.zip"
        },
        {
          "version": "v1.0.0",
          "releaseDate": "20180228",
          "compatibleStroomVersions": [ "v5.0", "v5.1" ],
          "zipUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0.zip",
          "zipWithDepsUrl": "https://github.com/gchq/stroom-content/releases/download/stroom-101-v1.0/stroom-101-v1.0-all.zip"
        }
      ]
    }
  ]
}
Maybe this file could also include dependency information for each pack version, e.g. pack X v1.2 depends on pack Y v3.4.
With a bit of static javascript and github pages we could easily render this into something more readable for the web.
The alternative would be for each pack to define its own manifest file and then have a process that collates them together into one big easily queryable file.
With a manifest file like this stroom could be changed so that when provided with a link to the manifest file (i.e. hosted on raw.github.com for an appropriate branch) it could then present the user with a list of packs to get.
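To make the idea concrete, here is a minimal sketch of how a client might filter such a manifest down to the pack versions that declare compatibility with a given stroom version. The field names follow the example manifest above; the function name `compatible_versions` and the cut-down manifest are made up for illustration.

```python
import json
from typing import Dict, List


def compatible_versions(manifest: dict, stroom_version: str) -> Dict[str, List[str]]:
    """Map each pack name to the pack versions that list stroom_version
    in their compatibleStroomVersions."""
    result: Dict[str, List[str]] = {}
    for pack in manifest.get("packs", []):
        versions = [
            v["version"]
            for v in pack.get("versions", [])
            if stroom_version in v.get("compatibleStroomVersions", [])
        ]
        if versions:
            result[pack["name"]] = versions
    return result


# Cut-down copy of the example manifest (URLs omitted for brevity).
MANIFEST_JSON = """
{
  "packs": [
    {
      "name": "stroom-101",
      "versions": [
        {"version": "v2.0.0", "compatibleStroomVersions": ["v6.0"]},
        {"version": "v1.0.0", "compatibleStroomVersions": ["v5.0", "v5.1"]}
      ]
    }
  ]
}
"""

manifest = json.loads(MANIFEST_JSON)

if __name__ == "__main__":
    print(compatible_versions(manifest, "v5.1"))
    # {'stroom-101': ['v1.0.0']}
```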
We need to branch the repo as stroom-v5.0 and stroom-v5.1 to give us the ability to support older pack versions. Currently all packs on master (with the exception of the latest internal stat packs) are v5.x compatible.
Further to the above, the thinking now is to move to one repo per pack. The following was posted elsewhere:
Been doing some more thinking about the development of content packs going forward and have discussed with @stroomdev66 and @gcdev373. The current idea is for each content pack to have its own repo on github. A pack would be per log producing system/app, e.g. squid, apache etc. and would typically handle one log format. Some things like windows produce multiple log formats so for these it would make sense to bundle them into one pack as you are likely to always want all formats for a system.
Each pack repo could be owned/maintained by anyone, e.g. @burnalting could create the squid pack in his org's github. This reduces the need for us to be a blocker on everything and the people with the most interest in a pack are responsible for maintaining it. It would contain all the stroom content along with docs and any supporting scripts/config/etc. It would have its own lifecycle and would be tagged/released as changes are made to it.
Each pack repo would need the means (i.e. scripts, github actions) to validate and package up a pack so these would be maintained in a single repo probably owned by us that each pack repo could make use of via git sub-modules. The current approach for pulling in dependency packs would need to change so that instead it fetched released pack zips from github releases and used the content from them. Each pack repo would need to conform to some defined structure so the scripts would work on any repo. We would continue the practice of releasing a zip with no deps along with a fat zip with all deps in it.
It would also make sense for packs to include some meta file that defines their name, version, all the deps they have to other packs, import format version (i.e. v5, v6, v7, etc.) and maybe some description. e.g.
./meta.yml
./CHANGELOG.md
./README.md # root readme describing the pack
./content # all the stroom content files (xslts, pipes, etc.)
./clientArtefacts # any supporting scripts/config for doing the logging
./docs # any docs that don't fit in the root readme
./framework # git submodule link to the central repo that contains the pack build scripts
We would then maintain a central directory of packs in some (probably gchq) repo which would have links to all the pack repos that people have created along with released versions, compatibility matrices and such like. If this directory was held in some structured form, e.g. yaml, then it could in theory be read by stroom to pull in packs in a more friendly way. There are still some unanswered questions around dependency conflicts and resolving them that I think can only be answered by stroom having an understanding of what a pack is and the deps between them. This is a much bigger problem that is not going to get fixed in the short term though.
The meta.yml could look like
---
id: gchq/stroom-content-101/v1.2.3 #Not sure we need this if we have version and repo
repo: gchq/stroom-content-101
name: Stroom 101
description: some wordy stuff
version: v1.2.3
releaseDate: 20180228
compatibleStroomVersions:
- v6.0
- v6.1
- v7.0
dependencies:
- gchq/stroom-content-standard-pipelines/v0.4
- gchq/stroom-content-template-pipelines/v0.3
# Maybe include the urls for the pack release zips in case we want to support non github hosted release artefacts.
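The dependency entries in the meta.yml sketch look like org/repo/version. A possible way to pull such an identifier apart is shown below. The names `PackRef`, `parse_pack_ref` and `release_page_url` are hypothetical, and the URL helper assumes GitHub hosting with a release tag equal to the pack version, which the discussion has not settled.

```python
from typing import NamedTuple


class PackRef(NamedTuple):
    org: str
    repo: str
    version: str


def parse_pack_ref(ref: str) -> PackRef:
    """Parse an identifier of the assumed form <org>/<repo>/<version>."""
    parts = ref.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <org>/<repo>/<version>, got {ref!r}")
    org, repo, version = parts
    if not version.startswith("v"):
        raise ValueError(f"version should start with 'v': {version!r}")
    return PackRef(org, repo, version)


def release_page_url(ref: PackRef) -> str:
    # Assumption: the pack is hosted on GitHub and its release tag
    # is exactly the pack version.
    return f"https://github.com/{ref.org}/{ref.repo}/releases/tag/{ref.version}"


if __name__ == "__main__":
    dep = parse_pack_ref("gchq/stroom-content-standard-pipelines/v0.4")
    print(dep.repo)
    # stroom-content-standard-pipelines
    print(release_page_url(dep))
```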
If each pack is identified by its repo and version, the central directory could look like this:
---
packs:
  - gchq/stroom-content-standard-pipelines:
      - v0.3
      - v0.4
  - gchq/stroom-content-template-pipelines:
      - v0.2
      - v0.3
  - otherorg/stroom-content-squid-proxy:
      - v1.0
# Maybe for each pack ver include the url of its meta.yml file in case we want to support non github repos.
Also, maybe compatibleStroomVersions ought to just be minimumStroomVersion?
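If a single minimum version were used, the compatibility check would become a simple numeric comparison. A sketch, assuming versions are dotted integers with an optional leading "v" (the helper names are made up):

```python
from typing import Tuple


def parse_version(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn 'v7.2' into (7, 2, 0) so versions compare numerically."""
    text = version[1:] if version.startswith("v") else version
    parts = [int(part) for part in text.split(".")]
    # Pad short versions with zeros so 'v7' and 'v7.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def is_supported(minimum_stroom_version: str, running_stroom_version: str) -> bool:
    """True if the running stroom is at or above the pack's minimum."""
    return parse_version(running_stroom_version) >= parse_version(minimum_stroom_version)


if __name__ == "__main__":
    print(is_supported("v7.2", "v7.10"))
    # True
    print(is_supported("v7.2", "v6.1"))
    # False
```

One trade-off worth noting: a minimum on its own cannot say that a pack stops working on a later stroom version, whereas an explicit list of compatible versions can.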
The specification for a pack repo would be:
stroom-content-
v[0-9]+\.[0-9]+(-(alpha|beta))?\.[0-9]+
meta.yml conforming to above structure in root of repo
CHANGELOG.md in the root of the repo
A further evolution of how this all could work:
stroom-101_v1.2.3.yml
---
uuid: 19e3fab7-3929-4c6e-bbdf-7944965715e4 # A uuid for the pack, used as entity uuid in stroom, maybe?
repo: stroom-content # Should the pack know what repo it is in?
name: stroom-101 # enforce name pattern for pack names, e.g. [-a-zA-Z]+
version: v1.2.3 # pack version
description: some wordy stuff # This could become the Description tab of the pack in stroom (in markdown)
releaseDate: 2023-08-25T13:04:01+01:00
checksum: "addf120b430021c36c232c99ef8d926aea2acd6b" # Hash of all files in the pack (except this yaml file)
minimumStroomVersion: v7.2
packFormatVersion: v1.0 # Version of the structure of the pack and this yaml, so it can be parsed/imported appropriately
files: # relative to pack manifest file, so stroom knows where to download files from
- STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.data.xml
- STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.node
- STROOM_101.Pipeline.38a86873-1365-4173-bb7d-1e41eaca72a8.xml
- etc.
dependencies:
  - repo: "stroom-content" # A unique name for a repo. Stroom would need to have this dep repo configured
    name: "template-pipelines"
    version: v0.3
  - repo: "burns-content" # Another repo
    name: "squid-proxy"
    version: v0.2.1
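The checksum field is described as a hash of all files in the pack except the manifest. One way that could be computed is sketched below. SHA-1 is an assumption based only on the 40-character example value, and the function name and the choice to hash relative paths along with file contents are illustrative.

```python
import hashlib
from pathlib import Path


def pack_checksum(pack_dir: Path, manifest_name: str) -> str:
    """Hash every file under pack_dir, skipping any file named
    manifest_name, in a stable order."""
    digest = hashlib.sha1()  # assumed algorithm, see note above
    rel_paths = sorted(
        path.relative_to(pack_dir).as_posix()
        for path in pack_dir.rglob("*")
        if path.is_file() and path.name != manifest_name
    )
    for rel in rel_paths:
        # Include the relative path so renaming a file changes the hash.
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update((pack_dir / rel).read_bytes())
    return digest.hexdigest()
```

Sorting the relative paths keeps the result independent of directory listing order, so the same pack content gives the same checksum on any machine.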
content-pack-repo.yml
This file could be generated by crawling a directory containing packs
---
uuid: 19e3fab7-3929-4c6e-bbdf-7944965715e4 # A uuid for the repo, maybe
name: "stroom-content" # Unique across all repos
description: some wordy stuff # Description of the repo, would be displayed in stroom next to the repo (in markdown)
repoFormatVersion: v1.0 # Version of the structure of the repo and this yaml, so it can be parsed/imported appropriately
packs:
  - name: "stroom-101"
    version: v1.2.3
    location: "stroom-101/v1.2.3/stroom-101_v1.2.3.yml" # rel path to the pack manifest
    checksum: "addf120b430021c36c232c99ef8d926aea2acd6b" # The pack's hash
  - name: "template-pipelines"
    version: v0.3
    location: "template-pipelines/v0.3/template-pipelines_v0.3.yml"
    checksum: "f572d396fae9206628714fb2ce00f72e94f2258f" # The pack's hash
Repos, packs and all their content are special things in the explorer tree, under their own special root Content Packs and distinct from user created content in System.
Alternatively they could be displayed on their own screen, but it is probably easier to have them in one place for the user.
+ Favourites
+ System
+ Content Packs
  + stroom-content # A content repo
    + stroom-101 # A pack in a repo
    + template-pipelines
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
      - Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
      - etc.
  + burns-content # Another content repo
    + squid-proxy # A pack in this repo
All content in a pack would be read-only unless the pack has been set to writable.
UI has:
https://raw.githubusercontent.com/gchq/stroom-content/master/content-pack-repo.yml
URL could be on an HTTP server or a shared file system, e.g. file://shared-storage/stroom-content-repo/content-pack-repo.yml
Rest of repo info obtained from that file.
Displays a list of all packs (and their versions) in the repo on its entity page.
Each one has a button to import the pack.
Importing a version of a pack where another version of that pack is already imported will prompt the user to confirm overwriting the existing pack.
Multiple repos can be added in stroom, but can't add two with the same name.
A pack repo is like a special kind of folder in the explorer tree.
You can only add/remove children by importing/removing packs in the repo's entity page or via the context menu.
Importing a pack with dependencies would require stroom to have already loaded the repo(s) for the dependency packs, and it would prompt the user to confirm import of the dependency pack(s), which may in turn overwrite existing versions. Removing a pack that is used by another installed pack or where its content is referenced by non-pack content would prompt the user with a warning.
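The import behaviour described above could be sketched as a small planning step that walks the dependencies first and records which installed versions would be overwritten, so the UI can ask for confirmation. Everything here (the `PackId` and `ImportStep` types, `plan_import`, and the in-memory dependency map) is hypothetical and not a description of how stroom works.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class PackId(NamedTuple):
    repo: str
    name: str
    version: str


class ImportStep(NamedTuple):
    pack: PackId
    replaces: Optional[str]  # installed version that would be overwritten


def plan_import(root: PackId,
                dependencies: Dict[PackId, List[PackId]],
                installed: Dict[Tuple[str, str], str]) -> List[ImportStep]:
    """Return the packs to import, dependencies first.

    dependencies maps every known pack version (from loaded repos) to
    its direct dependencies; installed maps (repo, name) to the version
    currently imported."""
    steps: List[ImportStep] = []
    visiting = set()
    done = set()

    def visit(pack: PackId) -> None:
        if pack in done:
            return
        if pack in visiting:
            raise ValueError(f"dependency cycle at {pack}")
        if pack not in dependencies:
            raise LookupError(f"{pack} not found; is repo '{pack.repo}' loaded?")
        visiting.add(pack)
        for dep in dependencies[pack]:
            visit(dep)
        visiting.discard(pack)
        done.add(pack)
        current = installed.get((pack.repo, pack.name))
        if current != pack.version:  # skip packs already at this version
            steps.append(ImportStep(pack, current))

    visit(root)
    return steps


if __name__ == "__main__":
    squid = PackId("burns-content", "squid-proxy", "v0.2.1")
    templates = PackId("stroom-content", "template-pipelines", "v0.3")
    deps = {squid: [templates], templates: []}
    installed = {("stroom-content", "template-pipelines"): "v0.2"}
    for step in plan_import(squid, deps, installed):
        print(step.pack.name, step.pack.version, "replaces", step.replaces)
    # template-pipelines v0.3 replaces v0.2
    # squid-proxy v0.2.1 replaces None
```

This sketch does not attempt to handle two packs requiring different versions of the same dependency, which remains an open question.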
Read-only by default. Importing a pack from a repo creates the pack entity as a child of the repo entity in the tree. ALL files in the pack are descendants of the pack entity in the tree. A pack may contain folders to sub-divide its content.
UI has:
https://raw.githubusercontent.com/gchq/stroom-content/master/content-pack-repo.yml
Editing of packs is only there to allow the development of packs. Content in a pack can ONLY depend on entities that are in a versioned pack that is included in its dependencies and installed in stroom. This is to ensure you cannot publish a pack that has broken dependencies.
If you need to set properties on a pipeline belonging to a pack (e.g. to set an output feed that is different to the input) then create a non-pack pipeline that extends the pack one and edit that.
Stroom might need a new column on the doc table to hold the pack uuid, which can be used to determine the read-only state of pack entities.
/
  /content-pack-repo.yml # repo manifest file
  /stroom-101
    /v1.2.3
      ...
  /template-pipelines
    /v0.2
      /template-pipelines_v0.2.yml # pack manifest file
      /content
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
        /etc.
      /clientArtefacts # Dir for any non-content files, e.g. scripts
    /v0.3
      /template-pipelines_v0.3.yml # pack manifest file
      /content
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.data.xml
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.node
        /Indexing.Pipeline.fcef1b20-083e-436c-ab95-47a6ce453435.xml
        /etc.
      /clientArtefacts # Dir for any non-content files, e.g. scripts
For simplicity, this layout doesn't use any git versioning. This means the pack repo could be a git repo, a simple http server or a shared file server.
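Since the repo manifest could be generated by crawling a directory of packs, here is a rough sketch of that crawl for the pack/version/manifest layout above. It only collects name, version and location for each pack manifest it finds. Writing the result out as YAML and adding checksums are left out, and the function name `crawl_pack_repo` is made up.

```python
from pathlib import Path
from typing import Dict, List


def crawl_pack_repo(repo_root: Path) -> List[Dict[str, str]]:
    """Find pack manifests at <pack>/<version>/<pack>_<version>.yml and
    return one entry per pack version."""
    entries: List[Dict[str, str]] = []
    for manifest in sorted(repo_root.glob("*/*/*.yml")):
        version_dir = manifest.parent
        pack_dir = version_dir.parent
        expected_name = f"{pack_dir.name}_{version_dir.name}.yml"
        if manifest.name != expected_name:
            continue  # some other yml file, not a pack manifest
        entries.append({
            "name": pack_dir.name,
            "version": version_dir.name,
            "location": manifest.relative_to(repo_root).as_posix(),
        })
    return entries
```

Because it only reads a directory tree, the same crawl would work whether the files live in a git checkout or on a shared file system.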
Another option is to have one git repo per pack and use git tagging/branching to manage versioning/dependencies. A repo would likely then contain all the content for a single system, e.g. for processing apache httpd events. When we have to make the pack compatible with a new version of stroom, i.e. a change of structure, it probably makes sense to have a whole new repo for the different stroom version.
Stroom would have a new doc type for a git repo that would behave a bit like a Folder in the tree. You can add the GitRepo doc anywhere in the tree and move it around. If the GitRepo needs to be writable then it would need to be associated with a GitCredentials doc that contains the credentials needed to access that and any similar repo, e.g. creds to access GitHub.
The repo will contain a directory structure to optionally organise the docs within it so they don't have to be flat. The dir structure is essentially relative to the location of the GitRepo doc in the explorer tree. With a read/write GitRepo doc, the user is able to make changes within the GitRepo sub-tree, but not move any of its content outside of the sub-tree. The GitRepo is essentially a walled garden. Content from outside can be copied in or new documents created and these will become part of the GitRepo doc and ultimately pushed to git.
Stroom would convert explorer tree names to a sanitised format [a-z0-9_-]+ for use as directory names within the git repo. The folder.node file would contain the metadata to determine the display name in the explorer tree.
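A minimal sketch of the name conversion, assuming directory names take the form type.name.uuid as in the example layout in this comment. The rules chosen here (lower-casing, replacing each run of other characters with a single underscore, a fallback name) and the function name `to_dir_name` are assumptions for illustration only.

```python
import re


def to_dir_name(doc_type: str, display_name: str, uuid: str) -> str:
    """Build a directory name of the form <type>.<sanitised name>.<uuid>."""
    # Lower-case, then replace each run of characters outside
    # [a-z0-9_-] with one underscore and trim underscores at the ends.
    slug = re.sub(r"[^a-z0-9_-]+", "_", display_name.lower()).strip("_")
    if not slug:
        slug = "unnamed"  # hypothetical fallback for names with no usable characters
    return f"{doc_type.lower()}.{slug}.{uuid}"


if __name__ == "__main__":
    print(to_dir_name("xslt", "My XSLT", "e6ee26a0-e4c4-4553-bf9f-432052c10712"))
    # xslt.my_xslt.e6ee26a0-e4c4-4553-bf9f-432052c10712
```

Including the uuid keeps directory names unique even when two documents sanitise to the same name.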
This is how a git repo would look
/ # git repo root
  manifest.yml # pack manifest file
  content/
    folder.pipelines.6c968e49-6216-46a4-b3b4-dd026bef4da6/ # A dir "Pipelines"
      folder.node # dir metadata, e.g. display name
      index.indexing_pipeline.fcef1b20-083e-436c-ab95-47a6ce453435/ # A doc "Indexing Pipeline"
        doc.data.xml
        doc.md # Document documentation (previously doc description)
        doc.node # doc metadata, e.g. display name
        a_sub_dir/
          some_file.css
          another_file.css
    folder.translations.61b047f0-7019-41e0-a3d0-8018f14aac47/ # A dir "Translations"
      folder.node
      xslt.my_xslt.e6ee26a0-e4c4-4553-bf9f-432052c10712/ # A doc "My XSLT"
        doc.node
        doc.data.xsl
        doc.md # Document documentation (previously doc description)
    etc.
  clientArtefacts/ # Dir for any non-content files, e.g. scripts
This is an example of how two GitRepo docs would look in stroom
+ Favourites
+ System
  + A Folder
  + Content Packs
    @ Template Pipelines # A GitRepo doc
      + Pipelines
        + Indexing Pipeline
      + Translations
        + My XSLT
    @ Squid Proxy
      + Squid Proxy Pipeline
      + Squid Proxy XSLT
  + Another Folder
Currently a release on GitHub represents a new version of a single content pack. Finding the release you want will quickly become a nightmare as more packs are added and existing packs get updated to new versions.
A better approach may be to get travis to build all packs whenever a tag is created, and have travis add all content pack zips to that release whether they have changed since the last release or not. If we implement some form of versioning of packs then this could include all versions of each pack.
The tag for the release could be the name and version of the thing(s) that have changed or some arbitrary version number for the packs as a whole.