CISecurity / OVALRepo


Version Number Guidelines #41

Open wmunyan opened 9 years ago

wmunyan commented 9 years ago

Next issue -- version numbers. CIS has been tossing around the idea of requiring that contributors actually increment the version numbers of updated components in their submissions, i.e. if modifying a test with @version="2", the submission must include that modification with @version="3" or else the PR will not be merged.

I'd like to get to a point where the submissions are more of a "black and white" process...Either the submission is valid and it can be merged, or it's not and it gets rejected.

This requirement can also aid in steering contributors to first synchronize their personal forks prior to creating topic branches and submitting PRs.

GunnarEngelbach commented 9 years ago

I see where you are going, and it makes sense.

But I know Mitre takes the opposite approach and I wonder why.

Matt/Danny, do you guys have a lesson learned here?


solind commented 9 years ago

I believe this was one of the motivations behind David Ries creating the script that discovers all the upstream impacted entities — so that all the required versions could be revisioned as part of a PR, and that could be QA'd.

(I consider a definition to be upstream from a test, which is upstream from an object and a state. Others use the term downstream to mean the same thing, but in my mind that’s less intuitive… except that it might make sense to consider the version changes having to flow downstream).


wmunyan commented 9 years ago

I guess that begs the question of whether or not the upstream version numbers need to change due to a downstream change, or vice versa? I would think that if an object's version number changed, the tests which use that object wouldn't have to change. Only if the test actually changed would its version number have to change. That's the point of each of those entities having their own version numbers.

My (possibly misguided) understanding was that the impacted entities script was meant more as a QA aid for contributors to highlight the possible impacts of changing an entity...more of a sanity check to say "wow, I didn't know that this object was referenced by 20 tests - do I really want to change this object"

solind commented 9 years ago

I believe that right now, if an object is rev’d, so is the test, and any definitions pointing to the test, and any definitions pointing to those definitions. And that does make sense, because all of those have in effect changed because of their downstream dependencies.

I think we should probably maintain that versioning policy.
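The propagation policy described here (an object revision bumps its tests, their definitions, and any definitions extending those) can be sketched as a simple graph walk. This is an illustrative sketch, not actual OVALRepo code; the `used_by` map and `versions` dict are assumed inputs.

```python
def bump_upstream(changed_id, used_by, versions):
    """Increment the version of changed_id and of everything upstream of it.

    used_by maps an element id to the ids of elements that reference it
    (object -> tests, test -> definitions, definition -> extending defs).
    versions maps element ids to their current integer versions.
    """
    seen = set()
    stack = [changed_id]
    while stack:
        element_id = stack.pop()
        if element_id in seen:
            continue  # guard against cycles and shared dependents
        seen.add(element_id)
        versions[element_id] += 1
        stack.extend(used_by.get(element_id, []))
    return versions

# Example: revising an object bumps its test, the definition using that
# test, and a definition extending that definition.
versions = {"obj:1": 2, "tst:1": 5, "def:1": 1, "def:2": 3}
used_by = {"obj:1": ["tst:1"], "tst:1": ["def:1"], "def:1": ["def:2"]}
bump_upstream("obj:1", used_by, versions)
```

This also answers Matt's extend_definition question below: the walk naturally keeps going as far up the reference chain as the dependencies reach.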


DavidRies commented 9 years ago

Yes, ideally as part of the CIS pre-publish QA/massaging, a script would:

I think the idea behind versioning the upstream elements is that their implementation has changed. For example, if I change an object, then the meaning/implementation of any test that refers to that object has changed (and any definition that refers to the test, etc.).

If we wanted to, we could exclude non-substantive changes (e.g. changes to comments and other metadata) from triggering upstream versioning.
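One possible way to implement the exclusion David suggests: strip comments and other metadata from both serializations of an element, and only treat the edit as substantive (triggering upstream versioning) if the stripped forms differ. This is a hypothetical sketch, not OVALRepo code; the set of metadata attributes is illustrative.

```python
from xml.etree import ElementTree as ET

# Illustrative; a real policy would enumerate the full set of
# non-substantive attributes and elements.
METADATA_ATTRS = {"comment"}

def strip_metadata(xml_text):
    """Return the element's XML with metadata attributes removed."""
    elem = ET.fromstring(xml_text)
    for node in elem.iter():
        for attr in METADATA_ATTRS:
            node.attrib.pop(attr, None)
    return ET.tostring(elem, encoding="unicode")

def is_substantive_change(old_xml, new_xml):
    """True if the two versions differ in more than metadata."""
    return strip_metadata(old_xml) != strip_metadata(new_xml)
```

Under this rule, fixing a typo in a comment attribute would not rev anything upstream, while changing an operation or value would.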

GunnarEngelbach commented 9 years ago

While I understand the logic, I'd rather not reversion everything that touches an element that has been edited, except maybe in special cases.

When an existing element has been edited, it should be for the purpose of fixing a bug, not changing its fundamental function; therefore whatever is using that element has not also undergone a functional change: it still performs its originally intended function, only now it does it better.

If the intent is to change the functionality, then an existing element shouldn't be edited; a new test should be created instead.

Also, if the choice is to reversion things that touch an element that has been reversioned, that has a ripple effect, triggering the reversioning of the things they reference/are referenced by, and so on.

GunnarEngelbach commented 9 years ago

Gunnar,

We handle all the versioning for a couple reasons. One of those reasons is that it keeps submitters from doing more work. Remember when we first created this, it was used in part, to drive adoption of OVAL, so making it as simple as possible to submit content was a plus.

Second, by handling all the version changes on our end, we could automate it and ensure it is all done consistently. While many cases are pretty simple, there are some more complicated scenarios. If you modify a Definition that is used as an extend_definition, does the dependent Definition increment its version as well? How far down does that go? It can be confusing to authors, so we tried to abstract that part from them.

Also, should you ever want to change the way you version things (and I think we have in the past), by doing on the repository side, you don’t have to change how many people think and behave, but rather just edit some code.

So, I guess in short, I don't think I would say you can't make this change, but I'm not sure I would recommend it either. The way we handle things now is simple: does it validate (against the 3 validators: XML Schema and Schematron validation, along with the Schematron for our Author's Guide)? If so, increment the required version numbers and write to the database.

Hope that helps.

Thanks Matt


wmunyan commented 9 years ago

So Matt's response is fine, but again, these sorts of things ultimately take away from the automate-ability of the github model. If a PR is submitted with the same version number as the existing content, the PR cannot be merged as is, because the files would need to be changed to update the version numbers.

If we ultimately modify the process to require completely acceptable content, the PR can be merged and we can take advantage of the github model. If content is not produced correctly, the PR can be rejected. This process can be automated by some back-end processing and validation of the PR and the ultimate merge.

If we continue to use the existing process (as MITRE does things), then a PR becomes nothing more than a glorified e-mail submission. If this process is used, submitters can make PRs using the existing (non-decomposed) definitions files, and serialization can be automated once the files are validated and content massaged.

Are these things we won't know until the new repository is functioning and contributors are actually submitting content?

GunnarEngelbach commented 9 years ago

I don't see any reason why edits can't be made to a submission made via pull request prior to committing it, especially since you are talking about automating the process as much as possible.

The workflow would be this:

Your concern, I think, is to minimize the amount of CIS resources needed in the management of this repository. And I agree that that is a primary concern, with the other two being:

Allowing the vendor-submitted OVALIDs to be the final ones in the repository is fine, with the caveat that they can't be trusted. They will still need to be reviewed against the repository for uniqueness and either changed as part of the submission process or rejected until the submitter can come up with a unique identifier.

BTW, one consequence of using vendor-supplied OVALIDs: The potential number of files in an element directory becomes 1000 *

wmunyan commented 9 years ago

My question is, for this step:

That is a commit to my local clone, a push to my remote fork, and then a PR/merge of the "accepted" content. Does that process leave the contributor's PR out there and un-mergeable? If so, we lose the benefits of github. If that's the process, I'm fine with that. I can certainly try and code around that scenario. As you mention, the idea is to help minimize resources on all sides, so as much automation as can be built the better.

GunnarEngelbach commented 9 years ago

Beats me. And yeah, that makes all the difference.

If the reviewer can't make any changes to their local merge of the pull request without screwing up the original pull request, that means their only option is to reject it until the submission contains all the edits as they will appear in the master repository.

And in that case, what happens when a submitted change requires a change to other items in the repository (like version numbers) that are also affected by the submission?

DavidRies commented 9 years ago

Hi Bill, I feel like we keep going round and round on this issue of how the merge gets closed. I think it works as follows:

- You should be doing this final QA/massage on a local clone of the master repo, NOT of your own fork.
- You merge the PR on your local clone (THIS is the PR merge! It just isn't public yet on github, so it won't close the PR).
- You do QA/massage/etc.
- You commit the release-ready repo.
- You push directly to the master repo. This is when the PR merge becomes public; it will appear as one of the previous commits. You will see that merge in the commit history, and this should cause the PR to close.

Does that make sense? And, does that address your concerns? Or am I missing something?

solind commented 9 years ago

I would add, there’s no need to ‘reject’ the pull request if some edits are required. There could just be some back-and-forth on the PR; all a submitter needs to do to update their pull request is to push the changes to their own fork, and the PR is automatically updated.

It should be possible to have a script that manages the versioning, that submitters could run before they commit their changes to their topic branch and create their PR. Such a script could use git to identify any changed files, identify all the upstream artifacts, and increment all the relevant version numbers.

Then the PR would truly include all the impacted files/artifacts.
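The pre-submission script described above could lean on git to find what changed, then bump the version attributes. A rough sketch (the git invocation is real, but the version-bumping helper and the overall structure are illustrative, not actual OVALRepo code):

```python
import re
import subprocess

def find_changed_files(base_branch="master"):
    """List XML files changed on the current topic branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_branch, "--", "*.xml"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def bump_version_attr(xml_text):
    """Increment the first version="N" attribute in an element's XML."""
    def repl(match):
        return 'version="%d"' % (int(match.group(1)) + 1)
    return re.sub(r'version="(\d+)"', repl, xml_text, count=1)
```

A full script would feed the changed files through the upstream-discovery step (as in David's impacted-entities script), apply `bump_version_attr` to each affected file, and commit the result before the PR is opened.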


wmunyan commented 9 years ago

David, That process is clear. My issue is the fact that, if I am looking at a PR on the master repo via the website, there is a button there to either "close" or "merge". When on the website, and I select "merge" the PR information is incorporated into the master repository and the PR is closed.

Your step 2 above (merging the PR on my local clone) is not the same as clicking the "merge" button. Follow-on steps need to take place in order to incorporate the changes submitted in the PR into the master repository, as you illustrate.

So we are never going to be "merging" the actual PR into the master repository. We will end up duplicating the change in my local clone and pushing the updates.

That distinction is key. Because we're never "merging" the actual PR, the entire PR process is essentially equivalent to a contributor e-mailing the list with attached files.

This distinction also drives the submission process flow. If the process is essentially the equivalent of e-mailing the list, there is no need for the scripts to decompose the definitions into their constituent pieces. My serialization code has that optimized already...If content is submitted in constituent pieces and needs to undergo further massaging and validation, I'll just have to put it all back together again anyway.

DavidRies commented 9 years ago

Bill,

I don't think that's right. The local clone of master and master are the same (when in sync). So, doing a local merge, a commit and push has the same effect on the master and PR as doing the merge on github followed by a pull to local and changes and push. In both cases, you end up with a commit history on github with the merged PR and the subsequent release-ready commit.

But, I could be wrong! I am going to create a fork and PR and step through both sides of this to confirm.

Will update you in 10 minutes or so...

-David

GunnarEngelbach commented 9 years ago

There's also a matter of what level of review you want to be able to do.

You still need to pull the pull request down to a cloned copy of the archive. Otherwise the only level of review available to you is the file diffs presented on the github website. And that's insufficient for determining the effect of those changes.

Once in a while that might be all right, when you can see that the only thing that changed was something like fixing a typo in a comment. But most of the time I don't think there's any getting around the need to merge the pull request into a clone and use the tools to do a full validation of the change.

DavidRies commented 9 years ago

Yes, the PR should be pulled down and merged into a local clone of the master repo for review. Right?


GunnarEngelbach commented 9 years ago

For some inconsequential updates, like fixing a typo in the comment attribute of a test, it should be obvious from the file diff on github that it's a harmless update, and the pull request can simply be accepted from there. No need to pull to a local clone (although the reviewer will still need to do that prior to processing any other submissions). In this inconsequential case I don't even think it's necessary to update the version attribute.

But for all other submissions, yes, it will need to be pulled into a clone and the tools used to validate that submission.

And since it is being processed in a local clone we might as well create scripts that automate the rest of the process prior to final acceptance, such as updating versions, adding the status, and, yes, selecting a final OVALID that is known to be unique.

GunnarEngelbach commented 9 years ago

For that matter, schema validation: can you do schema validation with a standalone element?

I don't think so. To do that you'll need to de-serialize the submission into an oval_definitions file, add all the missing elements referenced by the submission but not included in it, and then do schema and schematron validation using that.

DavidRies commented 9 years ago

Right. QA Wishlist 1 (https://github.com/CISecurity/OVALRepo/issues/34) is to validate all affected defs. That means running the script that IDs those defs, builds them, and validates them.

wmunyan commented 9 years ago

Gunnar, That's correct. If a submission simply contains an OVAL _object, for example, wrapping just that object up into an oval_definitions/objects/your_object will still validate. You don't need all the dependencies for it to be schema valid. The only other required element in an oval_definitions file is the generator. -Bill
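Bill's point can be illustrated with a small helper: a single object plus a generator is enough for a schema-shaped oval_definitions document. The namespaces below are the standard OVAL 5.x ones, but the helper itself and the generator values are a hypothetical sketch, not OVALRepo code.

```python
from xml.etree import ElementTree as ET

OVAL_NS = "http://oval.mitre.org/XMLSchema/oval-definitions-5"
COMMON_NS = "http://oval.mitre.org/XMLSchema/oval-common-5"

def wrap_object(object_xml):
    """Wrap one serialized OVAL object in a minimal oval_definitions doc."""
    root = ET.Element("{%s}oval_definitions" % OVAL_NS)
    # The generator is the only other element the schema requires.
    generator = ET.SubElement(root, "{%s}generator" % OVAL_NS)
    ET.SubElement(generator, "{%s}schema_version" % COMMON_NS).text = "5.11"
    ET.SubElement(generator, "{%s}timestamp" % COMMON_NS).text = "2015-07-17T00:00:00"
    objects = ET.SubElement(root, "{%s}objects" % OVAL_NS)
    objects.append(ET.fromstring(object_xml))
    return ET.tostring(root, encoding="unicode")
```

The returned document could then be handed to an XML Schema validator; validating references to missing states or variables would still require the fuller build step Gunnar describes.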

DavidRies commented 9 years ago

Ok. I just followed this entire process from both ends and it worked like a charm. Bill, do you have a minute to chat on the phone and get on the same page? Either I'm not understanding the issue or I'm not communicating the solution! :) Call me at 917.202.3844.

wmunyan commented 9 years ago

David R, great - I will call you in a bit.

DavidRies commented 9 years ago

This ticket has taken a long walk in the woods, but I don't think we ever answered the initial questions about handling version numbers.

Open Questions:

Comments:

Once we have some consensus on what we want to do here, we will write a script to do the work.

DavidRies commented 9 years ago

We've decided NOT to increment version #'s on upstream elements. Bill will confirm with MITRE that that is consistent with current practice. If it is, Bill will close this ticket!

DavidRies commented 9 years ago

So, it turns out that MITRE does version upstream. Should we do that?

GunnarEngelbach commented 9 years ago

It's what the community is currently accustomed to.

We need to hear back from Mitre on whether or not they really do that, despite the documentation, and whether in their experience it's a good policy, but in general I think we should continue doing things the way they currently are as much as possible. The community can elect to make changes in the future.


blakefrantz commented 9 years ago

@wmunyan - after our phone call last week it sounded like we were leaning toward incrementing the version number of all upstream elements. Is that how you currently see it?

Also, a week or so ago you and i tested a workflow involving a PR that needed CIS tweaks and that workflow resulted in the original PR being closed once your tweaks were merged with the master. As I understand it, that's the ideal scenario from a PR closure perspective. Is that workflow workable for CIS? If so, that workflow can become increasingly automated as the QA scripts are implemented.

wmunyan commented 9 years ago

@blakefrantz Upstream elements only, yes. The "find_upstream_ids" method in lib_search does this nicely.

The workflow we went through is workable, yes

GunnarEngelbach commented 9 years ago

Most of that is already in the QA script, which is waiting as a PR for somebody to merge.

There are a few things I wasn't able to get to in it yet, and not all of it has been verified as working correctly.
