GSA / code-gov-web

DEPRECATED 🛑- Federal Source Code policy implementation.

Final Schema Definition? #196

Closed IanLee1521 closed 5 years ago

IanLee1521 commented 7 years ago

Has there been any finalization in the schema (#41)? Currently there are some discrepancies between:

Specifically, there are differences in formats, but also in the fields being used (e.g. govwideReuseproject vs governmentWideReuseProject, projectTags vs tags).
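To make the field-name mismatch concrete, here is a minimal sketch of how a consumer might fold the variant names into one spelling. The choice of governmentWideReuseProject and tags as the canonical names is an assumption for illustration only, and FIELD_ALIASES and normalize_project are hypothetical names, not part of any Code.gov tooling.

```python
# Hypothetical alias table: maps a variant field name to the spelling
# treated as canonical here. Which spelling is actually canonical is an
# assumption; the thread had not settled it.
FIELD_ALIASES = {
    "govwideReuseproject": "governmentWideReuseProject",
    "projectTags": "tags",
}


def normalize_project(project: dict) -> dict:
    """Return a copy of a project record with aliased field names renamed."""
    return {FIELD_ALIASES.get(key, key): value for key, value in project.items()}
```

Fields that are not in the alias table pass through unchanged, so the same function can be applied to files that already use the assumed canonical names.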

Additionally, I believe there are still some open questions over how to generate a multi organization code.json file in an agency (see: #187). Perhaps the sample code.json file should be updated to be multi-agency / multi-organization? I can mock up a pull request that would demonstrate that (at least based on opinions / discussion over on #187).

Another related concern, which I created a separate issue for (#195) is about project tags, and whether we should curate the list of possible tags versus allowing anything.

These are questions I've had as I develop @llnl/scraper while generating responses for the 120-day deadline.

jasonduley commented 7 years ago

@IanLee1521 thanks for mentioning this. we'd like to have the ability to group multiple organizations, with a project array of repositories for each.

IanLee1521 commented 7 years ago

@jasonduley -- Sure! As I described on https://github.com/presidential-innovation-fellows/code-gov-web/issues/187#issuecomment-263618496, I envision something where the final code.json file is a list of agency + organization + array_of_projects objects:

[{
    "agency": "ABC",
    "organization": "FOO",
    "projects": []
}, {
    "agency": "ABC",
    "organization": "BAR",
    "projects": []
}, {
    "agency": "XYZ",
    "organization": "BAZ",
    "projects": []
}]

That's what I've been picturing, at least. 😄
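As a rough sketch of how a scraper could produce that layout, the function below groups flat project records into one entry per agency and organization pair. It assumes each input record carries its own "agency" and "organization" keys, which is an illustrative input shape rather than anything defined in the thread; group_projects is a hypothetical name.

```python
from collections import defaultdict


def group_projects(records):
    """Group flat project records into agency + organization + projects entries.

    Assumes every record has "agency" and "organization" keys (an
    illustrative input shape); all other keys are kept as the project body.
    """
    grouped = defaultdict(list)
    for record in records:
        key = (record["agency"], record["organization"])
        project = {
            name: value
            for name, value in record.items()
            if name not in ("agency", "organization")
        }
        grouped[key].append(project)
    # Dicts preserve insertion order, so entries appear in first-seen order.
    return [
        {"agency": agency, "organization": organization, "projects": projects}
        for (agency, organization), projects in grouped.items()
    ]
```

Passing the result to json.dumps would give a file in the list-of-objects shape described above.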

IanLee1521 commented 7 years ago

FWIW, I went ahead and took a crack at an updated metadata schema based on many of the discussions we've been having: https://github.com/presidential-innovation-fellows/code-gov-web/compare/master...IanLee1521:project-vs-projects

Figured that talking in code (documentation?) would be the easiest way to discuss any updates / changes going forward.

jfredrickson5 commented 7 years ago

+1 for @IanLee1521's proposed schema with an effective way to group by agency/org.

I've been looking at code.json files that several agencies have posted to their websites. There is a mix of various formats in use. Although we've posted our code.json file per the OMB requirement, I'm unsure if we are using the correct format. It would be great if we could have a formal schema so that we can automatically validate our data.

A formal schema with a defined id would also allow for easy migration between revisions of the schema. Some agencies could be using an older schema while others have already migrated to a current one, and Code.gov would be able to support all of them at once.
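To illustrate the kind of automatic check being asked for, here is a minimal hand-rolled sketch that tests a parsed code.json against the agency / organization / projects layout proposed earlier in this thread. It is not a JSON Schema validator and does not reflect any official Code.gov schema; validate_code_json is a hypothetical name and the rules it enforces are assumptions drawn from that proposal.

```python
def validate_code_json(entries):
    """Return a list of problems found; an empty list means the layout checks out.

    Only checks the top-level layout proposed in this thread (an assumption,
    not an official schema): an array of objects, each with string "agency"
    and "organization" fields and an array "projects" field.
    """
    if not isinstance(entries, list):
        return ["top level must be an array"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        for field in ("agency", "organization"):
            if not isinstance(entry.get(field), str):
                problems.append(f"entry {index}: '{field}' must be a string")
        if not isinstance(entry.get("projects"), list):
            problems.append(f"entry {index}: 'projects' must be an array")
    return problems
```

A published schema with a defined id would let a check like this be selected per revision, so that files written against older revisions could still be validated.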

IanLee1521 commented 7 years ago

Hi @jfredrickson5, good points. I just updated the branch / pull request (#200) to convert list -> array and integer -> number, which appear to be the official primitive types in the formal schema.

One side thought: if we are going to move in the direction of something more formal, would it make sense to convert the openSourceProject and governmentWideReuseProject fields from numbers to booleans? To me that would make more sense to someone stumbling upon the project out of the blue.
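A minimal sketch of what that conversion could look like for existing records, assuming the numeric fields use 1 for yes and 0 for no (the encoding is an assumption, and flags_to_booleans is a hypothetical helper):

```python
# The two fields named above; assumed here to hold 0 or 1.
FLAG_FIELDS = ("openSourceProject", "governmentWideReuseProject")


def flags_to_booleans(project):
    """Return a copy of a project record with numeric flag fields as booleans."""
    converted = dict(project)
    for field in FLAG_FIELDS:
        if field in converted:
            # Under the assumed encoding, 0 becomes False and 1 becomes True.
            converted[field] = bool(converted[field])
    return converted
```

Records that omit either field are left without it rather than being given a default.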

jasonduley commented 7 years ago

we are hoping to make the attribute "governmentWideReuseProject" an enum instead of a boolean. For NASA we have the following states:

the first two are available now with the boolean value, but as we add projects into our code.json and those projects have yet to enter our software release process, we'd need the 3rd option.

IanLee1521 commented 7 years ago

Nod, that seems reasonable to me.

Another possible one I heard, talking with some folks at LLNL recently, was the idea of an "inventoried but release pending publication" or something similar, to support scientists that create code as part of preparation for a research paper submission, which has not yet been submitted / accepted / published.

This is a case that I don't recall having seen anywhere, so might require some further discussion.

jqnatividad commented 7 years ago

The team should consider adding more metadata detailing security certifications of a given version of an OSS project, leveraging nvd.nist.gov and cve.mitre.org.

We're currently deploying CKAN at several govt agencies at the state level, and we find ourselves having to go through an expensive security regimen repeatedly, which adds to the cost of the project.

DanielJDufour commented 5 years ago

Thanks for the comments everyone. As this conversation is quite old and we're no longer tracking issues here (we're about to archive this repo), I'm going to close this. Feel free to open up new discussion on the schema at https://github.com/GSA/code-gov/issues . Thanks! :-)