This would involve small changes to the package definition (and a dependency on Legolas):
using Legolas: @schema, @version
@schema "package-analyzer.package" Package
@version PackageV1 begin
    name::String # name of the package
    uuid::UUID # uuid of the package
    repo::String # URL of the repository
    subdir::String # subdirectory of the package in the repo
    reachable::Bool # can the repository be cloned?
    docs::Bool # does it have documentation?
    runtests::Bool # does it have the test/runtests.jl file?
    github_actions::Bool # does it use GitHub Actions?
    travis::Bool # does it use Travis CI?
    appveyor::Bool # does it use AppVeyor?
    cirrus::Bool # does it use Cirrus CI?
    circle::Bool # does it use Circle CI?
    drone::Bool # does it use Drone CI?
    buildkite::Bool # does it use Buildkite?
    azure_pipelines::Bool # does it use Azure Pipelines?
    gitlab_pipeline::Bool # does it use Gitlab Pipeline?
    license_files::Vector{LicenseTableEltype} # a table of all possible license files
    licenses_in_project::Vector{String} # any licenses in the `license` key of the Project.toml
    lines_of_code::Vector{LoCTableEltype} # table of lines of code
    contributors::Vector{ContributionTableElType} # table of contributor data
    version::Union{VersionNumber, Nothing} # the version number, if a release was analyzed
    tree_hash::String # the tree hash of the code that was analyzed
end
That would define a PackageV1 struct that is pretty much the same as our existing one, with reasonable definitions for hash, isequal, ==, keyword argument constructors, and Arrow serialization.
One advantage is that we could more easily add new fields in a backwards-compatible way. If we added foo and declared it as ::Union{Missing,Foo}, then deserializing an older PackageV1 table from Arrow with Legolas.read would populate that field as missing. We could also provide a default instead.
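To make that concrete, here is a minimal sketch, assuming the Legolas v0.5-style `@schema`/`@version` syntax used above. The schema name `example.pkg` and the fields `foo` and `bar` are made up for illustration and are not part of the real Package definition. It only shows row construction from data that lacks the new fields, which I'd expect to behave the same way when rows come from an older Arrow table.

```julia
using Legolas: @schema, @version

# Hypothetical toy schema standing in for "package-analyzer.package".
@schema "example.pkg" ExamplePkg

@version ExamplePkgV1 begin
    name::String
    # Newly added field that allows `missing`: absent inputs become `missing`.
    foo::Union{Missing,Int}
    # Newly added field with a default: absent inputs fall back to 0.
    bar::Int = coalesce(bar, 0)
end

# An "old" row that predates `foo` and `bar`: only `name` is supplied.
row = ExamplePkgV1(; name="Example")
```

Here `row.foo` should come back as `missing` and `row.bar` as the default, so code that only knows about the old fields keeps working.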
Another is that we could communicate schema-breaking changes by declaring a new version of Package, e.g. PackageV2. We could keep the old definition around if we wanted, so that we can continue to deserialize old tables. Legolas uses Arrow's metadata to record which version is serialized in a table, so it could direct each table to the correct struct to deserialize to.
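A rough sketch of what that could look like, again assuming the Legolas v0.5-style API (`Legolas.write`, `Legolas.read`, and the generated `ItemV1SchemaVersion` alias). The `Item` schema, its fields, and the `upgrade` helper are hypothetical names for illustration only, not something Legolas or this package provides.

```julia
using Legolas: Legolas, @schema, @version

# Hypothetical toy schema with two versions kept side by side.
@schema "example.item" Item

@version ItemV1 begin
    name::String
end

# Breaking change: a new required field, so it gets a new version.
@version ItemV2 begin
    name::String
    stars::Int
end

# Hypothetical helper (not part of Legolas) that migrates old rows.
upgrade(old::ItemV1) = ItemV2(; name=old.name, stars=0)

# Write an "old" table tagged with the V1 schema version, then read it back.
io = IOBuffer()
Legolas.write(io, [ItemV1(; name="a")], ItemV1SchemaVersion())
table = Legolas.read(seekstart(io))

# Rebuild V1 rows from the table's columns and migrate them to V2.
old_rows = [ItemV1(; name=n) for n in table.name]
new_rows = upgrade.(old_rows)
```

Keeping `ItemV1` around is what lets the old table still be read; the migration to `ItemV2` is an explicit step we would own.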
I'm not sure there's a huge need for this, but it could be handy. We also could define e.g. LicenseTableEltype as a Legolas row instead of as a particular NamedTuple, which could be used for the same kind of schema migration functionality for those items.
https://github.com/beacon-biosignals/Legolas.jl