JuliaEcosystem / PackageAnalyzer.jl

https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/
MIT License

Use Legolas.jl to define `Package` type? #83

Closed ericphanson closed 1 year ago

ericphanson commented 1 year ago

https://github.com/beacon-biosignals/Legolas.jl

This would involve small changes to the package definition (and a dependency on Legolas):

using Legolas: @schema, @version
using UUIDs: UUID # needed for the `uuid` field below
@schema "package-analyzer.package" Package

@version PackageV1 begin
    name::String # name of the package
    uuid::UUID # uuid of the package
    repo::String # URL of the repository
    subdir::String # subdirectory of the package in the repo
    reachable::Bool # can the repository be cloned?
    docs::Bool # does it have documentation?
    runtests::Bool # does it have the test/runtests.jl file?
    github_actions::Bool # does it use GitHub Actions?
    travis::Bool # does it use Travis CI?
    appveyor::Bool # does it use AppVeyor?
    cirrus::Bool # does it use Cirrus CI?
    circle::Bool # does it use Circle CI?
    drone::Bool # does it use Drone CI?
    buildkite::Bool # does it use Buildkite?
    azure_pipelines::Bool # does it use Azure Pipelines?
    gitlab_pipeline::Bool # does it use GitLab Pipeline?
    license_files::Vector{LicenseTableEltype} # a table of all possible license files
    licenses_in_project::Vector{String} # any licenses in the `license` key of the Project.toml
    lines_of_code::Vector{LoCTableEltype} # table of lines of code
    contributors::Vector{ContributionTableElType} # table of contributor data
    version::Union{VersionNumber, Nothing} # the version number, if a release was analyzed
    tree_hash::String # the tree hash of the code that was analyzed
end

That would define a `PackageV1` struct that is pretty much the same as our existing one, with reasonable definitions for `hash`, `isequal`, `==`, keyword-argument constructors, and Arrow serialization.
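As a rough illustration of what the generated record type provides, here is a minimal sketch on a cut-down schema. The schema name `example.mini-package` and the `MiniPackageV1` type are made up for this example and are not part of PackageAnalyzer; it assumes Legolas v0.5-style `@schema`/`@version` macros.

```julia
using Legolas
using Legolas: @schema, @version
using UUIDs: UUID

# Hypothetical, cut-down stand-in for the real `Package` schema.
@schema "example.mini-package" MiniPackage

@version MiniPackageV1 begin
    name::String # name of the package
    uuid::UUID   # uuid of the package
    docs::Bool   # does it have documentation?
end

# Records are built with the generated keyword-argument constructor.
id = UUID("7876af07-990d-54b4-ab0e-23690620f79a")
a = MiniPackageV1(; name="Example", uuid=id, docs=true)
b = MiniPackageV1(; name="Example", uuid=id, docs=true)

# Equality and hashing compare records field by field.
@assert a == b
@assert isequal(a, b)
@assert hash(a) == hash(b)
```

Two records built from the same field values compare equal and hash alike, which is the behavior we would otherwise have to define by hand for our own struct.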

One advantage is that we could more easily add new fields in a backwards-compatible way. If we added a field `foo` declared as `::Union{Missing,Foo}`, then deserializing an older `PackageV1` table from Arrow with `Legolas.read` would populate that field as `missing`. We could also provide a default instead.
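A small sketch of the two options, using the keyword constructor rather than a full Arrow round trip. The schema `example.compat-package` and the fields `foo` and `bar` are hypothetical; the `field::T = expr` default syntax is the one I understand Legolas v0.5 to support.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical schema used only for this illustration.
@schema "example.compat-package" CompatPackage

@version CompatPackageV1 begin
    name::String
    docs::Bool
    # Hypothetical newly added field: older data that lacks it gets `missing`.
    foo::Union{Missing,Int}
    # Hypothetical newly added field with a default instead of `missing`.
    bar::Int = ismissing(bar) ? 0 : bar
end

# An "old" row that predates both `foo` and `bar`.
old_row = (; name="Example", docs=true)
record = CompatPackageV1(; old_row...)

@assert ismissing(record.foo) # absent field is filled in as `missing`
@assert record.bar == 0       # absent field falls back to the default
```

The `Union{Missing,...}` route keeps "we did not record this" distinguishable from a real value, while the default route keeps downstream code free of `missing` checks.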

Another is that we could communicate schema-breaking changes by declaring a new version of `Package`, i.e. `PackageV2`. We could keep the old definition around if we wanted, so that we can continue to deserialize old tables. Legolas uses Arrow's metadata to record which schema version a table was serialized with, so it can direct each table to the correct struct when deserializing.
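To make the versioning idea concrete, here is a sketch with two versions of one schema declared side by side. Everything in it is hypothetical: the schema name, the `license` to `licenses` change, and the `upgrade` helper, which is something we would write ourselves rather than something Legolas provides.

```julia
using Legolas
using Legolas: @schema, @version

# Hypothetical schema with a breaking change between versions.
@schema "example.versioned-package" VersionedPackage

# Old layout: a single license string.
@version VersionedPackageV1 begin
    name::String
    license::String
end

# New layout: a vector of licenses. This changes a field's name and type,
# so it is declared as a new version instead of editing V1.
@version VersionedPackageV2 begin
    name::String
    licenses::Vector{String}
end

# Hypothetical hand-written migration from the old record to the new one.
upgrade(p::VersionedPackageV1) =
    VersionedPackageV2(; name=p.name, licenses=[p.license])

old = VersionedPackageV1(; name="Example", license="MIT")
new = upgrade(old)

@assert new.name == "Example"
@assert new.licenses == ["MIT"]
```

Because both record types stay defined, old tables can still be read as `VersionedPackageV1` and migrated on demand.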

I'm not sure there's a huge need for this, but it could be handy. We also could define e.g. LicenseTableEltype as a Legolas row instead of as a particular NamedTuple, which could be used for the same kind of schema migration functionality for those items.

giordano commented 1 year ago

I think we have changed what's included in the `Package` data structure several times. I'm not familiar with this tool, but it sounds like it could be useful!