JuliaEcosystem / PackageAnalyzer.jl

https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/
MIT License
59 stars 5 forks source link

ArgumentError: Invalid value set for field `licenses_in_project` when processing `Phylo`, `Diversity` and `EcoSISTEM` #103

Closed tylerjthomas9 closed 1 month ago

tylerjthomas9 commented 1 month ago

I am getting an error parsing licenses when analyzing a few packages. Phylo, Diversity and EcoSISTEM are all giving me this error. They all recently updated their LICENSE (https://github.com/EcoJulia/Phylo.jl/commit/1790ae4986698288eee1d35db7d7ad3963ffcb91, https://github.com/EcoJulia/Diversity.jl/commit/141e7b94966daff527b716b16459e8b30c0ed6dd, https://github.com/EcoJulia/EcoSISTEM.jl/commit/5cef1bb8d4077d90c5b12a8342087e427982581d#diff-c693279643b8cd5d248172d9c22cb7cf4ed163a3c98c8a3f69c2717edd3eacb7)

 julia> using PackageAnalyzer

julia> p = analyze("EcoSISTEM")
ERROR: ArgumentError: Invalid value set for field `licenses_in_project`, expected Vector{String}, got a value of type Dict{String, Any} (Dict{String, Any}("SPDX" => "LGPL-3.0-or-later"))
Stacktrace:
 [1] PackageV1(; name::String, uuid::Base.UUID, repo::String, subdir::String, reachable::Bool, docs::Bool, runtests::Bool, github_actions::Bool, travis::Bool, appveyor::Bool, cirrus::Bool, circle::Bool, drone::Bool, buildkite::Bool, azure_pipelines::Bool, gitlab_pipeline::Bool, license_files::Vector{…}, licenses_in_project::Dict{…}, lines_of_code::Vector{…}, contributors::Vector{…}, version::VersionNumber, tree_hash::String)
   @ PackageAnalyzer ~/.julia/packages/Legolas/7Ecrl/src/schemas.jl:547
 [2] PackageV1(name::String, uuid::Base.UUID, repo::String; subdir::String, reachable::Bool, docs::Bool, runtests::Bool, github_actions::Bool, travis::Bool, appveyor::Bool, cirrus::Bool, circle::Bool, drone::Bool, buildkite::Bool, azure_pipelines::Bool, gitlab_pipeline::Bool, license_files::Vector{…}, licenses_in_project::Dict{…}, lines_of_code::Vector{…}, contributors::Vector{…}, version::VersionNumber, tree_hash::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/PackageAnalyzer.jl:159
 [3] analyze_code(dir::String; repo::String, reachable::Bool, subdir::String, auth::GitHub.OAuth2, sleep::Int64, only_subdir::Bool, version::VersionNumber)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:285
 [4] analyze_code
   @ ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:221 [inlined]
 [5] analyze(pkg::PackageAnalyzer.Release; root::String, auth::GitHub.OAuth2, sleep::Int64)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:163
 [6] analyze
   @ ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:155 [inlined]
 [7] analyze(name_or_dir_or_url::String; registries::Vector{RegistryInstances.RegistryInstance}, auth::GitHub.OAuth2, sleep::Int64, version::Nothing, root::String, subdir::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:67
 [8] analyze(name_or_dir_or_url::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:58
 [9] top-level scope
   @ REPL[2]:1

caused by: MethodError: Cannot `convert` an object of type Dict{String, Any} to an object of type Vector{String}

Closest candidates are:
  convert(::Type{Vector{String}}, ::LibGit2.StrArrayStruct)
   @ LibGit2 ~/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/share/julia/stdlib/v1.10/LibGit2/src/strarray.jl:13
  convert(::Type{Array{S, N}}, ::PooledArrays.PooledArray{T, R, N}) where {S, T, R, N}
   @ PooledArrays ~/.julia/packages/PooledArrays/Vy2X0/src/PooledArrays.jl:499
  convert(::Type{T}, ::AbstractArray) where T<:Array
   @ Base array.jl:665
  ...

Stacktrace:
 [1] PackageV1(; name::String, uuid::Base.UUID, repo::String, subdir::String, reachable::Bool, docs::Bool, runtests::Bool, github_actions::Bool, travis::Bool, appveyor::Bool, cirrus::Bool, circle::Bool, drone::Bool, buildkite::Bool, azure_pipelines::Bool, gitlab_pipeline::Bool, license_files::Vector{…}, licenses_in_project::Dict{…}, lines_of_code::Vector{…}, contributors::Vector{…}, version::VersionNumber, tree_hash::String)
   @ PackageAnalyzer ~/.julia/packages/Legolas/7Ecrl/src/schemas.jl:581
 [2] PackageV1(name::String, uuid::Base.UUID, repo::String; subdir::String, reachable::Bool, docs::Bool, runtests::Bool, github_actions::Bool, travis::Bool, appveyor::Bool, cirrus::Bool, circle::Bool, drone::Bool, buildkite::Bool, azure_pipelines::Bool, gitlab_pipeline::Bool, license_files::Vector{…}, licenses_in_project::Dict{…}, lines_of_code::Vector{…}, contributors::Vector{…}, version::VersionNumber, tree_hash::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/PackageAnalyzer.jl:159
 [3] analyze_code(dir::String; repo::String, reachable::Bool, subdir::String, auth::GitHub.OAuth2, sleep::Int64, only_subdir::Bool, version::VersionNumber)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:285
 [4] analyze_code
   @ ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:221 [inlined]
 [5] analyze(pkg::PackageAnalyzer.Release; root::String, auth::GitHub.OAuth2, sleep::Int64)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:163
 [6] analyze
   @ ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:155 [inlined]
 [7] analyze(name_or_dir_or_url::String; registries::Vector{RegistryInstances.RegistryInstance}, auth::GitHub.OAuth2, sleep::Int64, version::Nothing, root::String, subdir::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:67
 [8] analyze(name_or_dir_or_url::String)
   @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/analyze.jl:58
 [9] top-level scope
   @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.

(JuliaCode) pkg> st PackageAnalyzer
Status `./JuliaCode/Project.toml`
  [e713c705] PackageAnalyzer v3.1.0
ericphanson commented 1 month ago

I guess we expected packages to do license = "BSD-2-Clause" instead of

[license]
SPDX = "BSD-2-Clause"

Either way, we shouldn't crash on this but rather emit a debug message and mark the license in project as missing. We could also try to add support for this format.

giordano commented 1 month ago

@richardreeve is there a particular reason why you went with the

[license]
SPDX = "BSD-2-Clause"

format?

richardreeve commented 1 month ago

I had no idea there was a Julia standard for this - where did you get it from? I just wanted to add some additional metadata into Project.toml, so I added in license and author_details blocks.

I chose SPDX = "..." because I wanted to be explicit that this was the SPDX identifier rather than some other way of specifying the license.

All of this is driven by the ResearchSoftwareMetadata.jl package (which I am just registering) that does a crosswalk from a package's Project.toml to it's codemeta.json, .zenodo.json, LICENSE and the licensing headers in all of the Julia files. The idea is to provide a simple way of implementing the Research Software MetaData (RSMD) guidelines.

richardreeve commented 1 month ago

I'm not objecting to changing what I'm doing here btw if there's a good reason, but I think that specifically referencing SPDX as the format isn't a terrible idea because it is a specific, standard format. Do many packages have a license entry in their Project.toml file?

Also, is adding random stuff into a Project.toml file generally supported, because I'm finding it really handy?

BTW, RSMD has a standard codemeta.json file in the package for any language (e.g. here in Phylo.jl) which contains a lot of the information you seem to be looking for.

https://github.com/EcoJulia/Phylo.jl/blob/dev/test/clean_ResearchSoftwareMetadata.jl

ericphanson commented 1 month ago

I'm not sure there is a real standard for julia. I only started checking for it in this package since I had recently added the license check to General and the idea came up there to use the field: https://github.com/JuliaRegistries/RegistryCI.jl/pull/344.

I found some stdlibs use the short approach: https://github.com/JuliaLang/Pkg.jl/blob/e6880bc9d8a04d95df6e341c76786219a4efc33f/Project.toml#L4.

Additionally, cargo also uses the same approach: https://doc.rust-lang.org/cargo/reference/manifest.html#the-license-and-license-file-fields. Generally they make good choices so it's probably a reasonably good way to do it.

ericphanson commented 1 month ago

BTW, RSMD has a standard codemeta.json file in the package for any language (e.g. here in Phylo.jl) which contains a lot of the information you seem to be looking for.

That is nice, but we wanted to assess the entire registry, so here we try to make do with what we can scrape as opposed to requiring an opt-in approach.

richardreeve commented 1 month ago

Okay, I'll switch over to license="<SPDX-identifier>" tomorrow. I think with a nonstandard field, having a specific expectation of how it should work is probably unreasonable though in general.

The point about the codemeta.json file though is that this information is automatically extracted from Project.toml and elsewhere when you run the crosswalk, so it might make sense to think about these packages in the same context as trying to achieve similar goals...

richardreeve commented 1 month ago

Btw I forgot to say that this should all be resolved now for those packages, but it would be interesting to think about how to reconcile PackageAnalyzer more generally with other metadata that people may want to provide, but isn't part of the Julia standard.