Materials-Data-Science-and-Informatics / somesy

A CLI tool for synchronizing software project metadata
https://materials-data-science-and-informatics.github.io/somesy/main/
MIT License

Allow to pass additional metadata to codemeta that is not or cannot be automatically harvested #51

Open broeder-j opened 10 months ago

broeder-j commented 10 months ago

Describe the bug

somesy sync deletes existing IDs from the affiliations of authors and contributors. It also removes the '@id' of the resource itself, as well as all key-value pairs that codemeta allows but somesy's internal codemeta handling does not.

For me this is a critical bug, and I won't use somesy until it is fixed.

Expected behavior

To Reproduce

Steps to reproduce the behavior:

  1. Have a rich codemeta.json.
  2. Run somesy with a toml file that is as complete as possible.

(Sorry for not condensing this to a minimal example.) Some of the changes are caused by the Python project metadata. Example codemeta.json in:

{
    "@context": [
        "https://doi.org/10.5063/schema/codemeta-2.0",
        "https://w3id.org/software-iodata",
        "https://raw.githubusercontent.com/jantman/repostatus.org/master/badges/latest/ontology.jsonld",
        "https://schema.org",
        "https://w3id.org/software-types"
    ],
    "@id" : "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "@type": "SoftwareSourceCode",
    "author": [
        {
            "@id" : "https://orcid.org/0000-0001-7939-226X",
            "@type": "Person",
            "email": "j.broeder@fz-juelich.de",
            "familyName": "Bröder",
            "givenName": "Jens",
            "affiliation": {
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-0070-4337",
            "@type": "Person",
            "email": "a.strupp@fz-juelich.de",
            "familyName": "Strupp",
            "givenName": "Annika",
            "affiliation": {
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0003-0000-4784",
            "@type": "Person",
            "email": "p.videgain.barranco@fz-juelich.de",
            "familyName": "Videgain Barranco",
            "givenName": "Pedro",
            "affiliation": {
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-2818-5890",
            "@type": "Person",
            "email": "s.fathalla@fz-juelich.de",
            "familyName": "Fathalla",
            "givenName": "Said",
            "affiliation": {
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-3968-2446",
            "@type": "Person",
            "email": "gabriel.preuss@helmholtz-berlin.de",
            "familyName": "Preuß",
            "givenName": "Gabriel",
            "affiliation": {
                "@id": "https://ror.org/02aj13c28",
                "name": "Helmholtz-Zentrum Berlin für Materialien und Energie"
            }
        }
    ],
    "codeRepository": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting.git",
    "contributor": [
        {
            "@id" : "https://orcid.org/0000-0001-7939-226X",
            "@type": "Person",
            "email": "j.broeder@fz-juelich.de",
            "familyName": "Bröder",
            "givenName": "Jens",
            "affiliation": {
                "@type": "Organization",
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-0070-4337",
            "@type": "Person",
            "email": "a.strupp@fz-juelich.de",
            "familyName": "Strupp",
            "givenName": "Annika",
            "affiliation": {
                "@type": "Organization",
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0003-0000-4784",
            "@type": "Person",
            "email": "p.videgain.barranco@fz-juelich.de",
            "familyName": "Videgain Barranco",
            "givenName": "Pedro",
            "affiliation": {
                "@type": "Organization",
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-2818-5890",
            "@type": "Person",
            "email": "s.fathalla@fz-juelich.de",
            "familyName": "Fathalla",
            "givenName": "Said",
            "affiliation": {
                "@id": "https://ror.org/02nv7yv05",
                "name": "Forschungszentrum Jülich GmbH"
            }
        },
        {
            "@id": "https://orcid.org/0000-0002-3968-2446",
            "@type": "Person",
            "email": "gabriel.preuss@helmholtz-berlin.de",
            "familyName": "Preuß",
            "givenName": "Gabriel",
            "affiliation": {
                "@id": "https://ror.org/02aj13c28",
                "name": "Helmholtz-Zentrum Berlin für Materialien und Energie"
            }
        }
    ],
    "dateCreated": "2021-07-09",
    "dateModified": "2022-12-19",
    "datePublished": "2023-03-01",
    "developmentStatus": "https://www.repostatus.org/#wip",
    "identifier": "data_harvesting",
    "maintainer": {
        "@id" : "https://orcid.org/0000-0001-7939-226X",
        "@type": "Person",
        "email": "j.broeder@fz-juelich.de",
        "familyName": "Bröder",
        "givenName": "Jens",
        "affiliation": {
            "@type": "Organization",
            "@id": "https://ror.org/02nv7yv05",
            "name": "Forschungszentrum Jülich GmbH"
        }
    },
    "programmingLanguage": [
        "Python",
        "Shell"
    ],
    "readme": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/blob/main/README.md",
    "softwareHelp": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "license": "http://spdx.org/licenses/MIT",
    "name": "data-harvesting",
    "runtimePlatform": [
        "Python 3",
        "Python 3.7",
        "Python 3.8",
        "Python 3.9",
        "Python 3.10",
        "Python 3.11"
    ],
    "keywords": [
        "unhide", "Helmholtz association", "data mining", "HMC", "metadata", "data publications", 
        "software publication", "RSE", "FAIR", "linked data", "knowledge graph", "json-ld", "schema.org", "restruct"
    ],
    "softwareRequirements": [
        {
            "@type": "SoftwareApplication",
            "identifier": "advertools",
            "name": "advertools",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "beautifulsoup4",
            "name": "beautifulsoup4",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pandas",
            "name": "pandas",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "requests",
            "name": "requests",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "typer",
            "name": "typer",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "rich",
            "name": "rich",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "progressbar2",
            "name": "progressbar2",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "rdflib",
            "name": "rdflib",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "codemetapy",
            "name": "codemetapy",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pyld",
            "name": "pyld",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "extruct",
            "name": "extruct",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pyld",
            "name": "pyld",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "jq",
            "name": "jq",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "jsonschema",
            "name": "jsonschema",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "xmljson",
            "name": "xmljson",
            "runtimePlatform": "Python 3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "codemeta-harvester",
            "name": "codemeta-harvester",
            "runtimePlatform": "Shell"
        }
    ],
    "targetProduct": {
        "@type": "CommandLineApplication",
        "executableName": "hmc_unhide",
        "name": "hmc_unhide",
        "runtimePlatform": ["Shell", "Python 3"]
    },
    "applicationCategory": "Library, Data science",
    "downloadUrl": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/archive/main/data_harvesting-main.zip",
    "fileSize": "57MB",
    "operatingSystem": ["OSX", "Linux"],
    "releaseNotes": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/blob/main/CHANGELOG.md",    
    "url": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "description": "Set of tools to harvest, process and uplift (meta)data from metadata providers within the Helmholtz association to be included in the Helmholtz Knowledge Graph (Helmholtz-KG). The harvested linked data in the form of schema.org jsonld is aggregated and uplifted in data pipelines to be included into a single large knowledge graph (KG). The tool set and harvesters can be used as a python library or over a commandline interface (CLI, hmc-unhide). Provenance of metadata changes is tracked rudimentary by saving graph patches of changes on rdflib Graph data structures on the semantic triple level. Harvesters support extracting data via sitemap, gitlab API, datacite API and OAI-PMH endpoints.",
    "copyrightHolder": [
        {"@id" : "Helmholtz Metadata Collaboration (HMC)",
             "@type": "Organization",
             "name": "Helmholtz Metadata Collaboration (HMC)"},
         {"@type": "Organization",
         "name": "Forschungszentrum Jülich GmbH (Materials Data Science and Informatics (IAS-9), Institute for Advanced Simulation (IAS)), Jülich, Germany"}],
    "copyrightYear" : "2022",
    "softwareVersion": "1.1.0",
    "isAccessibleForFree": "True",
    "contIntegration": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/pipelines",
    "buildInstructions": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/blob/main/pyproject.toml",
    "issueTracker": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting/-/issues"
}

somesy.toml in:

[project]
name = "data-harvesting"
version = "1.1.0"
description = "Set of tools to harvest, process and uplift (meta)data from metadata providers within the Helmholtz association to be included in the Helmholtz Knowledge Graph (Helmholtz-KG). The harvested linked data in the form of schema.org jsonld is aggregated and uplifted in data pipelines to be included into a single large knowledge graph (KG). The tool set and harvesters can be used as a python library or over a commandline interface (CLI, hmc-unhide). Provenance of metadata changes is tracked rudimentary by saving graph patches of changes on rdflib Graph data structures on the semantic triple level. Harvesters support extracting data via sitemap, gitlab API, datacite API and OAI-PMH endpoints."

keywords = [
        "unhide", "Helmholtz association", "data mining", "HMC", "metadata", "data publications",
        "software publication", "RSE", "FAIR", "linked data", "knowledge graph", "json-ld", "schema.org", "restruct"
    ]
license = "MIT"
repository = "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting.git"
homepage = "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting"

[[project.people]]
given-names = "Jens"
family-names = "Bröder"
email = "j.broeder@fz-juelich.de"
orcid = "https://orcid.org/0000-0001-7939-226X"
author = true      # is a full author of the project (i.e. appears in citations)
maintainer = true  # currently maintains the project (i.e. is a contact person)
# publication_author = true

[[project.people]]
given-names = "Annika"
family-names = "Strupp"
email = "a.strupp@fz-juelich.de"
orcid = "https://orcid.org/0000-0002-0090-4337"
author = true

[[project.people]]
orcid = "https://orcid.org/0000-0003-0000-4784"
email = "p.videgain.barranco@fz-juelich.de"
family-names = "Videgain Barranco"
given-names = "Pedro"
affiliation = "Forschungszentrum Jülich GmbH"
author = true
# publication_author = true

[[project.people]]
orcid = "https://orcid.org/0000-0002-2818-5890"
email = "s.fathalla@fz-juelich.de"
family-names = "Fathalla"
given-names = "Said"
affiliation = "Forschungszentrum Jülich GmbH"
author = true
# publication_author = true

[[project.people]]
orcid = "https://orcid.org/0000-0002-3968-2446"
email = "gabriel.preuss@helmholtz-berlin.de"
family-names = "Preuß"
given-names = "Gabriel"
affiliation = "Helmholtz-Zentrum Berlin für Materialien und Energie"
author = true
maintainer = true
# publication_author = true

[config]
verbose = true     # show detailed information about what somesy is doing

'Crippled' codemeta.json out:

{
    "@context": [
        "https://doi.org/10.5063/schema/codemeta-2.0",
        "https://w3id.org/software-iodata",
        "https://raw.githubusercontent.com/jantman/repostatus.org/master/badges/latest/ontology.jsonld",
        "https://schema.org",
        "https://w3id.org/software-types"
    ],
    "@type": "SoftwareSourceCode",
    "applicationCategory": [
        "Database",
        "Education",
        "Scientific/Engineering",
        "Scientific/Engineering > Information Analysis",
        "Scientific/Engineering > Visualization",
        "Text Processing"
    ],
    "audience": [
        {
            "@type": "Audience",
            "audienceType": "Science/Research"
        },
        {
            "@type": "Audience",
            "audienceType": "Information Technology"
        }
    ],
    "author": [
        {
            "@id": "https://orcid.org/0000-0001-7939-226X",
            "@type": "Person",
            "familyName": "Bröder",
            "givenName": "Jens"
        },
        {
            "@id": "https://orcid.org/0000-0002-0070-4337",
            "@type": "Person",
            "familyName": "Strupp",
            "givenName": "Annika"
        },
        {
            "@id": "https://orcid.org/0000-0003-0000-4784",
            "@type": "Person",
            "affiliation": {
                "@type": "Organization",
                "legalName": "Forschungszentrum Jülich GmbH"
            },
            "familyName": "Videgain Barranco",
            "givenName": "Pedro"
        },
        {
            "@id": "https://orcid.org/0000-0002-2818-5890",
            "@type": "Person",
            "affiliation": {
                "@type": "Organization",
                "legalName": "Forschungszentrum Jülich GmbH"
            },
            "familyName": "Fathalla",
            "givenName": "Said"
        },
        {
            "@id": "https://orcid.org/0000-0002-3968-2446",
            "@type": "Person",
            "affiliation": {
                "@type": "Organization",
                "legalName": "Helmholtz-Zentrum Berlin für Materialien und Energie"
            },
            "familyName": "Preuß",
            "givenName": "Gabriel"
        }
    ],
    "codeRepository": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting.git",
    "description": "Set of tools to harvest, process and uplift (meta)data from metadata providers within the Helmholtz association to be included in the Helmholtz Knowledge Graph (Helmholtz-KG). The harvested linked data in the form of schema.org jsonld is aggregated and uplifted in data pipelines to be included into a single large knowledge graph (KG). The tool set and harvesters can be used as a python library or over a commandline interface (CLI, hmc-unhide). Provenance of metadata changes is tracked rudimentary by saving graph patches of changes on rdflib Graph data structures on the semantic triple level. Harvesters support extracting data via sitemap, gitlab API, datacite API and OAI-PMH endpoints.",
    "developmentStatus": "https://www.repostatus.org/#wip",
    "identifier": "data-harvesting",
    "keywords": [
        "FAIR",
        "HMC",
        "Helmholtz association",
        "RSE",
        "data mining",
        "data publications",
        "json-ld",
        "knowledge graph",
        "linked data",
        "metadata",
        "restruct",
        "schema.org",
        "software publication",
        "unhide"
    ],
    "license": "https://spdx.org/licenses/MIT",
    "name": "data-harvesting",
    "operatingSystem": [
        "MacOS > MacOS X",
        "POSIX > Linux"
    ],
    "runtimePlatform": [
        "Python",
        "Python 3",
        "Python 3.10",
        "Python 3.11",
        "Python 3.12",
        "Python 3.9"
    ],
    "softwareRequirements": [
        {
            "@type": "SoftwareApplication",
            "identifier": "SPARQLWrapper",
            "name": "SPARQLWrapper",
            "runtimePlatform": "Python 3",
            "version": "^2.0.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "advertools",
            "name": "advertools",
            "runtimePlatform": "Python 3",
            "version": "^0.13.2"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "beautifulsoup4",
            "name": "beautifulsoup4",
            "runtimePlatform": "Python 3",
            "version": "^4.11.1"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "codemetapy",
            "name": "codemetapy",
            "runtimePlatform": "Python 3",
            "version": "^2.3.3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "crontab",
            "name": "crontab",
            "runtimePlatform": "Python 3",
            "version": "^1.0.1"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "extruct",
            "name": "extruct",
            "runtimePlatform": "Python 3",
            "version": "^0.13.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "jq",
            "name": "jq",
            "runtimePlatform": "Python 3",
            "version": "^1.3.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "jsondiff",
            "name": "jsondiff",
            "runtimePlatform": "Python 3",
            "version": "^2.0.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "jsonschema",
            "name": "jsonschema",
            "runtimePlatform": "Python 3",
            "version": "^4.17.3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "oaiharvest",
            "name": "oaiharvest",
            "runtimePlatform": "Python 3",
            "version": "^3.0.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pandas",
            "name": "pandas",
            "runtimePlatform": "Python 3",
            "version": "^1.4.1"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pathos",
            "name": "pathos",
            "runtimePlatform": "Python 3",
            "version": "^0.3.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "progressbar2",
            "name": "progressbar2",
            "runtimePlatform": "Python 3",
            "version": "^4.2.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pydantic",
            "name": "pydantic",
            "runtimePlatform": "Python 3",
            "version": "^2.3.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pyld",
            "name": "pyld",
            "runtimePlatform": "Python 3",
            "version": "^2.0.3"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pylint",
            "name": "pylint",
            "runtimePlatform": "Python 3",
            "version": "^2.17.5"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "pyoai",
            "name": "pyoai",
            "runtimePlatform": "Python 3",
            "version": "2.5.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "python",
            "name": "python",
            "runtimePlatform": "Python 3",
            "version": "^3.9"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "python-crontab",
            "name": "python-crontab",
            "runtimePlatform": "Python 3",
            "version": "^3.0.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "python-dateutil",
            "name": "python-dateutil",
            "runtimePlatform": "Python 3",
            "version": "^2.8.2"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "rdflib",
            "name": "rdflib",
            "runtimePlatform": "Python 3",
            "version": "^6.2.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "requests",
            "name": "requests",
            "runtimePlatform": "Python 3",
            "version": "^2.28.1"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "rich",
            "name": "rich",
            "runtimePlatform": "Python 3",
            "version": "^12.6.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "shapely",
            "name": "shapely",
            "runtimePlatform": "Python 3",
            "version": "^2.0.1"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "typer",
            "name": "typer",
            "runtimePlatform": "Python 3",
            "version": "^0.9.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "wrapt",
            "name": "wrapt",
            "runtimePlatform": "Python 3",
            "version": "^1.15.0"
        },
        {
            "@type": "SoftwareApplication",
            "identifier": "xmljson",
            "name": "xmljson",
            "runtimePlatform": "Python 3",
            "version": "^0.2.1"
        }
    ],
    "targetProduct": {
        "@type": "CommandLineApplication",
        "executableName": "hmc-unhide",
        "name": "hmc-unhide",
        "runtimePlatform": "Python 3"
    },
    "url": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "version": "1.1.0"
}

broeder-j commented 10 months ago

If I understand the somesy code correctly, this issue arises mainly because the original codemeta.json file is not used/parsed as an additional input to codemetapy when the new codemeta.json is generated. A new codemeta file is generated only from the given sources, and any additional information in the existing codemeta.json is ignored.

When somesy creates the new codemeta.json via codemetapy, one does not want conflicting information in the sources. Therefore, one needs to think about how to solve this without conflicts. Maybe merge the original codemeta in afterwards to recover all keys that somesy did not generate from the sources, i.e. a simple dict update of the original one with the new one (not optimal, but OK).
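A minimal sketch of the simple dict update mentioned above, assuming both codemeta files are already loaded as Python dicts. The name shallow_merge is hypothetical and not part of somesy or codemetapy; the sample values are taken from the files in this issue.

```python
import json


def shallow_merge(original: dict, generated: dict) -> dict:
    """Keep every top-level key of the original; generated values win on conflict."""
    merged = dict(original)   # copy, so the caller's dict is not mutated
    merged.update(generated)  # keys produced by the new run overwrite old ones
    return merged


# Excerpt of the existing codemeta.json
original = {
    "@id": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "dateCreated": "2021-07-09",
    "identifier": "data_harvesting",
}
# Excerpt of the freshly generated codemeta.json
generated = {"identifier": "data-harvesting", "version": "1.1.0"}

merged = shallow_merge(original, generated)
print(json.dumps(merged, indent=4))
```

Because the update only looks at top-level keys, a nested value such as the "author" list is replaced wholesale by the generated one.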

This merge will not solve the affiliation ID issues, or any other nested cases. Maybe one should do a recursive @id merge, i.e. a special treatment just for @ids, or even a full recursive dict merge.
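One way the recursive variant could look is sketched below. It assumes list entries can be matched by their "@id" and that generated values should win for scalars; deep_merge is a hypothetical helper, not an existing somesy or codemetapy function.

```python
def deep_merge(original, generated):
    """Recursively merge two JSON-like values; generated wins on scalar conflicts."""
    if isinstance(original, dict) and isinstance(generated, dict):
        merged = dict(original)
        for key, value in generated.items():
            merged[key] = deep_merge(original[key], value) if key in original else value
        return merged
    if isinstance(original, list) and isinstance(generated, list):
        # Match entries by "@id". The generated list decides which entries exist;
        # original entries without a counterpart are dropped.
        by_id = {o["@id"]: o for o in original if isinstance(o, dict) and "@id" in o}
        result = []
        for item in generated:
            old = by_id.get(item.get("@id")) if isinstance(item, dict) else None
            result.append(deep_merge(old, item) if old is not None else item)
        return result
    return generated


original = {"author": [{
    "@id": "https://orcid.org/0000-0003-0000-4784",
    "email": "p.videgain.barranco@fz-juelich.de",
    "affiliation": {"@id": "https://ror.org/02nv7yv05",
                    "name": "Forschungszentrum Jülich GmbH"},
}]}
generated = {"author": [{
    "@id": "https://orcid.org/0000-0003-0000-4784",
    "affiliation": {"@type": "Organization",
                    "legalName": "Forschungszentrum Jülich GmbH"},
    "familyName": "Videgain Barranco",
}]}

merged = deep_merge(original, generated)
```

In this sketch the affiliation keeps its ROR "@id" and the author keeps the email, while the generated "legalName" and "familyName" are added. Entries without an "@id" cannot be matched this way and are simply taken from the generated file.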

There should also be tests with a rich codemeta.json: one where nothing changes and one where something changes (e.g. a version update). In both cases only the intended changes should show up in the output.
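Such tests could be structured roughly as follows. This is only a sketch: the two sync_* functions are stand-ins for the real synchronization step, and changed_keys and assert_only_changes are hypothetical helpers. The version "1.2.0" is an illustrative value.

```python
import copy

_MISSING = object()  # sentinel, so a missing key differs from a key set to None


def changed_keys(before: dict, after: dict) -> set:
    """Return the top-level keys that were added, removed or modified."""
    keys = before.keys() | after.keys()
    return {k for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING)}


def assert_only_changes(sync, rich: dict, expected: set) -> None:
    """Run `sync` on a copy of `rich` and check that only `expected` keys differ."""
    result = sync(copy.deepcopy(rich))
    assert changed_keys(rich, result) == expected


# Stand-ins for the real sync step (hypothetical, for illustration only).
def sync_without_changes(doc: dict) -> dict:
    return doc


def sync_with_version_bump(doc: dict) -> dict:
    doc["softwareVersion"] = "1.2.0"  # illustrative value
    return doc


rich = {
    "@id": "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting",
    "softwareVersion": "1.1.0",
    "dateCreated": "2021-07-09",
}

assert_only_changes(sync_without_changes, rich, set())
assert_only_changes(sync_with_version_bump, rich, {"softwareVersion"})
```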

apirogov commented 10 months ago

All somesy does for codemeta is basically collect the relevant sources and input files, call codemetapy, and save the result if it changed. Somesy will not do anything fancier than that.

What we could do is introduce an extra flag for a "supplemental" codemeta file to provide additional triples to be merged, which also would be passed to codemetapy. It would be the responsibility of codemetapy to overwrite or merge that extra information correctly.

We could also think about adding metadata fields to somesy that are passed to codemetapy using its interface (such as the base ID or certain specific codemeta fields not currently covered/inferred).

apirogov commented 10 months ago

I removed the 'bug' tag, because overwriting codemeta.json is expected and documented behavior.

broeder-j commented 9 months ago

I must say, I do not agree.

Every file is 'overwritten' by somesy, including pyproject.toml and CITATION.cff, but as a user I do not expect my own specifications to be overwritten (for example, by dropping all other tool sections or metadata keys in pyproject.toml). That somesy overwrites codemeta.json in a completely different and worse way is unwanted and unexpected behavior for any user. The behavior of somesy is therefore inconsistent across the different files. As I said above, in my view this is critical, and I will not use somesy until it is fixed. So yes, it is the currently 'expected' behavior, but it is completely wrong. I would have applied a 'critical bug' label if one existed.

apirogov commented 9 months ago

Because CodeMeta is based on RDF / linked data, it is for technical reasons practically impossible to provide behavior equivalent to what we can offer for TOML, YAML and XML files (i.e. more or less conservative patching).

Concerning codemeta specifically, you could disable codemeta synchronization (--no-sync-codemeta) if you dislike the behavior.

However, due to other idiosyncrasies of your project setup (e.g. mixing setuptools and poetry in one pyproject.toml), it looks like your use case and requirements are outside the target group and scope of somesy.

The main target audience for somesy is people who:

I would claim that:

Therefore, unfortunately, I think we have to "agree to disagree" here.

broeder-j commented 9 months ago

How about doing an RDF patch instead? I.e.:

  1. Run through what somesy does now, if no codemeta.json file exists.

  2. Otherwise, if a codemeta.json exists, generate the new triples from the somesy.toml. Read in the existing codemeta.json, patch the graph, and serialize it back if the triples from the somesy.toml changed the graph. One would first need to delete from the graph the triples that somesy can patch. Would that not be a way? (Changes to whole IDs could be a problem, but one could apply them to some extent earlier.) Apart from that, it is technically hard. This type of issue comes up all over the place when dealing with linked data.
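The patch idea from step 2 can be illustrated without an RDF library by modelling triples as plain (subject, predicate, object) tuples. This is only a sketch under that simplification: patch_graph, the prefixed predicate names and the new version value are made up for illustration, and a real implementation would additionally have to deal with blank nodes and nested objects that carry no "@id".

```python
def patch_graph(existing: set, generated: set, managed_predicates: set) -> set:
    """Replace the triples somesy manages and keep all other triples untouched."""
    # 1. Delete the triples that somesy is able to regenerate.
    kept = {t for t in existing if t[1] not in managed_predicates}
    # 2. Add the triples generated from the somesy.toml.
    return kept | generated


SOFTWARE = "https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_harvesting"

existing = {
    (SOFTWARE, "schema:version", "1.1.0"),
    (SOFTWARE, "schema:dateCreated", "2021-07-09"),
    (SOFTWARE, "codemeta:issueTracker", SOFTWARE + "/-/issues"),
}
generated = {(SOFTWARE, "schema:version", "1.2.0")}  # illustrative new version
managed = {"schema:version"}

patched = patch_graph(existing, generated, managed)

# Only serialize back to codemeta.json if the graph actually changed.
needs_write = patched != existing
```

Here the version triple is replaced, while dateCreated and issueTracker survive because somesy does not manage them.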

The current support is ONLY good if there is no codemeta.json in the first place. People who currently use somesy care about metadata, so there is likely a codemeta.json already, because they probably started from here: https://codemeta.github.io/codemeta-generator/

It is good that there is already a feature to disable the codemeta-sync. That makes it useable for me in the other context. Thanks!

Another solution could be:

Put all the additional information in the somesy.toml as well, i.e. let somesy.toml be as rich as codemeta.json. Then one could keep the full recreation as it is now and not change any logic except for piping the information through to codemetapy. In other words, allow users to provide any key that codemeta allows, in the same richness that codemeta.json allows.