RichardLitt / dependency-cite

Cite all of your software dependencies you use in your research
MIT License

Use citation.cff files #2

Open RichardLitt opened 11 months ago

RichardLitt commented 11 months ago

Right now, I don't believe the script is using citation.cff files at all, although it does check for them. It ought to do something with them.

andrew commented 11 months ago

I'm currently searching the root of each repo for files called CITATION.* and other similar formats (full list here) (example). There's a Python library that can read and convert them: https://github.com/citation-file-format/cffconvert
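
As a rough illustration of the detection step described above, here is a minimal sketch. It assumes the match is case-insensitive and covers only names of the form `CITATION` or `CITATION.<ext>` in the repository root; the actual list of recognised formats is the one linked above, and `find_citation_files` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Assumption: a citation file is a root-level file named "citation",
# optionally followed by an extension, matched case-insensitively.
CITATION_PATTERN = re.compile(r"^citation(\.[^/]+)?$", re.IGNORECASE)


def find_citation_files(root_filenames):
    """Return the names in root_filenames that look like citation files."""
    return [name for name in root_filenames if CITATION_PATTERN.match(name)]


if __name__ == "__main__":
    names = ["README.md", "CITATION.cff", "citation.CIF", "src/CITATION.cff"]
    print(find_citation_files(names))
```

Paths containing a directory separator are skipped because only the repository root is being checked.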

andrew commented 11 months ago

Each repository I've scanned has a list of interesting files in its metadata field, including the citation file. For example: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sandialabs%2FpvOps

{
  "uuid": "289032705",
  "full_name": "sandialabs/pvOps",
  "owner": "sandialabs",
  "description": "A set of documented functions for supporting operations research of photovoltaic energy systems. ",
  "archived": false,
  "fork": false,
  "pushed_at": "2023-10-05T21:45:42.000Z",
  "size": 37105,
  "stargazers_count": 11,
  "open_issues_count": 21,
  "forks_count": 9,
  "subscribers_count": 3,
  "default_branch": "master",
  "last_synced_at": "2023-10-06T16:27:47.550Z",
  "etag": null,
  "topics": [],
  "latest_commit_sha": null,
  "homepage": "https://pvops.readthedocs.io/en/latest/",
  "language": "Jupyter Notebook",
  "has_issues": true,
  "has_wiki": null,
  "has_pages": null,
  "mirror_url": null,
  "source_name": null,
  "license": "other",
  "status": null,
  "scm": "git",
  "pull_requests_enabled": true,
  "icon_url": "https://github.com/sandialabs.png",
  "metadata": {
    "files": {
      "readme": "README.md",
      "changelog": null,
      "contributing": null,
      "funding": null,
      "license": "LICENSE",
      "code_of_conduct": "CODE_OF_CONDUCT.md",
      "threat_model": null,
      "audit": null,
      "citation": "citation.CIF",
      "codeowners": null,
      "security": null,
      "support": null,
      "governance": null
    }
  },
  "created_at": "2020-08-20T14:48:48.000Z",
  "updated_at": "2023-10-05T21:03:28.000Z",
  "dependencies_parsed_at": "2023-09-28T23:56:15.789Z",
  "dependency_job_id": null,
  "html_url": "https://github.com/sandialabs/pvOps",
  "commit_stats": {
    "total_commits": 388,
    "total_committers": 11,
    "mean_commits": 35.27272727272727,
    "dds": 0.6469072164948453,
    "last_synced_commit": "fe6e6579239ab161908fc5b0d3819b720c161f3c"
  },
  "previous_names": [],
  "tags_count": 11,
  "repository_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sandialabs%2FpvOps",
  "tags_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sandialabs%2FpvOps/tags",
  "releases_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sandialabs%2FpvOps/releases",
  "manifests_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sandialabs%2FpvOps/manifests",
  "owner_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sandialabs",
  "download_url": "https://codeload.github.com/sandialabs/pvOps/tar.gz/refs/heads/master",
  "host": {
    "name": "GitHub",
    "url": "https://github.com",
    "kind": "github",
    "repositories_count": 163723659,
    "owners_count": 8650178,
    "icon_url": "https://github.com/github.png",
    "host_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub",
    "repositories_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories",
    "repository_names_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names",
    "owners_url": "https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"
  }
}
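
A script could read the citation path out of a response shaped like the one above. The following is a sketch only: the field names (`metadata.files.citation`, `full_name`, `default_branch`) are taken from the example response, while the helper names and the `raw.githubusercontent.com` URL layout used to locate the file are assumptions that apply to GitHub-hosted repositories.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint prefix copied from the example URL in this thread.
API_ROOT = "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/"


def fetch_repository(full_name: str) -> dict:
    """Fetch repository metadata; "owner/name" is URL-encoded as in the example."""
    url = API_ROOT + urllib.parse.quote(full_name, safe="")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def citation_path(repo: dict) -> Optional[str]:
    """Return the detected citation file name, or None if none was found."""
    files = (repo.get("metadata") or {}).get("files") or {}
    return files.get("citation")


def raw_citation_url(repo: dict) -> Optional[str]:
    """Build a raw-file URL for the citation file (assumed GitHub layout)."""
    path = citation_path(repo)
    if path is None:
        return None
    return (
        "https://raw.githubusercontent.com/"
        f"{repo['full_name']}/{repo['default_branch']}/{path}"
    )
```

Calling `raw_citation_url(fetch_repository("sandialabs/pvOps"))` would then give a URL whose contents could be passed to a converter such as cffconvert, provided the file is in a format that tool understands.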

RichardLitt commented 11 months ago

This is great. When you say "I am currently searching", can you elaborate on what you mean?

andrew commented 11 months ago

I guess I mostly mean "detecting" rather than "searching".

For each repository that is discovered and analysed (currently at 170 million), I look for certain kinds of files that have specific meanings in open source software, such as the readme, license, changelog, and code of conduct files.

So that means that every package in the packages service that references a repository should have the name/path of the citation file present in its metadata.
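
For a tool that walks a list of dependencies, that could look roughly like the sketch below. The record shape is an assumption made for illustration: each package is taken to be a dict with a `name` and an embedded repository record under a hypothetical `repo_metadata` key, laid out like the repository response earlier in this thread. The real field names in the packages service should be checked before relying on this.

```python
from typing import Dict, Iterable, Optional


def citation_files_by_package(packages: Iterable[dict]) -> Dict[str, Optional[str]]:
    """Map each package name to its repository's citation file path, if any.

    Assumes a hypothetical "repo_metadata" key holding a repository record
    with the same "metadata" -> "files" -> "citation" layout shown above.
    """
    result: Dict[str, Optional[str]] = {}
    for package in packages:
        repo = package.get("repo_metadata") or {}
        files = (repo.get("metadata") or {}).get("files") or {}
        result[package["name"]] = files.get("citation")
    return result


if __name__ == "__main__":
    sample = [
        {"name": "pvops", "repo_metadata": {"metadata": {"files": {"citation": "citation.CIF"}}}},
        {"name": "no-repo-package", "repo_metadata": None},
    ]
    print(citation_files_by_package(sample))
```

Packages that map to `None` have no detected citation file and would need a fallback, such as citing the repository URL itself.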