cff file - Githubissues

hvwaldow commented 1 month ago

Base desired citation on proper cff file.
cff file generated from contributors.yml through GitHub-Action.

MakisH commented 1 week ago

@hvwaldow and I met and made a first attempt at a CFF file:

cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
# What do we put as a title for the repository itself, if this is a CFF for the repository itself?
title: Foundational Competencies and Responsibilities of a Research Software Engineer
abstract: "The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes. Therefore, providing a straightforward, comprehensive definition of what an RSE does and what experience, skills and competencies are required to become one is challenging. In this community paper we define the broad notion of what an RSE is, explore the different types of work they undertake, and define a list of foundational competencies as well as values that outline the general profile of an RSE. Further research and training can build upon this foundation of skills and focus on various aspects in greater detail. We expect that graduates and practitioners will have a larger and more diverse set of skills than outlined here. On this basis, we elaborate on the progression of these skills along different dimensions, looking at specific types of RSE roles, proposing recommendations for organisations, and giving examples of future specialisations. An appendix details how existing curricula fit into this framework."
authors:
  - family-names: Seibold
    given-names: Heidi
    orcid: "https://orcid.org/0000-0002-8960-9642"
    affiliation: "IGDORE Munich"
    country: DE
    email: heidi@seibold.co
  - ...
identifiers:
  - description: competencies_arxiv_v1
    type: url
    value: "https://github.com/CaptainSifff/paper_teaching-learning-RSE/releases/tag/competencies_arxiv_v1"
  - description: competencies_arxiv_v2
    type: url
    value: "https://github.com/CaptainSifff/paper_teaching-learning-RSE/releases/tag/competencies_arxiv_v2"
license: CC-BY-4.0
repository-code: "https://github.com/CaptainSifff/paper_teaching-learning-RSE"

Some links:

schema
basic example
Check with cffconvert --validate (without filename)
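
A minimal sketch of the most basic check, assuming the CFF has already been loaded into a Python dict (for example with a YAML parser). The four keys listed are the ones CFF 1.2.0 requires at the top level; the function name `missing_required_keys` is hypothetical. This only illustrates the idea and is not a substitute for cffconvert --validate, which checks the full schema.

```python
# Top-level keys that CFF 1.2.0 requires; all other keys are optional.
REQUIRED_KEYS = ("cff-version", "message", "title", "authors")


def missing_required_keys(cff: dict) -> list[str]:
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not cff.get(key)]


# Shortened copy of the draft above (authors reduced to the one entry shown).
draft = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it using these metadata.",
    "title": "Foundational Competencies and Responsibilities of a Research Software Engineer",
    "authors": [{"family-names": "Seibold", "given-names": "Heidi"}],
    "license": "CC-BY-4.0",
}

# An empty list means every required key is present.
print(missing_required_keys(draft))
```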

We probably still want to add:

keywords
contact (corresponding author?)
preferred-citation ("Goth et al.")

We have intentionally left out:

license-url (standard)
type (we are neither a software, nor a dataset)
date-released and date-published (not sure what to put there)
version (we don't really have a versioning scheme at the moment)

mhagdorn commented 1 week ago

The cff file gets generated by github action. When does the github action run: on each commit? If so, the version could just be a commit hash. Do we need a special release branch that populates the date-released field? That branch could then also have a sensible version string. Could we use a release branch to autogenerate the stuff that gets uploaded to arXiv? We can populate the keywords field from the list of keywords in the competencies.md file.
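
A rough sketch of the commit-hash idea, assuming the generating script runs inside a git checkout. The helper names `current_commit_hash` and `release_fields` are hypothetical; only `git rev-parse --short HEAD` and the Python standard library are used.

```python
import subprocess
from datetime import date


def current_commit_hash(repo_dir: str = ".") -> str:
    """Return the abbreviated hash of HEAD, as printed by git."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return result.stdout.strip()


def release_fields(version: str, released: date) -> dict:
    """Build the optional 'version' and 'date-released' entries of a CFF."""
    # CFF expects dates as YYYY-MM-DD, which is what isoformat() produces.
    return {"version": version, "date-released": released.isoformat()}
```

On a release branch, the script could call `release_fields(current_commit_hash(), date.today())` and merge the result into the CFF data before writing it out; a tag name could replace the hash once a versioning scheme exists.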

mhagdorn commented 1 week ago

It just occurred to me that the cff file is for the entire repo, isn't it? I think it would be neater to split this repo and take out the other skeleton papers and put them into their own repo. That way we get a 1-1 correspondence between the cff file, the repo and the paper.

MakisH commented 1 week ago

> The cff file gets generated by github action.

Not yet. We are first figuring out the manual mode. Scripted mode comes later, CI comes last.

All the rest: good questions and suggestions!

We would generally prefer to get the names, affiliations, etc from the contributors.yml (or, I would say, generate the contributors.yml/the contributors list from the CFF).

> It just occurred to me that the cff file is for the entire repo, isn't it? I think it would be neater to split this repo and take out the other skeleton papers and put them into their own repo. That way we get a 1-1 correspondence between the cff file, the repo and the paper.

This is also one of the biggest issues right now, which we cannot solve bilaterally. I also think that having one repository for one paper would make more sense. This modularization could also be reflected in the "who is involved in which meeting", as I think someone already suggested.

mhagdorn commented 1 week ago

I seem to remember that we already decided to split up the repo and have the individual repos under our organisation. It just hasn't happened yet. Maybe something to discuss at the next meeting (pinging @CaptainSifff).

hvwaldow commented 5 days ago

Thanks to @MakisH for getting me to start working on this, but there is a bit more to do if we want to continue with the plan that I had noted down:

A cff-file that is specifically for citing the repo, not a specific pre-print paper.
An auto-update of the cff from the contributors.yml
Citation with Florian first, then alphabetically
Actual citation-text includes "et al."
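
The ordering and "et al." rules in this plan can be sketched as follows. This is only an illustration: it assumes that "Florian" and the "Goth et al." mentioned earlier refer to the same lead author, the function names are hypothetical, and one of the sample entries is a placeholder.

```python
def order_authors(authors: list[dict], lead_family_name: str) -> list[dict]:
    """Put the lead author first, then everyone else alphabetically."""
    lead = [a for a in authors if a["family-names"] == lead_family_name]
    rest = [a for a in authors if a["family-names"] != lead_family_name]
    # Sort by family name, then given name, ignoring case.
    rest.sort(key=lambda a: (a["family-names"].casefold(), a["given-names"].casefold()))
    return lead + rest


def short_citation(authors: list[dict]) -> str:
    """Name the first author and abbreviate the others as 'et al.'."""
    first = authors[0]["family-names"]
    return first if len(authors) == 1 else f"{first} et al."


authors = [
    {"family-names": "Seibold", "given-names": "Heidi"},
    {"family-names": "Example", "given-names": "Alex"},  # hypothetical placeholder
    {"family-names": "Goth", "given-names": "Florian"},  # assumed lead author
]
ordered = order_authors(authors, lead_family_name="Goth")
print([a["family-names"] for a in ordered])  # ['Goth', 'Example', 'Seibold']
print(short_citation(ordered))  # Goth et al.
```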

I'm not sure how that fits with the idea of breaking up the mono-repo into separate ones. To me it feels like this would complicate things. The original idea still makes sense: people "cite" the GitHub repo for material that isn't in a pre-print (yet), and cite the pre-print (or properly published version) directly for papers that are published on arXiv or in a journal.

Until further notice I'll stick with the original plan. In case the organization of the whole thing changes, that work can surely be re-used.

MakisH commented 4 days ago

We briefly discussed the issue today. Some notes:

We agreed we need the CFF as a single source of truth, which should be about the repository
We agreed that automating would be fun and something we should do, including maybe generating the contributors.yml from the CFF. Is there maybe even a LaTeX package to draw such information from a CFF? (@sdruskat might know)
We still want to move the repository into https://github.com/the-teachingRSE-project
We still want to split the repository for the rest of the papers
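
A sketch of the "CFF as single source of truth" direction, deriving a contributors list from the CFF authors. The output keys (`name`, `affiliation`, `orcid`) are a hypothetical layout chosen for illustration, not the actual structure of contributors.yml, and the CFF is assumed to be already loaded into a dict.

```python
def contributors_from_cff(cff: dict) -> list[dict]:
    """Derive contributor entries from the 'authors' list of a CFF dict.

    The output layout is hypothetical; adapt the keys to whatever
    contributors.yml really uses.
    """
    contributors = []
    for author in cff.get("authors", []):
        contributors.append({
            "name": f"{author['given-names']} {author['family-names']}",
            "affiliation": author.get("affiliation", ""),
            "orcid": author.get("orcid", ""),
        })
    return contributors


cff = {
    "authors": [
        {
            "family-names": "Seibold",
            "given-names": "Heidi",
            "orcid": "https://orcid.org/0000-0002-8960-9642",
            "affiliation": "IGDORE Munich",
        },
    ],
}
print(contributors_from_cff(cff))
```

Going in this direction keeps the CFF authoritative; anything contributors.yml needs beyond what the CFF can hold would have to live somewhere else.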

We were still a bit puzzled that software and dataset are the only options for the (optional) type field, and we wondered whether it would make sense to add more types to the standard (cc @sdruskat).

@hvwaldow feel free to ping me if you would like to have another synchronous pair-programming session. I completely understand why these work, and I learn something through the process.

jngrad commented 2 days ago

> We agreed that automating would be fun and something we should do, including maybe generating the contributors.yml from the CFF.

That would be ideal. Furthermore, if the contents of contributors.yml could be written to citation.cff, our Python parser would still work, with minor adjustments. As I understand it, the CFF file and our contributors.yml cannot be reconciled, because the latter contains affiliation information and funding acknowledgment information that are not part of the official CFF schema. The CFF validator must reject fields that are not in the schema, since the schema relies on which fields are provided in an author to determine whether it is an instance of person or entity.
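
To illustrate the point about rejected fields, here is a simplified sketch of how extra keys on an author could be detected. The key sets reflect my reading of the CFF 1.2.0 schema and should be checked against the schema itself; treating any entry with a `name` key as an entity is a simplification of the schema's actual matching, and the `funding` key in the example is hypothetical.

```python
# Keys allowed on a person and on an entity, as I read the CFF 1.2.0 schema
# (an assumption; verify against the schema before relying on these sets).
PERSON_KEYS = {
    "address", "affiliation", "alias", "city", "country", "email",
    "family-names", "fax", "given-names", "name-particle", "name-suffix",
    "orcid", "post-code", "region", "tel", "website",
}
ENTITY_KEYS = {
    "address", "alias", "city", "country", "date-end", "date-start", "email",
    "fax", "location", "name", "orcid", "post-code", "region", "tel", "website",
}


def unknown_keys(author: dict) -> list[str]:
    """Return the keys a schema validator would be expected to reject."""
    # Simplification: an entry with a 'name' key is treated as an entity.
    allowed = ENTITY_KEYS if "name" in author else PERSON_KEYS
    return sorted(set(author) - allowed)


author = {
    "family-names": "Seibold",
    "given-names": "Heidi",
    "affiliation": "IGDORE Munich",
    "funding": "placeholder grant text",  # hypothetical extra key
}
print(unknown_keys(author))  # ['funding']
```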

> Is there maybe even a LaTeX package to draw such information from a CFF?

Each LaTeX document class provides custom methods to add authors and their affiliation. This is something our Python parser in filter.py takes care of.
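
As an illustration of what such a conversion involves, here is a sketch that renders authors for one particular convention, the `\author[n]{...}` and `\affil[n]{...}` commands of the authblk package. This is an assumption chosen for the example and not a description of what filter.py does; the function name `authblk_lines` is hypothetical, and LaTeX special characters are not escaped.

```python
def authblk_lines(authors: list[dict]) -> list[str]:
    """Render authors and affiliations as authblk commands.

    Each distinct affiliation gets one number, in order of first appearance.
    """
    affiliations: list[str] = []
    lines = []
    for author in authors:
        affiliation = author["affiliation"]
        if affiliation not in affiliations:
            affiliations.append(affiliation)
        index = affiliations.index(affiliation) + 1
        full_name = f"{author['given-names']} {author['family-names']}"
        lines.append(f"\\author[{index}]{{{full_name}}}")
    for index, affiliation in enumerate(affiliations, start=1):
        lines.append(f"\\affil[{index}]{{{affiliation}}}")
    return lines


authors = [
    {"family-names": "Seibold", "given-names": "Heidi", "affiliation": "IGDORE Munich"},
]
print("\n".join(authblk_lines(authors)))
```

Other document classes would need their own rendering function, which is why a single LaTeX package reading the CFF directly may not cover every venue.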

> We still want to split the repository for the rest of the papers

That would solve the issue. The Python parser was introduced to guarantee the affiliation and funding acknowledgment information would be consistent in all manuscripts of this repository. This was motivated by the fact that a lot of files were created during the big split, and some files were later merged. We actually had one instance where an author changed his affiliation, and he only had to edit 1 file, instead of 5 files.

If we split the repository, the Python parser will become obsolete, since we cannot enforce consistency across repositories. Authors will write down their affiliation and funding acknowledgments in the Markdown files directly, and will take full responsibility for keeping that information up-to-date if it changes.

MakisH commented 2 days ago

> If we split the repository, the Python parser will become obsolete, since we cannot enforce consistency across repositories.

But I think that we should not see these repositories as eternally living documents. The affiliation of each author in each repository should be the affiliation they had at the time of contributing and/or publishing the final version of the paper.

I generally wouldn't worry too much about keeping consistency across repositories. There is a historical development, and each repository will have slightly different needs and tools available, depending on the time it is developed, and the publication venue.

jngrad commented 1 day ago

> > If we split the repository, the Python parser will become obsolete, since we cannot enforce consistency across repositories.
>
> But I think that we should not see these repositories as eternally living documents. The affiliation of each author in each repository should be the affiliation they had at the time of contributing and/or publishing the final version of the paper.

Yes. Once published, the information cannot change anymore. That's going to be a problem for this repository, because one manuscript is now ready for submission, but the others aren't. We probably don't want the Python parser to maintain multiple contributors.yml files to handle this. With one manuscript per repository, we don't have this issue anymore.

The Python parser was introduced to solve a problem that was specific to our repository: manuscripts were split and fused together at a rapid pace, and information ended up lost or duplicated, with the duplicates then diverging over time. This was an issue with funding acknowledgments and author affiliations, because only the affected authors could notice and fix discrepancies, whereas discrepancies in the manuscript text could be fixed by multiple people by looking at the pads.

CaptainSifff / paper_teaching-learning-RSE

cff file #253