Remove "Repository" column in the register table HTML and replace it with the title of the paper

nuest commented 3 months ago

The repository information is not helpful for browsing, but the title is. The title should be a link to the paper DOI.

We can keep the CSV file consistent with the table.

angelina-momin commented 2 months ago

@nuest just to clarify, are these the changes that you would like:

1. In the register.csv you would like the "Repository" column to be replaced by the paper titles
2. In the index.html you would like the paper titles to be hyperlinks to the actual paper
3. Do you have a preferred column name for the paper titles? I was thinking "Paper title"

At the moment the code is structured such that the codecheck yml files are retrieved using the "Repository" column. Removing the Repository column would therefore mean I have to redo that functionality to take in another argument.

An alternative I was thinking of is to keep a separate register_with_repo.csv file containing the Repository column and not the Paper title. This file will be identical to the current register.csv and will be used to retrieve the codecheck yml. We can have a separate register.csv file which will not have the Repository column but will have the Title column. Whenever there is a new codecheck entry it will go into the register_with_repo.csv file, while the register.csv file will be rendered by the codecheck package so no changes will need to be made there.

Let me know what you think of this, or if you have any alternative or better file names.

nuest commented 1 month ago

@angelina-momin Very good that you ask to clarify this.

Re 1.: I do not intend to have this change in register.csv, because we need the information of the repository in there since that is the location of the codecheck.yml. I was only thinking about the user facing information. Ideally, the JSON would include more details. I think to make the register more user friendly to browse, we need more and more separation between the HTML version and the other more "API-like" renderings.

Re. 2.: Yes, linking the paper via the DOI from the title, please.

Re. 3.: About the title... "Paper" seems generic enough to also cover preprints, so that's good. "Codechecked work" seems a bit clunky for everybody except me who uses codecheck as a verb weekly... Let's go with "Paper title" for now!

I would imagine to keep the register.csv as the core data set, which is the one we edit when we add a new check, and then derive all other documents from it. I know we create a discrepancy between the HTML rendering and the CSV file that is linked below it, though. I can live with that for now, and then solve this via #102 which will generate a more useful CSV file, which we can then link to from below the HTML table.
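The "derive everything from register.csv" idea could be sketched roughly as follows. This is in Python for illustration only (the codecheck package itself is written in R) and is not the package's actual implementation. The function name `derive_html_rows`, the `metadata` lookup, and every column name other than "Repository" and "Paper title" are assumptions made for the example.

```python
import csv
import io


def derive_html_rows(register_csv_text, metadata):
    """Build user-facing rows from the core register without modifying it.

    `metadata` is a hypothetical dict that maps a Repository value to the
    title and reference read from that repository's codecheck.yml.
    """
    rows = []
    for entry in csv.DictReader(io.StringIO(register_csv_text)):
        paper = metadata.get(entry["Repository"], {})
        # Copy every column except "Repository" into the display row.
        display = {k: v for k, v in entry.items() if k != "Repository"}
        # Missing metadata yields empty strings instead of an error.
        display["Paper title"] = paper.get("title", "")
        display["Paper reference"] = paper.get("reference", "")
        rows.append(display)
    return rows
```

The core CSV keeps its Repository column, so retrieving codecheck.yml files is unaffected; only the derived rows used for the HTML table drop it.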

angelina-momin commented 1 month ago

Thanks for the clarification @nuest.

Got a few other questions:

I noticed that for Larisch-reproduction the codecheck.yml paper reference contains additional text besides the paper link (link to the file). This creates issues with the hyperlink.

Do you want me to:

1. Change this codecheck.yml's paper reference to only contain the link?
2. Code a function such that given any paper reference (with or without extra text) it extracts the paper link and uses it as the hyperlink.
3. Both (1) and (2)

I understand that you would like to keep the csv's as they are and they will not match with the register table. Any preference what you would like for the JSON and Markdown file?

I am assuming the markdown should be identical to the register html. The JSON file at the moment (prior to any changes in the repo) already contains Repo link, Title and paper reference. Should I keep it as such?

nuest commented 1 month ago

Please do both, (3).

The reference should be free text, for example when a paper is not yet published but we decide to publish the codecheck. It is not the norm, but the processing should not break. The validation of codecheck.yml files in the R package should already reflect this.

Yes, this means some things, like linking to the paper from the HTML, do not work.

Trying to extract the right link from the field is a rabbit hole I do not want us to go down. Then I'd rather add an extra field reference-text and require that reference always is a DOI or so.

After hardening the code, the particular field should only contain the correct DOI that is shown in the linked page.
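A minimal sketch of what "should not break" could look like when rendering the title cell, again in Python purely for illustration (the real code lives in the R package). The function name `title_cell` and the DOI pattern are assumptions. The sketch does not try to extract a link from free text: it links only when the whole field is a DOI URL and otherwise falls back to the plain title.

```python
import html
import re

# Assumed pattern: the field holds an https://doi.org/ URL and nothing else.
DOI_URL = re.compile(r"https://doi\.org/10\.\d{4,9}/\S+")


def title_cell(title, reference):
    """Return the HTML for the "Paper title" cell (hypothetical helper)."""
    safe_title = html.escape(title)
    ref = (reference or "").strip()
    if DOI_URL.fullmatch(ref):
        # The reference is a bare DOI URL, so link the title to it.
        return '<a href="{}">{}</a>'.format(html.escape(ref, quote=True), safe_title)
    # Free text or a missing reference: show the title without a link.
    return safe_title
```

With this approach an entry such as an unpublished paper with a free-text reference still renders, just without the hyperlink.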

Re. markdown content: use the option that is least work (probably matching the HTML?)

Re. JSON content: should be as extensive as possible to support programmatic access to our metadata. So probably leave as is for now, unless changes are quick. Then we can extend it in connection with #102.

codecheckers / register #88