dockstore / dockstore

Our VM/Docker sharing infrastructure and management component
https://dockstore.org/
Apache License 2.0
116 stars 27 forks source link

Autocreate Zenodo DOIs and create shared access links #5879

Closed kathy-t closed 3 weeks ago

kathy-t commented 1 month ago

Description

Updated PR Description Zenodo swagger PR: https://github.com/dockstore/swagger-java-zenodo-client/pull/21 This PR has been updated with the feedback from the [story time discussion](https://docs.google.com/document/d/1SC1B34edGyqYYSYr5cIuHN3uiBZYIxDXhEC9XYFQrtE/edit?usp=sharing). This PR requires the creation of a Dockstore-owned personal access token on Zenodo as well as a Dockstore community on Zenodo. This personal access token is used to automatically create DOIs. **Automatic DOI Creation Behaviour** - DOIs are automatically created for valid published tags for `Workflow`-based entries on refresh, GitHub App release, and on publish. During automatic DOI creation, we skip creating a snapshot. - The DOI is only created the first time it is invoked. If tags are recreated, we do not create a new DOI. This can be an enhancement if we want to go down that route. - I originally didn't do it on publish but after manual testing, the user experience to automatically create DOIs for a new workflows that aren't originally published felt clunky. For example, a user creates a new GitHub app workflow that isn't published so no DOIs are created. They publish it. They must now generate push events for DOIs to be automatically created. - There is no option to opt-out. - A new `Doi` entity is added so that we can track multiple DOIs from multiple sources (user, GitHub, Dockstore). This means that users can have both user-generated and Dockstore-generated DOIs, along with GitHub DOIs. - I have deprecated the old DOI columns instead of removing them so that there is no down time during the release. There is no trigger to keep the new and old columns in sync. Is it worth it to add a trigger? - Note that we don't allow the deletion of entries that were previously published. Since the auto DOI creation only occurs for published tags, this means that DOIs in our database won't be orphaned. - Tags with only Dockstore-generated DOIs can be deleted since they are not frozen...this can result in an orphaned DOI. **Dockstore Zenodo Community** - New DOIs, whether user or Dockstore generated, are added to the Dockstore Zenodo community. We do not add old records to the community retro-actively. **Shared Access Links** - Three endpoints are added to manage shared access links that allow users to edit the Dockstore-generated DOIs. The [access link](https://inveniordm.docs.cern.ch/reference/rest_api_drafts_records/#create-an-access-link) returns an ID and a token. When this token is appended to the Zenodo DOI URL, the user is able to edit the record if they are logged into Zenodo. I did not store this token in the database because I feared that we could accidentally expose it. Only the link ID is stored and when requested, we can retrieve the token from Zenodo. This editing feature will likely not be used often. - The link has edit permissions and no expiration date. There is an endpoint to delete the access link. - The access link ID is stored in the concept DOI object because this link is applicable to all versions of the DOI **Hoverfly Tests** - Created a few Zenodo tests using Hoverfly. These tests aren't super complex due to the limitations of mocked responses, but it's something :slightly_smiling_face: - Created a new testing profile for tests that use Hoverfly because it needs the localtruststore in CircleCI **Follow-up** - Hook up to the UI: https://ucsc-cgl.atlassian.net/browse/SEAB-6462
Draft PR Description Currently in draft mode to get feedback as I figure out some Hoverfly test weirdness in CircleCI. Zenodo swagger PR: https://github.com/dockstore/swagger-java-zenodo-client/pull/21 This PR adds a `enableAutomaticDoiCreation` boolean to `Entry` that is set to `true` by default. When set to true, it automatically creates DOIs using Dockstore's own Zenodo personal access token, specified in the web.yml. The auto-creation occurs during refresh and GitHub App release for valid, published tags and involves snapshotting the version prior to creating the DOI. This boolean can be updated using a new endpoint, allowing users to opt-out in the UI. For GitHub app entries, they will need to specify it in their .dockstore.yml. - Should this feature include hosted entries? This boolean can be updated using a new endpoint, allowing users to opt-out in the UI. For GitHub app entries, they will need to specify it in their .dockstore.yml. This PR also adds an endpoint to create a shared access link with edit permissions so entry owners can modify their DOIs if they want. There is an option to set an expiration date, but I set it to never expire. The idea is that in the UI, the entry owner can click a button that creates this shared access link. After the first time, the UI can display this link somewhere so the user can modify their DOI. UI PR will be in follow-up ticket. - Should we allow the user to delete this access link? This PR also adds DOIs to the Dockstore community, using the community ID specified in the web.yml. This is for all DOIs, whether Dockstore-created or end-user-created. **Some questions and thoughts** There is a potential exception if an end user has already requested a DOI for an entry version and our Dockstore user tries to create a DOI for a different version because our Dockstore user does not have permissions to modify some one else's DOI. Due to this, there is a check to see if the entry has 0 DOIs or if all DOIs are Dockstore-owned to ensure that no exception occurs. I think there's a bigger conversation here about how we want to handle this situation so this PR takes the simplest route for now, i.e., not allowing the mixing of end-user-created and Dockstore-created DOIs. Below are some thoughts about it, but I think we should have a follow-up ticket to decide what to do. - Do we solve this with communities? Using a one-time endpoint, we could try to add all DOIs to the Dockstore community. This would give the Dockstore Zenodo user permission to edit the DOI and create versions for DOIs that are owned by end users. - Do we allow multiple users to create DOIs and we store multiple DOIs? (similar ticket https://ucsc-cgl.atlassian.net/browse/DOCK-1922) - How about the reverse where the Dockstore Zenodo user creates the DOI first and the end user wants to create their own DOI? - Shared access links would allow them to create new versions of the DOI, but they would not "own" the DOI

Review Instructions Note that the UI portion of this feature is not implemented yet, so these review instructions will require looking at the API responses for the DOIs.

To find the relevant DOI fields for the review steps below:

On QA:

  1. First, verify that you can request a DOI for a workflow version the normal way, i.e. link your Zenodo account and request a DOI. Verify that the concept DOI and version DOI has the initiator USER (since it is user-generated)
  2. For the same workflow, unpublish and publish it. The act of publishing will automatically create DOIs for valid, published tags. Verify that there are now two concept DOIs and two version DOIs for the version in step 1. The new DOIs should have DOCKSTORE as the initiator.
  3. For the same workflow, if it is a manually registered workflow, create a new tag that is valid then refresh the workflow. If it is a GitHub App workflow, create a new tag and wait for the changes to be registered by Dockstore. Verify that the new tag has a DOI with the initiator as DOCKSTORE.
  4. For the same workflow, request an edit link for it using this endpoint. Confirm that the call was successful and that the response includes a token.
  5. Call this endpoint for the same workflow to get the existing DOI edit link. Confirm that the call was successful and that the response includes a token.
  6. Delete the edit link using this endpoint. Confirm that it was deleted by calling the getDOIEditLink endpoint (it should fail because there is no link for it).

Issue https://ucsc-cgl.atlassian.net/browse/SEAB-5929 https://ucsc-cgl.atlassian.net/browse/SEAB-6371

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

e.g. Does this change...

Please make sure that you've checked the following before submitting your pull request. Thanks!

denis-yuen commented 1 month ago

Due to this, there is a check to see if the entry has 0 DOIs or if all DOIs are Dockstore-owned to ensure that no exception occurs. I think there's a bigger conversation here about how we want to handle this situation so this PR takes the simplest route for now, i.e., not allowing the mixing of end-user-created and Dockstore-created DOIs.

I think this makes sense, basically allowing users to go from automatic to user-created but not vice versa

Do we allow multiple users to create DOIs and we store multiple DOIs?

A good observation, I guess we basically had this problem in some way already.

How about the reverse where the Dockstore Zenodo user creates the DOI first and the end user wants to create their own DOI?

I think we should allow this, there's a good chance the user may not know about the automatic doi feature until after it issues one, so we need to allow for this direction (automatic to end-user created)

svonworl commented 1 month ago

This PR also adds an endpoint to create a shared access link with edit permissions so entry owners can modify their DOIs if they want. There is an option to set an expiration date, but I set it to never expire. The idea is that in the UI, the entry owner can click a button that creates this shared access link. After the first time, the UI can display this link somewhere so the user can modify their DOI. UI PR will be in follow-up ticket.

  • Should we allow the user to delete this access link?

Whatever the editing mechanism, if it "leaks" (as the above link sounds like it could), it'd be great if there was a way to neutralize it.

svonworl commented 1 month ago

Regarding the concept of auto-snapshotting before generating the DOI, some thoughts:

Will automatically snapshotting all tags mess up anyone's release flow? Once the version is snapshotted, it's frozen. Our assumption is that a github tag is a permanent fixture, but does everyone use it that way? For example, imagine a repo where the stable tag is attached to the latest stable release and moves around. The first stable tag would be snapshotted, and then Dockstore would reject subsequent retaggings. Do people do stuff like that, and if so, should we support it?

If the answer to the last question above is yes, we might need to figure out what to do if a user accidentally/inadvertently freezes a version before they opt-out.

svonworl commented 1 month ago

Regarding multiple DOIs, we've been discussing at least three sources of DOIs:

If you count "generate our own" custom DOIs, four sources. :)

Would be great if we could track all of them. Time for a new "DOI" database entity?

coverbeck commented 1 month ago

Regarding the concept of auto-snapshotting before generating the DOI, some thoughts:

Will automatically snapshotting all tags mess up anyone's release flow? Once the version is snapshotted, it's frozen. Our assumption is that a github tag is a permanent fixture, but does everyone use it that way? For example, imagine a repo where the stable tag is attached to the latest stable release and moves around. The first stable tag would be snapshotted, and then Dockstore would reject subsequent retaggings. Do people do stuff like that, and if so, should we support it?

If the answer to the last question above is yes, we might need to figure out what to do if a user accidentally/inadvertently freezes a version before they opt-out.

Seconding this comment; didn't we find a Broad repo where they had "moving" tags, i.e., they delete their tags, then recreate them at new spots in the branches?

More generally, it makes me think, I wonder if we shouldn't relax the snapshot requirement and make it optional.

coverbeck commented 1 month ago

Regarding multiple DOIs, we've been discussing at least three sources of DOIs:

* github-generated

* user-generated Zenodo

* dockstore-generated Zenodo

If you count "generate our own" custom DOIs, four sources. :)

Would be great if we could track all of them. Time for a new "DOI" database entity?

I think we should support multiple DOIs as well. It would be easy to have both GitHub and Dockstore integration turned on.

coverbeck commented 1 month ago

Regarding the concept of auto-snapshotting before generating the DOI, some thoughts: Will automatically snapshotting all tags mess up anyone's release flow? Once the version is snapshotted, it's frozen. Our assumption is that a github tag is a permanent fixture, but does everyone use it that way? For example, imagine a repo where the stable tag is attached to the latest stable release and moves around. The first stable tag would be snapshotted, and then Dockstore would reject subsequent retaggings. Do people do stuff like that, and if so, should we support it? If the answer to the last question above is yes, we might need to figure out what to do if a user accidentally/inadvertently freezes a version before they opt-out.

Seconding this comment; didn't we find a Broad repo where they had "moving" tags, i.e., they delete their tags, then recreate them at new spots in the branches?

More generally, it makes me think, I wonder if we shouldn't relax the snapshot requirement and make it optional.

There's variant of users just deleting GitHub releases/tags. If we snapshot the tag after it's created, the tag will be around forever in Dockstore. Which sometimes you might want to be fair, but other times not.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 78.53107% with 76 lines in your changes are missing coverage. Please review.

Project coverage is 74.54%. Comparing base (2f81db3) to head (f5f000f).

Files Patch % Lines
.../io/dockstore/webservice/helpers/ZenodoHelper.java 74.09% 49 Missing and 8 partials :warning:
...webservice/resources/AbstractWorkflowResource.java 77.77% 6 Missing and 2 partials :warning:
...rc/main/java/io/dockstore/webservice/core/Doi.java 80.00% 7 Missing :warning:
...rc/main/java/io/dockstore/common/HoverflyTest.java 0.00% 1 Missing :warning:
...ore/webservice/DockstoreWebserviceApplication.java 50.00% 1 Missing :warning:
.../dockstore/webservice/resources/EntryResource.java 66.66% 0 Missing and 1 partial :warning:
...ckstore/webservice/resources/WorkflowResource.java 95.83% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #5879 +/- ## ============================================= + Coverage 73.73% 74.54% +0.81% - Complexity 5260 5353 +93 ============================================= Files 371 374 +3 Lines 19207 19404 +197 Branches 2012 2021 +9 ============================================= + Hits 14162 14465 +303 + Misses 4090 3970 -120 - Partials 955 969 +14 ``` | [Flag](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | Coverage Δ | | |---|---|---| | [bitbuckettests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `27.04% <13.55%> (-0.09%)` | :arrow_down: | | [hoverflytests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `27.47% <75.42%> (?)` | | | [integrationtests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `56.98% <24.01%> (-1.39%)` | :arrow_down: | | [languageparsingtests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `11.09% <8.47%> (+0.01%)` | :arrow_up: | | [localstacktests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `21.63% <11.86%> (-0.05%)` | :arrow_down: | | [toolintegrationtests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `30.38% <14.68%> (-0.10%)` | :arrow_down: | | [unit-tests_and_non-confidential-tests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `26.04% <16.10%> (-2.42%)` | :arrow_down: | | [workflowintegrationtests](https://app.codecov.io/gh/dockstore/dockstore/pull/5879/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore) | `38.40% <18.92%> (-0.27%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dockstore#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

denis-yuen commented 1 month ago

More generally, it makes me think, I wonder if we shouldn't relax the snapshot requirement and make it optional.

I don't think we should be opinion-less on this one though.

From a bioinformatics perspective, I don't think we should be encouraging this behaviour since it is contrary to reproducibility. You could get completely different results from two runs of the same version of the same workflow if they have different code. From a software engineering perspective, it violates the principle of least surprise and officially insane, according to the git documentation anyway.

I think the compromise here is that if a user is doing DOI issuing themselves, we don't actually automatically invoke a snapshot. We only invoke a snapshot as a pre-requisite to issuing a DOI ourselves.

Edit to add: If we're talking about the same repo, I think they were re-tagging tags that look like "develop" or "main" rather than "v1" or "v2" which is marginally less insane than what the git documentation is describing.

denis-yuen commented 1 month ago

There's variant of users just deleting GitHub releases/tags. If we snapshot the tag after it's created, the tag will be around forever in Dockstore. Which sometimes you might want to be fair, but other times not.

Note that while the tag may be deleted in github, the DOI copy should still be around in Zenodo. I think we kinda do the same thing by only allowing them to archive rather than delete.

kathy-t commented 1 month ago

More generally, it makes me think, I wonder if we shouldn't relax the snapshot requirement and make it optional.

I think the compromise here is that if a user is doing DOI issuing themselves, we don't actually automatically invoke a snapshot. We only invoke a snapshot as a pre-requisite to issuing a DOI ourselves.

Perhaps the auto DOI creation should be opt-in instead of opt-out to reduce the chances of freezing a version before a user notices and opts out? This would also match Zenodo's pattern with their GitHub integration

kathy-t commented 1 month ago

Some test failures related to the deprecation of the old DOI columns, will fix but in the mean time, re-requesting reviews

denis-yuen commented 1 month ago
  • Is it worth it to add a trigger?

Does not seem like it, we can schedule a 1.16.1 ticket for example to delete the old columns

sonarcloud[bot] commented 3 weeks ago

Quality Gate Failed Quality Gate failed

Failed conditions
79.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud