datadryad / dryad-product-roadmap

Repository of issues for Dryad project boards
https://github.com/orgs/datadryad/projects

Solution needed: Last Modified date #2008

Closed jleighherzog closed 1 year ago

jleighherzog commented 2 years ago

Ideally, curators claim and curate datasets in the order they were submitted (oldest to newest). To identify the oldest submission, curators perform a search on the Dryad Admin site for all unassigned datasets in "Submitted" status. Then, they sort results by "Date Last Modified" (A to Z; oldest to newest).

The issue is that any action taken on a dataset, whether manual or automatic, updates the 'Last Modified Date' (LMD) to reflect that activity. For example, when a curator leaves a note in the Activity Log or when an auto-notification is received from the journal, the LMD is updated. Or, more commonly, the dataset is 'Submitted' but the author continues to log in and check the status, which updates the LMD and essentially resets the position of their dataset in the queue.

From a curator: "Beyond the usual cases of authors who keep going in and resubmitting their unchanged dataset in the hopes that their dataset gets curated faster, I've seen a few Helpdesk cases in the last week or so where the author submitted it just once and then an automated update from the integrated journal resubmitted it, setting them back in the queue by a few days."

Some ideas:

1) Could a new column for "SubmittedDate" be included on the Admin search screen? This would be the date of the first/initial "Submitted" status. Open to other suggestions.

Note: Versioned datasets are also received in "Submitted" status and auto-assigned to a curator so that should remain unchanged. And, PPR submissions shouldn't be impacted because they only reach "Submitted" status for the first time when they are released from PPR, correct?

2) From Bryan:

I was mulling over our specific issue of authors who keep resubmitting their datasets and end up losing their position in the queue in the broader context of my HD experience. What if we added categories for the dataset designation following the five standard ones that researchers use on their CVs for manuscripts (in prep, submitted, in revision, in press, published) plus the one for 'other or not applicable?'

The way I'm fuzzily envisioning this right now, it would benefit us in a couple of ways:

-- By essentially automatically designating priority datasets, we could hopefully cut down on the number of HD status checks from irritated authors whose dataset is buried in the middle of many datasets that have no rush on them.

-- If there was a way to automatically PPR datasets when the associated MS is designated as 'in prep' or 'submitted' (i.e. nowhere close to seeing the light of day), we could cut down the number of people whose datasets are published before the MS is accepted and who frantically try to unpublish it. Authors would have to change the MS designation to remove it, but they have to do this anyway when they PPR it in our current system.

-- We could time it with our hypothetical blog series on curation processes, in a post titled something like "When is the best time to submit my Dryad dataset?", as part of a bigger attempt to get more info about our process out there.

-- This might be a quicker/easier fix than getting that new column added.

3) Create versions for internal (Dryad) and external users (journals, institutions, etc.)

sfisher commented 1 year ago

We also have created_at dates for most versions, which don't change on updates and record when the user first started the dataset version. The updated_at date gets updated when things change (such as users changing things), which is probably the one you're talking about. We also have a submission object created_at date for when a submission happens. It kind of sounds like you want a first submission date to sort by (first_submission_after_last_curation_date or something like that).

We could probably derive that date, but it's not easy to put in the query on the curation page, since calculating it requires going through the versions and statuses and examining all the history. We could, though, pre-calculate the date and save it to our database as another date, which would make the query fast/possible.
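To make the derivation concrete, here is a minimal sketch of walking a dataset's status history to find the first submission after the last curation. It is not Dryad's actual code: the `StatusEvent` type, the `"submitted"` and `"curation"` status strings, and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class StatusEvent:
    """Hypothetical record of one status change in a dataset's history."""
    status: str
    created_at: datetime


def first_submission_after_last_curation(
    events: Iterable[StatusEvent],
    curation_statuses: FrozenSet[str] = frozenset({"curation"}),  # assumed name
) -> Optional[datetime]:
    """Return the earliest 'submitted' timestamp since the last curation event.

    Later resubmissions do not move the date; a curation event resets it.
    Returns None if the dataset has not been submitted since it was last curated.
    """
    result: Optional[datetime] = None
    for event in sorted(events, key=lambda e: e.created_at):
        if event.status in curation_statuses:
            result = None  # curation happened, so the clock restarts
        elif event.status == "submitted" and result is None:
            result = event.created_at
    return result
```

Because this needs the whole history, it would probably run once whenever a status changes, with the result stored alongside the dataset so that the curation page can sort on a plain date.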

jleighherzog commented 1 year ago

After discussing with the team, we are most interested in incorporating the date the first version was submitted. This will allow us to sort the queue to ensure we address the oldest submissions first (i.e., first_submission_date). It probably shouldn't include "_after_last_curation_date" because these new submissions will not have been curated yet. Would this be easier to implement because it wouldn't require a calculation but rather a point in time (the first instance a dataset appeared in 'Submitted' status)?
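As a rough illustration of the difference this would make to the queue, the sketch below sorts the same two datasets by a stored first_submission_date and by updated_at. The `Dataset` type, its field names, and the identifiers are hypothetical, not the Admin site's real schema.

```python
from datetime import datetime
from typing import List, NamedTuple


class Dataset(NamedTuple):
    """Hypothetical row from the unassigned 'Submitted' search results."""
    identifier: str
    first_submission_date: datetime  # set once, when first 'Submitted'
    updated_at: datetime             # changes on any later activity


def queue_by_first_submission(datasets: List[Dataset]) -> List[str]:
    """Oldest first submission at the front of the queue."""
    return [d.identifier for d in sorted(datasets, key=lambda d: d.first_submission_date)]


def queue_by_last_modified(datasets: List[Dataset]) -> List[str]:
    """Current behaviour: sort on the last-modified date."""
    return [d.identifier for d in sorted(datasets, key=lambda d: d.updated_at)]


datasets = [
    # Submitted Jan 1, then touched again on Jan 10 (e.g. a journal auto-notification).
    Dataset("dataset-A", datetime(2023, 1, 1), datetime(2023, 1, 10)),
    # Submitted Jan 3 and never touched since.
    Dataset("dataset-B", datetime(2023, 1, 3), datetime(2023, 1, 3)),
]

print(queue_by_last_modified(datasets))     # dataset-A falls behind dataset-B
print(queue_by_first_submission(datasets))  # dataset-A keeps its place
```

Sorting on the stored date means later activity on a dataset no longer changes its position in the queue.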