datadryad / dryad-product-roadmap

Repository of issues for Dryad project boards
https://github.com/orgs/datadryad/projects

PubUpdater bug! Number of dataset matches per page does not update or expand #3395

Closed · mahirst closed this issue 2 weeks ago

mahirst commented 4 weeks ago

I'm missing a lot of the terminology to explain this correctly, but essentially: while working through PubUpdater over the last couple of weeks, I noticed that as I got the list of matches on a page down to 1-2 (e.g., page 1 only has two matches), the emptied page is not repopulated with matches from the backlog and the total number of pages does not decrease.

Steps to reproduce: Go to https://datadryad.org/stash/publication_updater?direction=desc&match_type=articles&page=1&page_size=100&show_all=false&sort=score

Mess around with the page size button at the bottom. Notice how there are still only 2 entries on the first page!

Expected behavior: I think the expectation would be that the first page always shows a full 10 entries (when 10 per page is selected), because of our huge backlog. This would be similar to the curation queue, where there is always a full page of entries, and as we work through the queue and there are fewer datasets, there are also fewer pages of entries.
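A quick illustration of that expectation (plain Ruby, not Dryad code; the numbers are made up): as long as the backlog is larger than the page size, every page except the last should be full, and the page count should only shrink as the backlog itself shrinks.

# Standard pagination math: total pages is the backlog size divided by the
# page size, rounded up.
def expected_pages(total_matches, page_size)
  (total_matches.to_f / page_size).ceil
end

expected_pages(137, 10)   # => 14 pages; the first 13 are completely full
expected_pages(137, 100)  # => 2 pages; the first shows 100 entries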

ahamelers commented 3 weeks ago

With the changes, this seems to be working (as far as I can tell) on sandbox, but it is still very weird on production. It's difficult to test, since only prod has the datasets and PubUpdater proposed changes, and they're not easy to generate in large numbers.

ahamelers commented 3 weeks ago

Okay, I have a new theory about this. There is apparently an additional check, done in the view rather than in the database query, of whether the "proposed change" is already part of the submission metadata:

<%# Look up the publication metadata already attached to the dataset %>
<%
  existing_pubname = fetch_identifier_metadata(resource: resource, data_type: 'publicationName')
  existing_pubissn = fetch_identifier_metadata(resource: resource, data_type: 'publicationISSN')
  existing_pubdoi = fetch_related_primary_article(resource: resource)
%>

<%# Only render the row if the proposed metadata differs from what is already there %>
<% unless existing_pubname == proposed_change.publication_name &&
            (existing_pubissn == proposed_change.publication_issn || proposed_change.publication_issn.blank?) &&
            existing_pubdoi == proposed_change.publication_doi
%>

  <tr class="c-lined-table__row" id="current_<%= proposed_change.id %>">
    <%= render partial: 'current_metadata', locals: { resource: resource, existing_pubdoi: existing_pubdoi, existing_pubissn: existing_pubissn, existing_pubname: existing_pubname } %>
  </tr>
  <tr class="c-lined-table__row" id="proposed_<%= proposed_change.id %>">
    <%= render partial: 'proposed_metadata', locals: { proposed_change: proposed_change, existing_pubdoi: existing_pubdoi, existing_pubissn: existing_pubissn, existing_pubname: existing_pubname, resource: resource } %>
  </tr>
<% end %>

So these rows are counted in the total and in the page count, and then just not shown!

I'm guessing prod has a lot of "proposed changes" where the metadata already exists on the dataset, and if this check were removed things would be a lot less weird in the UI. The check should be moved into the actual DB query so it is applied before the counts, pagination, etc. are calculated.
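A rough sketch of that idea, with hypothetical names (already_applied?, resource_for, proposed_changes) standing in for whatever the controller actually uses; the comparison simply mirrors the view check above, assuming those helpers (or equivalents) are reachable from the query layer, and page/page_size are the existing query parameters:

# Mirrors the view-level check: is the proposed metadata already on the dataset?
def already_applied?(proposed_change, resource)
  existing_pubname = fetch_identifier_metadata(resource: resource, data_type: 'publicationName')
  existing_pubissn = fetch_identifier_metadata(resource: resource, data_type: 'publicationISSN')
  existing_pubdoi  = fetch_related_primary_article(resource: resource)

  existing_pubname == proposed_change.publication_name &&
    (existing_pubissn == proposed_change.publication_issn || proposed_change.publication_issn.blank?) &&
    existing_pubdoi == proposed_change.publication_doi
end

# Filter before counting and paginating, so the totals only include rows
# that will actually be rendered.
visible = proposed_changes.reject { |pc| already_applied?(pc, resource_for(pc)) }

page        = params.fetch(:page, 1).to_i
page_size   = params.fetch(:page_size, 10).to_i
total_pages = (visible.size.to_f / page_size).ceil
page_rows   = visible.slice((page - 1) * page_size, page_size) || []

Filtering in Ruby like this would at least keep the counts consistent with what the view renders; pushing the same condition into the SQL itself would be better still, since it avoids loading rows only to discard them.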

@ryscher Should we also consider marking these proposed changes in the DB in some way? Or deleting them?