DataONEorg / metrics-service

An efficient database and REST API for delivering aggregated data set metrics to clients.
Apache License 2.0

Metrics previewed on search page are different from landing page #83

Closed mburrus closed 2 years ago

mburrus commented 3 years ago

Issue is cross listed with ESS-DIVE: ess-dive/ess-dive-catalog#466

Describe the bug

  1. Data package metrics are sometimes inconsistent between the search page and the details page. When searching through data packages on https://data.ess-dive.lbl.gov/ and https://search.dataone.org/data, the viewed/downloaded/cited counts previewed in the search menu differ from the counts reported on the data package landing page (see screenshots). The behavior is inconsistent: some data packages have no preview at all (see below), and some count previews are present but are off by some order of magnitude. The bug affects some data packages on DataONE, but not all. It has also been observed on https://arcticdata.io/catalog/data, but that isn't captured here.

  2. The metrics for 1/2021 through 2/2021 seem off. The graphs that display metric counts over time do not always include the most current month. Some graphs stop in October 2020 or December 2020, while others seem up to date (February 2021, as of writing this ticket). It's unclear what this means.

See this Google Slide for a side-by-side comparison of some of the behaviors described in 1 and 2.

Screenshots

A preview of a data package from ESS-DIVE data search

Screen Shot 2021-03-03 at 6 43 52 PM

The landing page of the same data package on ESS-DIVE

Screen Shot 2021-03-03 at 6 43 37 PM

csjx commented 3 years ago

Heya @gothub - I had asked Madison to add this ticket in the metrics-service repository because I thought it might be a metrics querying issue, but can you verify that this isn't a MetacatUI issue in the ESS-DIVE search results table view first? Thanks!

gothub commented 3 years ago

Any differences between metrics viewed on ESS-DIVE and DataONE search may be related to what appears to be sync/indexing issues on the CN. For the DOI shown in the example, here are the search results on the CN:

https://cn.dataone.org/cn/v2/query/solr/?q=seriesId:%22doi:10.15485/1603775%22&q.op=AND&fl=id,seriesId,dateModified,obsoletes,obsoletedBy&sort=dateModified%20desc
<result name="response" numFound="4" start="0">
<doc>
<str name="id">ess-dive-d3dc26585e68115-20201021T143135536</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20201020T225909470</str>
<date name="dateModified">2020-10-26T19:55:50.633Z</date>
</doc>
<doc>
<str name="id">ess-dive-d3dc26585e68115-20201020T225909470</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20200515T150109138</str>
<date name="dateModified">2020-10-20T22:59:17.056Z</date>
</doc>
<doc>
<str name="id">ess-dive-d3dc26585e68115-20200515T150109138</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20200515T142150185</str>
<str name="obsoletedBy">ess-dive-d3dc26585e68115-20201020T225909470</str>
<date name="dateModified">2020-10-20T22:59:16.46Z</date>
</doc>
<doc>
<str name="id">ess-dive-3b724d6bdb052b0-20200309T183829430</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-6a50c611c8833d6-20200306T184249151</str>
<date name="dateModified">2020-03-12T23:03:46.683Z</date>
</doc>
</result>

The pid chain for this seriesId shows 3 un-obsoleted pids and an incomplete chain. Performing the same query on ESS-DIVE shows a complete series with only the most recent pid un-obsoleted. Also, the top pid is more recent than the one on DataONE search. I'll just show the most recent pid for brevity:

https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/query/solr/?q=seriesId:%22doi:10.15485/1603775%22&q.op=AND&fl=id,seriesId,dateModified,obsoletes,obsoletedBy&sort=dateModified%20desc
<result name="response" numFound="8" start="0">
<doc>
<str name="id">ess-dive-d3dc26585e68115-20201204T142120400</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20201204T141934245</str>
<date name="dateModified">2020-12-04T14:21:26.097Z</date>
</doc>
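
For anyone who wants to reproduce the two queries above, here is a minimal sketch that builds the same Solr URL for either endpoint. The function name `series_query_url` and the constants are illustrative only (they are not part of any DataONE client library); the parameters are copied from the URLs in this comment.

```python
from urllib.parse import quote, urlencode

# Base Solr query endpoints, copied from the URLs in this comment.
CN_SOLR = "https://cn.dataone.org/cn/v2/query/solr/"
MN_SOLR = "https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/query/solr/"


def series_query_url(base, series_id):
    """Build the seriesId query used above (hypothetical helper)."""
    params = {
        "q": f'seriesId:"{series_id}"',
        "q.op": "AND",
        "fl": "id,seriesId,dateModified,obsoletes,obsoletedBy",
        "sort": "dateModified desc",
    }
    # quote (not quote_plus) encodes spaces as %20; ':', ',' and '/' are
    # left unescaped so the result matches the URLs as pasted above.
    return base + "?" + urlencode(params, quote_via=quote, safe=":,/")


print(series_query_url(CN_SOLR, "doi:10.15485/1603775"))
print(series_query_url(MN_SOLR, "doi:10.15485/1603775"))
```

Running the same query against both endpoints and comparing `numFound` is a quick way to spot a series that is only partially indexed on one side.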

This doesn't address all issues raised by @mburrus, but I would think that this CN sync/indexing issue needs to be resolved first.
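
The pid-chain check on the CN result can be sketched as follows. This is only an illustration of the reasoning above, not code from the Metrics Service: the helper names are made up, and "incomplete" is interpreted here as "a pid is referenced by `obsoletes` but is not itself in the result set".

```python
import xml.etree.ElementTree as ET

# CN Solr response copied from the comment above.
CN_RESPONSE = """<result name="response" numFound="4" start="0">
<doc>
<str name="id">ess-dive-d3dc26585e68115-20201021T143135536</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20201020T225909470</str>
<date name="dateModified">2020-10-26T19:55:50.633Z</date>
</doc>
<doc>
<str name="id">ess-dive-d3dc26585e68115-20201020T225909470</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20200515T150109138</str>
<date name="dateModified">2020-10-20T22:59:17.056Z</date>
</doc>
<doc>
<str name="id">ess-dive-d3dc26585e68115-20200515T150109138</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-d3dc26585e68115-20200515T142150185</str>
<str name="obsoletedBy">ess-dive-d3dc26585e68115-20201020T225909470</str>
<date name="dateModified">2020-10-20T22:59:16.46Z</date>
</doc>
<doc>
<str name="id">ess-dive-3b724d6bdb052b0-20200309T183829430</str>
<str name="seriesId">doi:10.15485/1603775</str>
<str name="obsoletes">ess-dive-6a50c611c8833d6-20200306T184249151</str>
<date name="dateModified">2020-03-12T23:03:46.683Z</date>
</doc>
</result>"""


def parse_docs(xml_text):
    """Return one dict per <doc>, keyed by each field's name attribute."""
    root = ET.fromstring(xml_text)
    return [{f.get("name"): f.text for f in doc} for doc in root.iter("doc")]


def chain_report(docs):
    """Return (un-obsoleted pids, pids referenced by obsoletes but not indexed)."""
    known = {d["id"] for d in docs}
    heads = [d["id"] for d in docs if "obsoletedBy" not in d]
    missing = sorted(
        {d["obsoletes"] for d in docs if d.get("obsoletes") and d["obsoletes"] not in known}
    )
    return heads, missing


heads, missing = chain_report(parse_docs(CN_RESPONSE))
print(len(heads), "un-obsoleted pids:", heads)          # 3 for the CN result
print(len(missing), "referenced but not indexed:", missing)
```

In a fully indexed series, only the newest pid should lack `obsoletedBy`, and only the first version's predecessor (if any) should be absent from the result set.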

gothub commented 3 years ago

Regarding the incomplete pid chain on the CN: this appears to be an indexing issue, because every pid in the series does have current system metadata on the CN (I checked each pid in the series as reported by the ESS-DIVE Solr index).
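
A per-pid check like the one described above could be scripted roughly as below. This sketch only builds the system metadata URLs, assuming the standard DataONE REST layout (`/meta/{pid}` under the CN base URL); `sysmeta_url` is a hypothetical helper, and fetching and comparing the documents is left out.

```python
from urllib.parse import quote

# CN base URL, taken from the Solr query earlier in this thread.
CN_BASE = "https://cn.dataone.org/cn/v2"


def sysmeta_url(pid, base=CN_BASE):
    """Build the system metadata URL for one pid (hypothetical helper)."""
    # The pid is a path segment, so every reserved character is escaped.
    return f"{base}/meta/{quote(pid, safe='')}"


# Two of the pids reported by the ESS-DIVE Solr index for this series.
pids = [
    "ess-dive-d3dc26585e68115-20201204T142120400",
    "ess-dive-d3dc26585e68115-20201204T141934245",
]
for pid in pids:
    print(sysmeta_url(pid))
```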

gothub commented 3 years ago

The pids in the series doi:10.15485/1603775 have been re-indexed on the CN.

vchendrix commented 2 years ago

@gothub This still seems to be an issue in production for doi:10.15485/1603775. Here is what the metrics look like today: Screen Shot 2022-03-30 at 8.14.20 AM.png

rushirajnenuji commented 2 years ago

Hi @vchendrix @gothub - I found a bug in the PID resolution algorithm and have added a fix (the above 2 commits). I tested this by deploying a test Metrics API on the logproc server and pointing a test MetacatUI (handy-owl) to it, and have verified that this fixes the issue. Deployed here (to test, search for the dataset title "WHONDRS Consortium"). The metrics on the Data Catalog page and the dataset landing page now add up.

Next steps: merge the branch to master, prepare a release, and deploy. I plan to have this live in production sometime tomorrow.

vchendrix commented 2 years ago

Thanks @rushirajnenuji. Looking forward to the deployment to production!

rushirajnenuji commented 2 years ago

This issue was resolved in the latest Metrics Service release, which was deployed to production on April 4, 2022.