datacite / lupo

DataCite REST API
https://api.datacite.org
MIT License

Citation lists differ for the same DOI when querying the resource directly versus listing it through another resource type. #637

Open kjgarza opened 4 years ago

kjgarza commented 4 years ago

Expected Behaviour

Current Behaviour

The citation list differs for the same DOI (e.g. 10.48321/d1h59r) depending on whether the resource is queried directly or listed through another resource type.

For example, querying 10.48321/d1h59r directly shows 3 citations:


{
  dataManagementPlan(id: "10.48321/d1h59r") {
    partOf{
      nodes{
        id
      }
    }
    citations(resourceTypeId: "Dataset") {
      nodes{
        id
        type
      }
    }
  }
}

But querying the organization that lists the same DOI (10.48321/d1h59r) returns zero citations:

{
  organization(id:"03yrm5c26") {
    name
    dataManagementPlans {
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        citations {
          totalCount
          nodes {
            id: doi
            name: titles(first: 1) {
              title
            }
          }
        }
      }
    }
  }
}

Steps to Reproduce

  1. step1
  2. step2
  3. step3
  4. step4

Context (Environment)

Hypothesis

Detailed Description

Possible Implementation

kjgarza commented 4 years ago

The counts are also wrong in the index: compare citationCount and citations.totalCount.

citations.totalCount https://github.com/datacite/lupo/blob/52c6e7a/app/graphql/connections/hash_connection.rb#L58 https://github.com/datacite/lupo/blob/52c6e7a/app/graphql/types/doi_item.rb#L426

citationCount https://github.com/datacite/lupo/blob/52c6e7a/app/models/doi.rb#L1409

kjgarza commented 4 years ago

It seems that all DOIs need to be updated after the citation events are created, and that this update is not happening automatically.

kjgarza commented 4 years ago

The difference in the counts comes from differences between the DOIs in the database and the DOIs in the index. For example, 10.1139/CJFR-2018-0211 was in the DB table but not in the index. When one queries for citations in GraphQL, the API queries the index. So even if the citation IDs are collected correctly, the counts will be wrong if a citing DOI is not in the DOI index. This can happen when:
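A toy sketch of the effect described above, not lupo's actual code: it assumes one count is taken from the collected citation IDs and the other only from the IDs that can be resolved in the index. The function names and the `10.0000/...` DOIs are hypothetical; only 10.1139/CJFR-2018-0211 comes from this thread.

```python
def normalize(doi: str) -> str:
    """Compare DOIs case-insensitively and without surrounding whitespace."""
    return doi.strip().lower()


def collected_count(citation_ids):
    """Number of distinct citing DOIs collected from citation events."""
    return len({normalize(d) for d in citation_ids})


def resolvable_count(citation_ids, indexed_dois):
    """Number of distinct citing DOIs that are also present in the index."""
    indexed = {normalize(d) for d in indexed_dois}
    return len({normalize(d) for d in citation_ids} & indexed)


def missing_from_index(citation_ids, indexed_dois):
    """Citing DOIs that were collected but cannot be found in the index."""
    indexed = {normalize(d) for d in indexed_dois}
    return sorted({normalize(d) for d in citation_ids} - indexed)


# Hypothetical sample data: three collected citations, one absent from the index.
collected = ["10.1139/CJFR-2018-0211", "10.0000/example-a", "10.0000/example-b"]
indexed = ["10.0000/EXAMPLE-A", "10.0000/example-b"]
```

With this sample data the two counts disagree by exactly the DOIs that `missing_from_index` returns, which is the kind of list one would want in order to decide what needs to be (re)indexed.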

kjgarza commented 4 years ago

I think we should address each of the count-related problems separately. I was thinking of creating GitHub issues, but in a sense these issues already exist. For (B), there are multiple issues on indexing, which we are currently fixing manually; see the tickets from front. With regard to (A), Crossref DOIs, there are multiple issues that seek a more complete list of DOIs in the index: https://github.com/datacite/datacite/issues/1082, https://github.com/datacite/datacite/issues/1090 and https://github.com/datacite/datacite/issues/1091.

I would suggest keeping this issue open to track progress on the other tasks. What do you think, @mfenner?

mfenner commented 4 years ago

I suggest creating a new issue to track these various issues and making it an epic. This issue is about a specific problem: differences in the citations shown depending on how the dataManagementPlan is retrieved.

Epic created at https://github.com/datacite/lupo/issues/649.

mfenner commented 4 years ago

This issue can no longer be reproduced.

kjgarza commented 3 years ago

This is still open; the counts still do not match:

{
  dataManagementPlan(id:"10.48321/d1h59r") {
    citationCount
    citations(resourceTypeId: "Dataset") {
      nodes{
        id
        type
      }
    }
  }
}

{
  works(query: "10.48321/d1h59r") {
    nodes{
      citationCount
      citations{
        totalCount
        nodes{
          id
        }
      }
    }
  }
}
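To re-check this without comparing screenshots, something like the following Python sketch could be used. It sends the same kind of `works` query and reports every node where citationCount and citations.totalCount disagree. The endpoint path `/graphql`, the added `id` field on the work nodes, and the helper names are assumptions made for illustration and are not taken from this thread.

```python
import json
import urllib.request

# Assumed endpoint: the API base URL from this repository plus "/graphql".
GRAPHQL_URL = "https://api.datacite.org/graphql"

QUERY = """
query ($q: String!) {
  works(query: $q) {
    nodes {
      id
      citationCount
      citations {
        totalCount
      }
    }
  }
}
"""


def fetch_works(doi, url=GRAPHQL_URL):
    """POST the query for one DOI and return the decoded JSON response."""
    payload = json.dumps({"query": QUERY, "variables": {"q": doi}}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_mismatches(result):
    """Return (id, citationCount, citations.totalCount) for nodes that disagree."""
    works = (result.get("data") or {}).get("works") or {}
    mismatches = []
    for node in works.get("nodes") or []:
        stored = node.get("citationCount")
        listed = (node.get("citations") or {}).get("totalCount")
        if stored != listed:
            mismatches.append((node.get("id"), stored, listed))
    return mismatches
```

Calling `find_mismatches(fetch_works("10.48321/d1h59r"))` should return an empty list once both counts agree. Keeping the comparison separate from the HTTP call means it can also be run against a saved response.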

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

digitaldogsbody commented 1 year ago

Not stale