Closed emanuil-tolev closed 8 years ago
Live job showcasing this (compare "download original" to "download results" versions to see the effect in action): https://compliance.cottagelabs.com/#urLMywKJkTtdTtDto
Deployed
This can take two forms I've identified so far:
These would all be identified as
since we issue an OR query to the cache, requiring only one of the identifiers to match - a usually reasonable assumption in the world of publishing. That is, if you have correct data.
IMO we should simply treat 0s as if they were blanks for the purposes of the cache lookup. At the time of writing, the PMCID lookup condition, as an example, is:
idents.pmcid !== undefined && idents.pmcid !== null && idents.pmcid.length > 0
I think the following can be added safely to prevent this particular problem:
&& idents.pmcid !== '0'
This is a convenience feature related to particular user workflows and how those users understand publishing (no PMID or PMCID == "0"). Ultimately there is no ambiguity here, so the fix is straightforward.
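A minimal sketch of how that check could be factored out so it applies to every identifier rather than just the PMCID. The helper names isUsableIdentifier and usableIdentifiers are hypothetical, and idents.pmid is an assumed field name; only the idents.pmcid condition comes from the current code. It also assumes identifiers arrive as strings, as the existing .length check implies.

```javascript
// Hypothetical helper: true only for a real identifier, i.e. not undefined,
// not null, not empty, and not the placeholder '0' some users enter.
function isUsableIdentifier(value) {
  return value !== undefined &&
    value !== null &&
    value.length > 0 &&
    value !== '0';
}

// Hypothetical helper: keep only the identifiers worth sending to the
// cache lookup. 'pmid' is an assumed field name alongside 'pmcid'.
function usableIdentifiers(idents) {
  const usable = {};
  for (const key of ['pmcid', 'pmid']) {
    if (isUsableIdentifier(idents[key])) {
      usable[key] = idents[key];
    }
  }
  return usable;
}
```

With this, a row such as { pmcid: '0', pmid: '12345' } would only contribute its PMID to the OR query, so a '0' can no longer match other rows that also carry '0'.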
In this case, the Oopsie row will be identified as the Test Article row, since the PMID is the same. This is ambiguous.
Currently what seems to happen is that the article title is overwritten on output (so Oopsie becomes Test Article). IMO this is good behaviour: if all the compliance information related to Test Article but the title still said Oopsie, the rows would look (to a human) like two distinct records, even though the information would all be about Test Article. The overwriting makes it clear that it's all about Test Article.
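To make the behaviour concrete, here is a rough sketch of OR matching followed by the title overwrite. The function names, the record shape and the in-memory array standing in for the cache are all assumptions for illustration, not Lantern's actual implementation.

```javascript
// Hypothetical: return the first cached record that shares at least one
// identifier with the row (OR semantics across identifier fields).
function findCached(cache, row) {
  return cache.find(function (rec) {
    return ['pmcid', 'pmid'].some(function (key) {
      return row[key] !== undefined && row[key] === rec[key];
    });
  });
}

// Hypothetical: copy the row, replacing its title with the cached title
// when a match is found; rows without a match are returned untouched.
function enrichRow(cache, row) {
  const match = findCached(cache, row);
  if (match === undefined) {
    return row;
  }
  return Object.assign({}, row, { title: match.title });
}
```

For example, with a cache of [{ pmid: '123', title: 'Test Article' }] (the PMID value is made up), a row { pmid: '123', title: 'Oopsie' } would come back titled Test Article, which is the overwrite described above.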
I don't currently think we should take any action here, but FYI for both of you, since this is probably one of the most important areas where we could encounter erroneous data. I am also discussing cases like this with Wellcome, so they might ultimately have a different point of view on whether Lantern's behaviour needs changing here.