OpenTreeOfLife / clade-workshops

OpenTree / FuturePhy workshops

Duplicate studies #5

Status: open. GZhang2 opened this issue 8 years ago.

GZhang2 commented 8 years ago

I created a study (Study ID: ot_601) based on Marvaldi et al. (2002), which I later found out had already been curated as Study ID: ot_263. I'm not sure how this happened; I didn't get any warning message.

kcranston commented 8 years ago

That should not be happening. We recently changed the tests for duplicates to disallow the creation of duplicate studies. Thanks for the heads up that we should revisit that code! I am just putting links to the two studies here to make debugging simpler:

jimallman commented 8 years ago

The DOIs for these studies don't match exactly (one is in URL form, the other is a "bare" DOI).

The version submitted by @GZhang2 is in the expected (URL) format, but the version submitted by @brunoasm is not. It looks like the latter was imported from TreeBASE several months ago, before our TreeBASE import code was corrected to use the URL format.
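A minimal sketch of why an exact string match misses this kind of duplicate, and how comparing normalized DOIs would catch it. This is not the curation app's actual duplicate test; `bare_doi` and `same_doi` are hypothetical names, the list of prefixes is an assumption, and `10.1234/...` is a placeholder DOI. DOI names are case-insensitive, so the comparison key is lower-cased.

```python
import re

# Prefixes that commonly wrap a DOI (assumed list, not exhaustive).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def bare_doi(value: str) -> str:
    """Strip any URL or 'doi:' prefix and lower-case the result for comparison."""
    return _DOI_PREFIX.sub("", value.strip()).lower()


def same_doi(a: str, b: str) -> bool:
    """True if two DOI strings name the same DOI, whatever format each uses."""
    return bare_doi(a) == bare_doi(b)


url_form = "http://dx.doi.org/10.1234/ABC"  # placeholder, URL form
bare_form = "10.1234/abc"                   # placeholder, bare form
print(url_form == bare_form)           # exact match fails
print(same_doi(url_form, bare_form))   # normalized match succeeds
```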

See related discussion here. It looks like we still need to sweep the studies in phylesystem to correct these DOIs. I'm not sure who has the best tools for that kind of batch operation, but I'm tentatively assigning this to @mtholder. :bowtie:

kcranston commented 8 years ago

Thanks for looking into this, @jimallman! I was confused because I was pretty sure we were normalizing DOIs on import, but I forgot that we hadn't yet gone back through the legacy studies.

jimallman commented 8 years ago

Hm, I can't assign issues here. Let's see if Mark answers the bat-signal.

N.B. we might find other hidden duplicates among the older TreeBASE imports! If we can't make these changes in phylesystem soon, we might toughen up the duplicate-DOI test to recognize both formats. Meanwhile, you can also search the study list using the "bare" (minimal) DOI, as shown here:

[Screenshot, 2016-02-19 2:35:51 PM: searching the study list using a bare DOI]

Apologies for the inconvenience!
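For hunting the hidden duplicates mentioned above among older TreeBASE imports, one rough approach is to group all studies by a normalized DOI and flag any DOI shared by more than one study. This is a sketch under assumptions: the input is a plain mapping of study ID to stored DOI string (not the real phylesystem/NexSON layout), `doi_key` and `hidden_duplicates` are hypothetical helpers, and the study IDs and DOIs in the example are placeholders.

```python
import re
from collections import defaultdict

# Assumed set of prefixes seen on stored DOIs.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def doi_key(doi):
    """Normalize URL-form or bare DOIs to one lower-case comparison key."""
    return _DOI_PREFIX.sub("", doi.strip()).lower()


def hidden_duplicates(studies):
    """Group study IDs whose DOIs match once formats are normalized.

    studies: mapping of study ID -> stored DOI string (URL or bare form).
    Returns {normalized DOI: sorted study IDs} for DOIs shared by 2+ studies.
    """
    groups = defaultdict(list)
    for study_id, doi in studies.items():
        if doi:  # skip studies with no DOI recorded
            groups[doi_key(doi)].append(study_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Hypothetical data: study_a and study_b store the same DOI in different formats.
studies = {
    "study_a": "10.1234/example",
    "study_b": "http://dx.doi.org/10.1234/EXAMPLE",
    "study_c": "10.5678/other",
    "study_d": None,
}
print(hidden_duplicates(studies))
```

The same key could also back the creation-time check: look up `doi_key(new_doi)` in the grouped index instead of comparing raw strings.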

jimallman commented 7 years ago

> It looks like the latter was imported from TreeBASE several months ago, before our TreeBASE import code was corrected to use the URL format.

UPDATE: I was mistaken. The revision above patched oti queries for DOIs, but it did not fix the bare DOIs stored in already-imported studies. Hopefully @mtholder can use the doi2url function in fixing https://github.com/OpenTreeOfLife/peyotl/issues/138
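A rough sketch of what a batch sweep to URL form might look like. `to_doi_url` is a hypothetical stand-in, not peyotl's actual `doi2url`, whose signature and output prefix may differ; the canonical prefix `https://doi.org/` is an assumption, and the input is a simplified mapping of study ID to DOI string rather than real phylesystem study files. Unlike the comparison key used for duplicate checks, the stored DOI keeps its original case.

```python
import re

# Assumed canonical prefix; the prefix OpenTree actually stores may differ.
CANONICAL_PREFIX = "https://doi.org/"
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def to_doi_url(doi):
    """Hypothetical doi2url-style helper: bare or URL-form DOI -> canonical URL."""
    bare = _DOI_PREFIX.sub("", doi.strip())  # case is preserved for storage
    return CANONICAL_PREFIX + bare


def sweep(studies):
    """Rewrite every stored DOI to URL form.

    studies: mapping of study ID -> DOI string (None/empty means no DOI).
    Returns (updated mapping, sorted list of study IDs whose DOI changed).
    """
    updated, changed = {}, []
    for study_id, doi in studies.items():
        if doi:
            new = to_doi_url(doi)
            if new != doi:
                changed.append(study_id)
            updated[study_id] = new
        else:
            updated[study_id] = doi  # leave missing DOIs untouched
    return updated, sorted(changed)


# Hypothetical data with placeholder DOIs.
fixed, changed = sweep({"a": "10.1234/x", "b": "https://doi.org/10.1234/y", "c": None})
print(changed)
```

Note that this sketch would also rewrite older `http://dx.doi.org/` URLs to the assumed canonical prefix, which may or may not be what the real sweep should do.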