NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Duplicates between organizations and more than one organization listed as an owner #409

Closed by mbjones 5 years ago

mbjones commented 6 years ago

Author Name: Callie Bowdish (Callie Bowdish)
Original Redmine Issue: 2734, https://projects.ecoinformatics.org/ecoinfo/issues/2734
Original Date: 2007-01-18
Original Assignee: Chris Barteau


A number of organizations participate in projects at NCEAS, and their data are currently (or potentially) registered in the KNB data repository, either directly or through replication from their own metacat servers. ESA, UCNRS, OBFS, PISCO, and LTER are examples. If NCEAS projects register data that are connected to these other organizations, there is room for duplication. There is also a concern that some of the organizations will miss getting credit for the data, be unable to display the data on their own websites and/or skins, or not have the data package show up in their reports. ESA includes the LSID in the citation, which adds to the problems when a data package that is "owned" by more than one organization is registered.

ESA is starting to register data sets with their own metacat server and replicate them to the KNB metacat. Here is an example of a duplicate that has been created.

Smith F. . Macroecological database of mammalian body mass. nceas.196.3 (registered earlier)

Smith F. 2006. Macroecological database of mammalian body mass. ESA Data Registry: urn:lsid:esa.org:esa:19:3 (http://data.esa.org).

The same citation as listed in the KNB (the view has no LSID information): Smith F. 2006. Macroecological database of mammalian body mass. esa.19.3

The NCEAS and ESA registrations point to the same Online Distribution Info location and, in this case, have the same title, so both come up when searching on the title. I know of three groups (two are with the SB LTER) that already have data packages in the KNB and are going to try to submit data papers to the ESA archives. This poses the problem of creating more duplicates.
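As a rough illustration of the duplicate described above, the following Python sketch groups records that share a title and an online distribution location. It is only a sketch under stated assumptions: the record layout, the `find_duplicates` helper, and the `example.org` URLs are hypothetical and are not part of metacat; only the docids and the title come from this report.

```python
from collections import defaultdict


def normalize_title(title):
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def find_duplicates(records):
    """Return lists of docids that share a normalized title and distribution URL."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_title(rec["title"]), rec["url"])
        groups[key].append(rec["docid"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records: the URLs are placeholders, not the real distribution location.
records = [
    {"docid": "nceas.196.3",
     "title": "Macroecological database of mammalian body mass",
     "url": "http://example.org/data/mammal-body-mass"},
    {"docid": "esa.19.3",
     "title": "Macroecological database of mammalian  body mass",
     "url": "http://example.org/data/mammal-body-mass"},
    {"docid": "example.1.1",
     "title": "An unrelated data set",
     "url": "http://example.org/data/other"},
]

duplicates = find_duplicates(records)
print(duplicates)
```

A check like this only flags candidates; deciding which registration should be kept would still need a person, or the annotation support discussed later in this thread.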

Currently the organization field contains a specific name such as Ecological Society of America, Organization of Biological Field Stations, University of California Natural Reserve System, or National Center for Ecological Analysis and Synthesis. We have views or web skins that display those specific organizations' data sets. This field is filled in automatically when people use the skins to register data sets for those organizations.

The question is whether we can avoid duplication and still let the different skins and organizations generate views and reports specific to them. Will there be any problems with having more than one organization that uses metacat associated with a data package? For instance, if UCNRS has an NCEAS postdoc doing research at their reserve, can data sets created by this researcher be owned by both organizations?

Here is an example of what the contact section of the EML might look like for a document associated with more than one organization. If we encourage data packages owned by more than one organization to list each of them in the EML file, will that help to prevent duplicates? Will it encourage data sharing?

One consideration, and complication, is that only the ESA site creates and displays an LSID (Life Science Identifier), and it has only one-way replication. ESA registrations have been peer reviewed, which can add value to them and make them more easily cited. How does this factor in for data sets that are registered on the KNB but are scheduled to be added to the ESA Archives data papers?

Here is an example of an EML section that could list more than one organization, each of which could be searched on.

....

NCEAS 5600: Vazquez: Null models for specialization and asymmetry in plant-pollinator systems
National Center for Ecological Analysis and Synthesis
Kevin D. Lafferty
USGS Channel Islands Field Station Marine Science Institute
Research Ecologist
University of California Santa Barbara CA 93106 USA
(805) 893-8778 klafferty@usgs.gov
University of California Natural Reserve System

...
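The idea of listing several organizations and searching on any of them can be sketched in Python as follows. The markup of the example above did not survive, so the element layout here (one `contact` per organization, each with an `organizationName`) is an assumption about how such a fragment could look, not a copy of the original. The `organizations` and `owned_by` helpers are hypothetical names used for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed fragment: two contact entries, one per owning organization.
EML_FRAGMENT = """
<dataset>
  <contact>
    <organizationName>National Center for Ecological Analysis and Synthesis</organizationName>
  </contact>
  <contact>
    <organizationName>University of California Natural Reserve System</organizationName>
  </contact>
</dataset>
"""


def organizations(eml_text):
    """Return every organizationName value found in the fragment, in document order."""
    root = ET.fromstring(eml_text)
    return [el.text.strip() for el in root.iter("organizationName") if el.text]


def owned_by(eml_text, organization):
    """True if the named organization is listed anywhere in the fragment."""
    return organization in organizations(eml_text)


print(organizations(EML_FRAGMENT))
print(owned_by(EML_FRAGMENT, "University of California Natural Reserve System"))
```

With a layout like this, a skin for either organization could select the same single data package instead of relying on a second registration.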

mbjones commented 6 years ago

Original Redmine Comment
Author Name: Matt Jones (Matt Jones)
Original Date: 2007-02-27T18:01:54Z


Very similar to #2228.

We probably need external annotation support to make this work.

mbjones commented 6 years ago

Original Redmine Comment
Author Name: Redmine Admin (Redmine Admin)
Original Date: 2013-03-27T21:21:09Z


Original Bugzilla ID was 2734