biocaddie / prototype_issues

Used to report and track bioCADDIE prototype issues

Recognizing Repositories in metadata from Aggregators #48

Open altergc opened 8 years ago

altergc commented 8 years ago

When DataMed displays data from an aggregator, the display should recognize the repository that the data were harvested from. Here is an example:

If you search DataMed for "phdcn mental health services", the lead result is "Project on Human Development in Chicago Neighborhoods (PHDCN): Mental Health Services, Wave 3, 2000-2002". See http://datamed.biocaddie.org/search.php?query=phdcn+mental+health+services&searchtype=data

This dataset is held at ICPSR, and its metadata were harvested by the Harvard Dataverse.
DataMed incorrectly lists Dataverse Network as the "Repository" and Harvard University as the "Organization". These should be ICPSR and the University of Michigan. See
http://datamed.biocaddie.org/display-item.php?repository=0012&idName=dataset.title&id=56d4b87be4b0e644d3134700&query=phdcn%20mental%20health%20services

However, the Dataverse Network itself recognizes that the data are at ICPSR and not at Harvard. If you run the same search on the DataPASS catalog (http://www.data-pass.org/), which is built on the Dataverse platform, the result explicitly identifies ICPSR as the owner. See https://dataverse.harvard.edu/?q=phdcn%20mental%20health%20services

Recognition is an important issue for repositories. If DataMed is going to use aggregators, it needs to be able to recognize that the aggregator is not the repository.
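To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical record model (`Source`, `DatasetRecord`, and `display_fields` are illustrative names, not DataMed's actual schema or code) that stores the repository holding the data separately from the aggregator the metadata came through. The display credits the repository and mentions the aggregator only as the harvesting channel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """A repository or an aggregator (hypothetical model, not DataMed's schema)."""
    name: str
    organization: str


@dataclass(frozen=True)
class DatasetRecord:
    title: str
    repository: Source                       # where the data actually reside
    harvested_via: Optional[Source] = None   # aggregator that supplied the metadata, if any


def display_fields(record: DatasetRecord) -> dict:
    """Build display labels that credit the holding repository, not the aggregator."""
    fields = {
        "Repository": record.repository.name,
        "Organization": record.repository.organization,
    }
    # The aggregator is shown only as the harvesting channel.
    if record.harvested_via is not None:
        fields["Harvested via"] = record.harvested_via.name
    return fields


icpsr = Source("ICPSR", "University of Michigan")
dataverse = Source("Harvard Dataverse", "Harvard University")
record = DatasetRecord(
    title="Project on Human Development in Chicago Neighborhoods (PHDCN): "
          "Mental Health Services, Wave 3, 2000-2002",
    repository=icpsr,
    harvested_via=dataverse,
)
print(display_fields(record))
```

For the PHDCN example this reports ICPSR and the University of Michigan as Repository and Organization, with Harvard Dataverse listed only under "Harvested via".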

tjohnson250 commented 8 years ago

This is related to the harvested dataset issue. Can we merge these and come up with a general solution?

altergc commented 8 years ago

The two issues are very similar. The difference is between harvesting of data and harvesting of metadata. However, I don't agree with your (Todd's) suggestion for handling the situation. You seem to suggest giving equal status to both the original and the harvested version. I recommend that harvested versions should always point to the original. This is actually the recommendation in the original issue report, which points out that the metadata with the harvested copy in Dryad are not as rich as the metadata with the original in KNB.
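The "always point to the original" rule can be sketched in Python under one assumption: each harvested copy carries a hypothetical `original_id` field naming the record it was copied from. The identifiers `knb-1` and `dryad-1` below are made up for illustration. Search hits are resolved to their originals and duplicates are collapsed, so the copy never gets equal status with the original.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    record_id: str
    repository: str
    original_id: Optional[str] = None  # hypothetical: set only on harvested copies


def resolve_original(record_id: str, index: Dict[str, Record]) -> Record:
    """Follow original_id pointers until reaching a record that is not a copy."""
    seen = set()
    current = index[record_id]
    while current.original_id is not None:
        if current.record_id in seen:
            raise ValueError(f"cycle in original_id chain at {current.record_id}")
        seen.add(current.record_id)
        if current.original_id not in index:
            # Original was never harvested into the index: keep the copy as the best available.
            break
        current = index[current.original_id]
    return current


def collapse_results(hit_ids: List[str], index: Dict[str, Record]) -> List[Record]:
    """Replace each hit with its original, dropping duplicates while preserving order."""
    results: List[Record] = []
    seen_ids = set()
    for hit in hit_ids:
        original = resolve_original(hit, index)
        if original.record_id not in seen_ids:
            seen_ids.add(original.record_id)
            results.append(original)
    return results


index = {
    "knb-1": Record("knb-1", "KNB"),
    "dryad-1": Record("dryad-1", "Dryad", original_id="knb-1"),
}
print([r.repository for r in collapse_results(["dryad-1", "knb-1"], index)])
```

When a Dryad copy points at a KNB original, a search that hits both returns only the KNB record. Keeping the copy when its original is absent from the index is a design choice made for this sketch, not something specified in the thread.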