Open hew503 opened 8 years ago
Hello, After thorough checking, I found the two XML files that were harvested to create the two different packages containing the two resources you mention. (You can see them here: http://more.locloud.eu:8080/objects/3025/11460856/Archsearch/content and http://more.locloud.eu:8080/objects/2858/10354002/Archsearch/content) The specific resource exists in both files, which is why you see it twice. I can send you these two files if you want.
Hi Eleni
Probably the best thing to do is to check with Dimitra-Nefeli on this. My understanding is that we sent the ArchSearch metadata to ATHENA as a single XML file, but at 1.3 million records, ATHENA broke the file up into multiple uploads so that MORe could handle it. I'm not sure how that was done, or why it resulted in duplication, but presumably that's where the problem is. I'm not sure whether it's easier for ATHENA to go back to the original XML we sent and try to break it up without causing duplication, or to try to remove the duplicates, but as ADS wasn't involved in the upload, we don't really know how things were done in MORe.
Many thanks
Holly
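For reference, here is a minimal sketch of splitting a large record set into uploads with no overlap. It is not how ATHENA actually did the split (nobody in this thread knows that); `split_into_batches` and `batch_size` are hypothetical names used only for illustration. The point is that if each record is consumed from a single iterator exactly once, no record can land in two batches.

```python
from itertools import islice


def split_into_batches(records, batch_size):
    """Yield consecutive, non-overlapping lists of at most batch_size records.

    Hypothetical helper: `records` can be any iterable (for example, parsed
    XML records). Every record is taken from one shared iterator, so each
    appears in exactly one batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Toy data standing in for record identifiers.
    for number, batch in enumerate(split_into_batches(range(10), 4), start=1):
        print(number, batch)
```

Overlap usually creeps in when batches are defined by independent ranges or offsets (or by harvesting overlapping sets) rather than by one pass over the data.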
Hello again, Unfortunately, Dimitra-Nefeli has left DCU (Leonidas, l.papachristopoulos@dcu.gr, is in charge of the content now), but she informed me that we received the data already split into folders. I am forwarding you her reply email. Maybe the duplicates were caused by the OAI-PMH harvesting of ALL sets when the folders were created?
There are duplicates within the ArchSearch: ADS catalogue resource. The duplication doesn't seem to be within our metadata, but it's hard to show that, since the upload was so massive it had to be divided into more than a dozen files in MORe. Not everything is duplicated, so perhaps there was overlap when it was divided? That's all I can think of.
Here is an example:
http://portal.ariadne-infrastructure.eu/page/10354002 http://portal.ariadne-infrastructure.eu/page/11460856
Is it possible to search for duplicate Original IDs within the ArchSearch: ADS catalogue resource, and fix it that way?
cheers
Holly
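A rough sketch of the duplicate search suggested above, assuming the harvested XML files are available locally. The element name `identifier` and the file names are assumptions (the actual tag holding the Original ID in the ArchSearch/MORe packages is not stated in this thread), so adjust them to the real schema. It reports every ID that occurs more than once, together with the files it was found in.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}identifier'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_ids(paths, id_tag="identifier"):
    """Return {original_id: [paths...]} for IDs that occur more than once.

    `id_tag` is an assumed element name, not taken from the real schema.
    Files are parsed incrementally so very large harvests are not built
    into a full tree in memory.
    """
    seen = defaultdict(list)
    for path in paths:
        for _event, elem in ET.iterparse(path, events=("end",)):
            if local_name(elem.tag) == id_tag:
                text = (elem.text or "").strip()
                if text:
                    seen[text].append(path)
            # Release the element's content once it has been processed.
            elem.clear()
    return {key: files for key, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Hypothetical file names for the two harvested packages.
    duplicates = find_duplicate_ids(["package_2858.xml", "package_3025.xml"])
    for original_id, files in sorted(duplicates.items()):
        print(original_id, files)
```

An ID listed with two different files points to overlap between packages; an ID listed with the same file twice would mean the duplication is inside a single harvest file.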