calvez / xcoaitoolkit

Automatically exported from code.google.com/p/xcoaitoolkit

Updated records are not passed via harvest - CRITICAL ISSUE #42

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
While testing 10 bibs/20 holdings and their IDs across the entire XC system,
we discovered that the OAI Toolkit does not update the modified date (or there
is some other issue) when records are updated/deleted. The records appear to
be updated, but the date is not tracked. Records were loaded, then harvested
by the MST on 2/18/2010. The OAI Toolkit had updated records passed through on
2/19. Reharvesting from the MST on 2/23 yields no updates.

Comment from Shrey via Skype

[11:34:26 AM] Shreyansh: neither the LuceneImporter, the mySQLImporter, nor the
MixedImporter updates the modification date field when a record is
updated/deleted

[11:40:43 AM] Shreyansh: I was wrong in saying it does not modify the
modification_date. It does. I think this bug now needs to be debugged
[11:40:47 AM] Shreyansh: to find where the problem is.

Original issue reported on code.google.com by rc...@library.rochester.edu on 23 Feb 2010 at 4:43

GoogleCodeExporter commented 9 years ago
Attached is a document describing the issue as I understand it. Please advise
or correct anything.

Original comment by rc...@library.rochester.edu on 24 Feb 2010 at 8:44


GoogleCodeExporter commented 9 years ago
Issue has been resolved. The problem was that if the "until" parameter was
null/blank in the OAI request, the OAI Toolkit used a date in the EST time
zone rather than in UTC format.

This has been changed so that the date is now in UTC format, which avoids the
problem.

The fix will be in version 0.6.4 of the OAI Toolkit.
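For illustration only, here is a minimal Java sketch of the fix described above, assuming the toolkit emits OAI-PMH datestamps at seconds granularity (`YYYY-MM-DDThh:mm:ssZ`). The class `UntilDefaultSketch`, the helper `effectiveUntil`, and both formatters are hypothetical names, not the toolkit's code. It shows why the time zone matters: formatting the same instant in US Eastern time while still appending `Z` yields a datestamp five hours early (in February, EST is UTC-5).

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not the OAI Toolkit's actual code.
class UntilDefaultSketch {
    // OAI-PMH seconds-granularity datestamp; the trailing 'Z' asserts UTC.
    static final DateTimeFormatter OAI_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // The buggy variant: same pattern, but the wall-clock time is US Eastern,
    // so the 'Z' suffix mislabels a local time as UTC.
    static final DateTimeFormatter MISLABELED_EASTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")
                    .withZone(ZoneId.of("America/New_York"));

    // If the request has no (or a blank) "until", default to "now" in UTC.
    // The caller would normally pass Instant.now().
    static String effectiveUntil(String until, Instant now) {
        if (until == null || until.trim().isEmpty()) {
            return OAI_UTC.format(now);
        }
        return until;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2010-02-25T20:26:00Z");
        System.out.println(effectiveUntil(null, now));     // 2010-02-25T20:26:00Z
        System.out.println(MISLABELED_EASTERN.format(now)); // 2010-02-25T15:26:00Z
    }
}
```

In this sketch, a harvest that relied on the mislabeled default would treat anything modified in the preceding five hours as falling after "until" and leave it out of the response.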

Original comment by sva...@library.rochester.edu on 25 Feb 2010 at 8:26

GoogleCodeExporter commented 9 years ago
Issue status changed due to a problem discussed by Randy, Sharmila, and Peter.
More description to follow. It is along the lines of "we should not be using
the 005 in any part of XC for anything".

Original comment by sva...@library.rochester.edu on 25 Feb 2010 at 8:33

GoogleCodeExporter commented 9 years ago
Just to clarify. The issue was discovered during my testing of a small set of
records. At that time, we discovered that the OAI Toolkit was basing its OAI
responses about when records were last modified on the 005 (MARC field 005,
Date and Time of Latest Transaction). This field is not the date/time of last
modification of the record in the OAI repository. The 005 is applicable only
to the ILS, and even there it is often unreliable, because it is not used
consistently within an ILS or across ILSs. The process used by the OAI
Toolkit is flawed: the 005 has nothing to do with when the record was updated
in the OAI repository. See my attached document for an example.

The UTC time zone stamp issue was discovered as a separate but related issue
when we were trying to manually change the time stamp of the 005 (so that my
testing could continue).

Original comment by rc...@library.rochester.edu on 25 Feb 2010 at 8:42

GoogleCodeExporter commented 9 years ago
Randall,
Isn't the toolkit supposed to _always_ set the modification date to the
current time/date? If so, then I think this simple patch does just that. I'm
new to the code, so I would rather someone else approve it and/or suggest an
alternative. I did run Luke on the index, and the modification_date fields
are now always being set to the current time/date.

BTW, I also noticed that after an initial load some records had create_date
fields set and others didn't. I'm thinking that the isNew() method might be
flawed, since I would expect all records to have this set on an initial load
(since they are all new), right?
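The behavior Chris expects can be sketched as follows: create_date is written only the first time a record id is seen, while modification_date is overwritten with the import time on every insert, update, or delete. This is a minimal Java sketch under stated assumptions: an in-memory map stands in for the Lucene index, dates are stored as UTC strings, and `ImportStampSketch`, `Dates`, `isNew`, and `importRecord` are hypothetical names. It is not the toolkit's actual isNew() or importer code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; an in-memory map stands in for the Lucene index.
class ImportStampSketch {
    static final DateTimeFormatter OAI_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // The two date fields stored per record (named after the issue's fields).
    static final class Dates {
        String createDate;
        String modificationDate;
    }

    final Map<String, Dates> index = new HashMap<>();

    // A record is new if its id is not in the index yet; on an initial load
    // into an empty index this is true for every record.
    boolean isNew(String id) {
        return !index.containsKey(id);
    }

    // Called for every inserted, updated, or deleted record. The stamp is the
    // import time in UTC, never a date copied from the MARC 005 field.
    void importRecord(String id, Instant importTime) {
        String stamp = OAI_UTC.format(importTime);
        if (isNew(id)) {
            Dates fresh = new Dates();
            fresh.createDate = stamp;
            index.put(id, fresh);
        }
        index.get(id).modificationDate = stamp;
    }

    public static void main(String[] args) {
        ImportStampSketch s = new ImportStampSketch();
        s.importRecord("bib1", Instant.parse("2010-02-18T15:00:00Z")); // initial load
        s.importRecord("bib1", Instant.parse("2010-02-19T16:30:00Z")); // later update
        Dates d = s.index.get("bib1");
        // prints: 2010-02-18T15:00:00Z 2010-02-19T16:30:00Z
        System.out.println(d.createDate + " " + d.modificationDate);
    }
}
```

A real importer would pass Instant.now(); fixed instants are used here only so the output is deterministic.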

I have not yet taken a look at how the OAI server manipulates the
modification_date field. It's still possible that it might be looking at the
005 (ignoring the Lucene index); I need to look further at the code.

Chris

Original comment by chic...@gmail.com on 12 Mar 2010 at 6:17


GoogleCodeExporter commented 9 years ago

Original comment by chic...@gmail.com on 22 Mar 2010 at 4:50

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r64.

Original comment by chic...@gmail.com on 23 Mar 2010 at 4:36

GoogleCodeExporter commented 9 years ago
Changing status to "not released". When we release the next version of the
software, this will be included. We found that collecting all fixed issues
for a release in this manner gave us better visibility when writing release
notes for the issues covered by a release.

I would also like to have this tested using the data that allowed us to
discover the issue. Sharmila, can you do that in some way (e.g., can this
patch be applied without a full release)?

I believe part of the issue involved the incorrect use of the 005 date, not
just that the repository "modified date" was set incorrectly. Does the patch
address that as well?

Original comment by rc...@library.rochester.edu on 23 Mar 2010 at 6:54

GoogleCodeExporter commented 9 years ago
The patch simply sets the modification_date in Lucene to the current
timestamp. When searches are performed (e.g., ListRecords), the OAI server
compares the request argument (e.g., from=2010-03-22T14:00:37Z) against this
Lucene field. Now that this field is set correctly, the correct set of records
is returned.
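A minimal sketch of that comparison, assuming modification_date is stored as a fixed-width UTC string at seconds granularity and that the request's from/until arguments use the same granularity. Under those assumptions, plain string comparison orders datestamps chronologically, which is one reason a mix of EST and UTC values breaks selective harvesting. OAI-PMH defines from and until as inclusive bounds. The map-based index and the names `HarvestFilterSketch`, `touch`, and `selectIds` are hypothetical, not the toolkit's Lucene query code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of selective harvesting against a stored modification_date.
class HarvestFilterSketch {
    static final DateTimeFormatter OAI_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // Stand-in for the index: record id -> modification_date (UTC string).
    final Map<String, String> modificationDate = new LinkedHashMap<>();

    // What the patch does on import: stamp the record with the current time, in UTC.
    void touch(String id, Instant now) {
        modificationDate.put(id, OAI_UTC.format(now));
    }

    // Return ids with from <= modification_date <= until (null means no bound).
    // Fixed-width UTC strings compare chronologically, so String.compareTo is
    // enough. Day-granularity arguments such as "2010-02-19" would need to be
    // normalized first; this sketch does not handle them.
    List<String> selectIds(String from, String until) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : modificationDate.entrySet()) {
            String d = e.getValue();
            boolean afterFrom = from == null || d.compareTo(from) >= 0;
            boolean beforeUntil = until == null || d.compareTo(until) <= 0;
            if (afterFrom && beforeUntil) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        HarvestFilterSketch idx = new HarvestFilterSketch();
        idx.touch("bib1", Instant.parse("2010-02-18T15:00:00Z")); // loaded
        idx.touch("bib2", Instant.parse("2010-02-18T15:00:00Z")); // loaded
        idx.touch("bib1", Instant.parse("2010-02-19T16:30:00Z")); // updated
        // Incremental reharvest starting just after the first harvest.
        System.out.println(idx.selectIds("2010-02-18T20:00:00Z", null)); // [bib1]
    }
}
```

Storing every date in UTC keeps the string ordering and the chronological ordering in agreement; if some values were written in EST and labeled `Z`, they would sort five hours too early and could fall outside a harvester's from bound.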

Original comment by chic...@gmail.com on 23 Mar 2010 at 7:05

GoogleCodeExporter commented 9 years ago
The updated/deleted records' modification_date values are updated in EST.
They should be in UTC
(http://www.openarchives.org/OAI/openarchivesprotocol.html#Dates).

Original comment by srangana...@library.rochester.edu on 24 Mar 2010 at 8:17

GoogleCodeExporter commented 9 years ago
modification_date is not in UTC

Original comment by chic...@gmail.com on 25 Mar 2010 at 3:29

GoogleCodeExporter commented 9 years ago
The OAI Toolkit's updated and deleted records are passed successfully to the MST.

Original comment by srangana...@library.rochester.edu on 25 Mar 2010 at 8:30

GoogleCodeExporter commented 9 years ago
BTW, my comment #13 ("modification_date is not in UTC") was _supposed_ to read
"now in UTC." Funny how one character can change the whole meaning :-)

Original comment by chic...@gmail.com on 25 Mar 2010 at 9:27

GoogleCodeExporter commented 9 years ago

Original comment by srangana...@library.rochester.edu on 14 May 2010 at 9:58

GoogleCodeExporter commented 9 years ago
Released in version 0.6.5

Original comment by srangana...@library.rochester.edu on 14 May 2010 at 10:01