eXtensibleCatalog / test

Testing
MIT License

Reconcile logs for deleted records #98

Closed: patrickzurek closed this issue 7 years ago

patrickzurek commented 7 years ago

JIRA issue created by: rcook Originally opened: 2012-05-02 12:36 PM

Issue body:

Trying to reconcile 6 deleted records between MST and Drupal. MST reports these 6 holdings records as updated deleted records, not new deleted records. Peter says these records were never harvested by, or seen by, Drupal before. See screenshots.

http://192.17.55.226/xc-1.3-test/?q=admin/xc/harvester/schedule/5/batch/6

http://128.151.244.139:8080/MetadataServicesToolkit/st/serviceLog.action

[~jbrand] and [~mwesley] and [~cdelis]

This issue has attachments associated with it (external link): 6deleted.png 6deleted2.png 6deleted3.png

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2012-05-02 12:50 PM

Comment body:

This may not be an issue. These were 6 deleted holdings that had previously been held, which I think is why they appear the way they do.

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2012-05-02 12:51 PM

Comment body:

To be more precise: in Drupal we actually delete records rather than just flagging them as deleted. So I don't know whether these records were deleted previously or were never harvested at all. Both are possible.

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2012-05-02 12:54 PM

Comment body:

Can you clarify what "held" means? Is the record flagged as deleted in MST? Since for deleted records the OAI-PMH protocol sends only the record's identifier, we cannot know anything more about a record that we don't already have inside Drupal. I would prefer not to harvest "ghost" records, or records that are known to have been deleted previously.
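To illustrate the point about deleted records: in an OAI-PMH ListRecords response, a deleted record arrives as a header with status="deleted" and no metadata, so the harvester gets only the identifier and datestamp. The sketch below parses such a response with Python's standard library and applies it to a local store. The sample XML, the identifiers, and the dict standing in for Drupal's record table are hypothetical; a deleted identifier that is not in the local store is reported as a "ghost" instead of being harvested.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListRecords fragment: one active record, one deleted record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:1</identifier>
        <datestamp>2012-05-01</datestamp>
      </header>
      <metadata><marc:record xmlns:marc="http://www.loc.gov/MARC21/slim"/></metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example:2</identifier>
        <datestamp>2012-05-01</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Yield (identifier, is_deleted) for each record header in the response."""
    root = ET.fromstring(xml_text)
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        yield identifier, header.get("status") == "deleted"


def apply_harvest(xml_text, local_store):
    """Apply one response to a local id -> record dict; return ghost deletes.

    Active records are upserted (the stored value is only a placeholder).
    Deleted records are removed if known locally; unknown ones cannot be
    matched to anything, because the response carries no metadata for them.
    """
    ghosts = []
    for identifier, is_deleted in parse_headers(xml_text):
        if is_deleted:
            if identifier in local_store:
                del local_store[identifier]
            else:
                ghosts.append(identifier)
        else:
            local_store[identifier] = "active"
    return ghosts


store = {}
print(apply_harvest(SAMPLE, store))  # ['oai:example:2']
print(store)  # {'oai:example:1': 'active'}
```

Skipping ghosts rather than creating placeholder rows matches the preference stated above; a harvester could equally just count them.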

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2012-05-02 12:55 PM

Comment body:

Actually, thinking about this more: if these are held holdings, then they have never been served up for harvesting. If we later delete such a record, so that it goes from held holding to deleted holding, should it be served up?

patrickzurek commented 7 years ago

JIRA Comment by user: jbrand JIRA Timestamp: 2012-05-02 01:18 PM

Comment body:

I think you have it right, Randy. Does that MST log fragment show everything we know? Or is there an additional counter in the MST that breaks deleted records down by the type of record they were before (ideally, upd_del_prev_held_cnt)?

patrickzurek commented 7 years ago

JIRA Comment by user: jbrand JIRA Timestamp: 2012-05-02 01:21 PM

Comment body:

BTW, I'm still getting used to JIRA: my reply was directed at Randy's first comment, not his last one. Currently, we serve up deleted records (header only) and active records, but not held records. What we 'should' do may be changing, i.e. not serving up 'any' deleted records if 'from' is empty.
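The serving rules described here can be summarized as a small decision function. This is a hypothetical sketch, not MST code: the names State, Serve, serve_as, and the skip_deleted_on_full_harvest flag are invented for illustration, and the flag models only the possible change of not serving deleted records when 'from' is empty.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    HELD = "held"
    DELETED = "deleted"


class Serve(Enum):
    FULL = "full record"
    HEADER_ONLY = "header only"
    NONE = "not served"


def serve_as(state: State, from_param: Optional[str],
             skip_deleted_on_full_harvest: bool = False) -> Serve:
    """Decide how a record in the given state is exposed to a harvester.

    Current behaviour: active records are served in full, deleted records
    as a header only, held records not at all. The optional flag models the
    possible change: no deleted records at all when 'from' is empty.
    """
    if state is State.ACTIVE:
        return Serve.FULL
    if state is State.HELD:
        return Serve.NONE
    # Remaining case: State.DELETED
    if skip_deleted_on_full_harvest and not from_param:
        return Serve.NONE
    return Serve.HEADER_ONLY


print(serve_as(State.DELETED, None))                                     # Serve.HEADER_ONLY
print(serve_as(State.DELETED, None, skip_deleted_on_full_harvest=True))  # Serve.NONE
```

Note that neither rule looks at the record's earlier state, so under both rules a record that went from held to deleted is still served as a deleted header on an incremental harvest.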

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2012-05-02 02:03 PM

Comment body:

John, yes, that is why I made my comment and have not added the screenshot.

Also, to clarify: this is a subsequent harvest, so the from parameter is not empty (the request has both a from and an until), but MST has nonetheless served up a deleted record that had never been harvested.
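In OAI-PMH selective harvesting, from and until filter records by datestamp, and a record's datestamp reflects its creation, modification, or deletion. Assuming MST gives a record a new datestamp when it moves from held to deleted, the sketch below shows why an incremental harvest can still return a deleted header for a record that was never served in full. The base URL, the marcxml prefix, the Rec type, and the record data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlencode


@dataclass
class Rec:
    identifier: str
    state: str       # "active", "held" or "deleted"
    datestamp: date  # date of the record's last state change (assumption)


def list_records_url(base_url: str, metadata_prefix: str,
                     from_: str, until: str) -> str:
    """Build an OAI-PMH selective-harvest request (verb=ListRecords)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix,
              "from": from_, "until": until}
    return base_url + "?" + urlencode(params)


def select_for_harvest(records, from_date: date, until_date: date):
    """Return (identifier, state) for non-held records changed in [from, until].

    Nothing here asks whether any harvester has seen a record before, so a
    record that went held -> deleted inside the window comes back as a
    deleted header even though it was never served in full.
    """
    return [(r.identifier, r.state) for r in records
            if from_date <= r.datestamp <= until_date and r.state != "held"]


# Hypothetical data: oai:example:11 was held until it was deleted on 2012-05-02.
records = [
    Rec("oai:example:10", "active", date(2012, 4, 30)),  # changed before the window
    Rec("oai:example:11", "deleted", date(2012, 5, 2)),  # held -> deleted
    Rec("oai:example:12", "held", date(2012, 5, 1)),     # still held, never served
]
print(list_records_url("http://mst.example.org/oai", "marcxml",
                       "2012-05-01", "2012-05-02"))
print(select_for_harvest(records, date(2012, 5, 1), date(2012, 5, 2)))
# [('oai:example:11', 'deleted')]
```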

[~David Lindahl] adding Dave as a notify.

patrickzurek commented 7 years ago

JIRA Comment by user: jbrand JIRA Timestamp: 2012-05-02 02:13 PM

Comment body:

Re: 'so the from param is not empty, it would have a from and an until, MST nonetheless, has served up a deleted record that had never been harvested.'

Some comments, not solutions:
1) I can see how this record may never have been harvestable, in which case we ideally should not serve it. With our current design, though, we can't know that: we only save the record's last state (prev_state) and its current state.
2) We don't know who has harvested from us, and given our design I don't think we could track it even if we wanted to.
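The limitation in point 1 can be made concrete: when only prev_state and the current state are stored, two different histories can look identical. The histories and helper names below are hypothetical, and "was ever active" is used only as a stand-in for "was ever harvestable"; per point 2, it still says nothing about whether anyone actually harvested the record.

```python
from typing import List, Tuple


def stored_view(history: List[str]) -> Tuple[str, str]:
    """What the repository keeps: only (prev_state, current state)."""
    return history[-2], history[-1]


def was_ever_harvestable(history: List[str]) -> bool:
    """True if the record was ever active, i.e. ever served in full."""
    return "active" in history


# Two hypothetical histories for a deleted holdings record.
never_served = ["held", "deleted"]
served_earlier = ["active", "held", "deleted"]

for h in (never_served, served_earlier):
    print(stored_view(h), was_ever_harvestable(h))
# ('held', 'deleted') False
# ('held', 'deleted') True
```

Telling the two cases apart would require keeping at least a flag such as "was ever active", which the current design does not store.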

patrickzurek commented 7 years ago

JIRA Comment by user: mwesley JIRA Timestamp: 2012-07-30 04:55 AM

Comment body:

I think this issue is resolved. Drupal handles statistics slightly differently from the MST, because it does not keep deleted or held records. To compare the two, you need to combine the MST counters as follows.

Drupal New = new_act_cnt + upd_act_prev_held_cnt + upd_act_prev_del_cnt
Drupal Updated = upd_act_prev_act_cnt
Drupal Deleted = upd_del_prev_act_cnt
Drupal Deleted Unknown = upd_del_prev_held_cnt + upd_del_prev_del_cnt

Drupal doesn't deal with "currently" held records at all, so the following MST log counts cannot be used for comparison with Drupal: new_held_cnt, upd_held_cnt, upd_held_prev_held_cnt, upd_held_prev_del_cnt, upd_held_prev_act_cnt. (For upd_held_prev_act_cnt, I have no idea how Drupal would handle this change, since Drupal has no way of knowing that the record is now being held.)

The following counts also cannot be used to compare the MST with Drupal, because they are sums of other counts, some of which Drupal does not use, and are therefore confusing: new_del_cnt, upd_del_cnt, upd_act_cnt.

This should help in calculating the difference between MST logs and Drupal logs and checking for problems during harvesting.
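The mapping above can be turned into a small reconciliation check. This is a sketch under assumptions: the category keys ("new", "updated", "deleted", "deleted_unknown") and the idea that Drupal's log can be read into a dict with those keys are hypothetical, and the sample numbers are invented. Held counters and the summary counters listed above are simply ignored.

```python
from typing import Dict

# Drupal category -> MST counters that feed it, following the formulas above.
DRUPAL_FROM_MST = {
    "new": ["new_act_cnt", "upd_act_prev_held_cnt", "upd_act_prev_del_cnt"],
    "updated": ["upd_act_prev_act_cnt"],
    "deleted": ["upd_del_prev_act_cnt"],
    "deleted_unknown": ["upd_del_prev_held_cnt", "upd_del_prev_del_cnt"],
}


def expected_drupal_counts(mst: Dict[str, int]) -> Dict[str, int]:
    """Sum the relevant MST log counters into Drupal's four categories.

    Counters missing from the log are treated as 0; any counter not named
    in DRUPAL_FROM_MST (held and summary counters) is ignored.
    """
    return {category: sum(mst.get(name, 0) for name in counters)
            for category, counters in DRUPAL_FROM_MST.items()}


def discrepancies(mst: Dict[str, int], drupal: Dict[str, int]) -> Dict[str, int]:
    """Return expected-minus-reported for each category that does not match."""
    expected = expected_drupal_counts(mst)
    return {c: expected[c] - drupal.get(c, 0)
            for c in expected if expected[c] != drupal.get(c, 0)}


# Invented numbers for illustration only.
mst_log = {"new_act_cnt": 100, "upd_act_prev_held_cnt": 3,
           "upd_act_prev_act_cnt": 40, "upd_del_prev_act_cnt": 5,
           "upd_del_prev_held_cnt": 6, "new_held_cnt": 12}
drupal_log = {"new": 103, "updated": 40, "deleted": 2, "deleted_unknown": 6}
print(discrepancies(mst_log, drupal_log))  # {'deleted': 3}
```

A positive value means the MST served more records in that category than Drupal reports having processed.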


As for the reconciliation, I'm consolidating everything into a single issue, "Harvested records match MST" (#DRUPAL-98). After months of testing, I've concluded that the problem is that harvested deletes of manifestation records are not being applied in Drupal after the harvest, even though those records are clearly marked for deletion. A fix is in the works.

patrickzurek commented 7 years ago

Issue resolved: 2012-07-30 04:55 AM