eXtensibleCatalog / Drupal-Toolkit

The eXtensible Catalog Drupal Toolkit

Admin interface for record counting #311

Open patrickzurek opened 7 years ago

patrickzurek commented 7 years ago

JIRA issue created by: pkiraly Originally opened: 2011-06-21 03:49 PM

Issue body: (nt)

This issue has attachments associated with it (external links): marctoxctransformation.txt metadata_stat_0.png metadata_stat_2.png metadata_stat_3.png stats.png

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-06-21 03:49 PM

Comment body:

A helper screen for the administrator to follow and analyse the results of harvesting. The admin should be able to review the following information:

The optimal format would be a table. It should contain some check calculations based on the following principle: original + incoming = current.
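A minimal sketch of that check principle. It assumes the three columns are available as per-type counts keyed by record type, and that "incoming" is the net change produced by the harvest. The function names and type keys are hypothetical, not part of the toolkit:

```python
from typing import Dict


def check_row(original: int, incoming: int, current: int) -> bool:
    """Check one row of the table: original + incoming must equal current."""
    return original + incoming == current


def check_table(
    original: Dict[str, int],
    incoming: Dict[str, int],
    current: Dict[str, int],
) -> Dict[str, bool]:
    """Apply the check to every record type that appears in any column.

    A type missing from a column is treated as a count of 0.
    """
    types = set(original) | set(incoming) | set(current)
    return {
        t: check_row(original.get(t, 0), incoming.get(t, 0), current.get(t, 0))
        for t in sorted(types)
    }


# Hypothetical counts, for illustration only.
print(check_table({"works": 100}, {"works": 5}, {"works": 105}))  # {'works': True}
```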

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-06-21 07:50 PM

Comment body:

Attached is a sample of the Transformation Service Logs. There are lots of possible numbers to compare and I am not sure which you will be able to use or find useful.

At a minimum, you will want to always be able to reconcile all the active counts by record type, e.g.:

e-active: 2,809,491
h-active: 2,682,540
m-active: 2,631,579
w-active: 2,809,388
total-active: 10,932,998

This applies to both MySQL and Solr.
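This reconciliation could be automated roughly as follows. The sketch assumes the per-type active counts have already been read from MySQL and from Solr into plain dictionaries. How they are queried is left out, and the function names are hypothetical. The numbers are the ones from the log sample:

```python
from typing import Dict, List


def total_matches(per_type: Dict[str, int], total: int) -> bool:
    """True when the per-type active counts add up to the reported total."""
    return sum(per_type.values()) == total


def mismatched_types(mysql: Dict[str, int], solr: Dict[str, int]) -> List[str]:
    """Record types whose active count differs between MySQL and Solr."""
    types = sorted(set(mysql) | set(solr))
    return [t for t in types if mysql.get(t, 0) != solr.get(t, 0)]


# Active counts from the Transformation Service log sample.
active = {
    "e-active": 2_809_491,
    "h-active": 2_682_540,
    "m-active": 2_631_579,
    "w-active": 2_809_388,
}
print(total_matches(active, 10_932_998))  # True
```

Running `mismatched_types` on the MySQL and Solr dictionaries would then list the record types whose counts disagree between the two stores.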

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-06-21 08:11 PM

Comment body:

Randall, what do you mean by "active"?

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-06-21 08:16 PM

Comment body:

A non-deleted record, i.e. a record that should be available for discovery.
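Under that definition, counting active records per type is a filter followed by a tally. This is a minimal sketch. It assumes each record exposes a type field and a deleted flag, and both field names are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    record_type: str  # hypothetical field, e.g. "work" or "holdings"
    deleted: bool     # True once the record has been deleted


def active_counts(records: Iterable[Record]) -> Counter:
    """Count non-deleted (active) records per record type."""
    return Counter(r.record_type for r in records if not r.deleted)


records = [Record("work", False), Record("work", True), Record("holdings", False)]
print(active_counts(records))
```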

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-06-22 05:28 PM

Comment body:

I made an initial version. It shows the state of the database. There is a separate UI for Solr index information, but later I will add some Solr information here as well.

Here you can find 3 columns: the state before the last harvest, the statistics for the harvest itself, and the current (post-harvest) state. Some properties are available only in the harvest column, others only in the before and after columns. I hope the properties are self-explanatory, but let me know if that's not the case.
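The three-column layout can be sketched as a plain-text table. The sketch assumes each column is a dictionary mapping property names to counts. A property that a column does not track is simply absent from that column and is shown as "-". The function and property names are hypothetical:

```python
from typing import Dict


def render(before: Dict[str, int], harvest: Dict[str, int], current: Dict[str, int]) -> str:
    """Render before / harvest / current columns as a fixed-width text table."""
    props = sorted(set(before) | set(harvest) | set(current))

    def cell(column: Dict[str, int], prop: str) -> str:
        # Properties not tracked in a column are shown as "-".
        return str(column[prop]) if prop in column else "-"

    lines = [f"{'property':<12}{'before':>10}{'harvest':>10}{'current':>10}"]
    for p in props:
        lines.append(f"{p:<12}{cell(before, p):>10}{cell(harvest, p):>10}{cell(current, p):>10}")
    return "\n".join(lines)


print(render({"works": 10}, {"harvested": 5}, {"works": 15}))
```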

The three images I attached show three snapshots:

1. there are no records in the DT;
2. we ran a successful harvest;
3. we ran a harvest that did not produce any records (this is possible because of the incremental nature of OAI-PMH harvesting).

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-06-22 05:30 PM

Comment body:

I forgot something: if a number is green, it means that it has been verified by a check. E.g. the number of entities should be equal to works + expressions ...

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2011-06-22 05:30 PM

Comment body:

If a number is red, it means that the check failed.
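The green/red convention comes down to one comparison per number. This minimal sketch assumes that a checked number is compared against the sum of the component counts it should equal. The source elides the full list of components ("works + expressions ..."), so here the caller supplies them. Showing numbers without a check uncolored is also an assumption:

```python
from typing import Iterable, Optional


def check_color(total: int, parts: Optional[Iterable[int]] = None) -> Optional[str]:
    """Return the display color for a number.

    "green": the number equals the sum of the parts it is checked against.
    "red":   the check failed.
    None:    no check is defined for this number, so it is shown uncolored.
    """
    if parts is None:
        return None
    return "green" if total == sum(parts) else "red"


print(check_color(15, [10, 5]))  # green
print(check_color(16, [10, 5]))  # red
```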

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2012-03-08 06:55 PM

Comment body:

Attached image shows the result of an incremental harvest.

A few thoughts: you are showing "0"s in the harvested column, even though the "before harvest" and "current" columns are different. At a minimum, you should show the net change in that column (e.g. the current harvest added or removed 5 works).
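Even without per-record type information, the net change can be derived from the other two columns. This sketch assumes both columns are available as per-type counts, and the function names are hypothetical. It yields only the net figure per type. It cannot separate records that were added from records that were deleted:

```python
from typing import Dict


def net_change(before: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Per-type difference current - before; a type missing on one side counts as 0."""
    types = set(before) | set(current)
    return {t: current.get(t, 0) - before.get(t, 0) for t in sorted(types)}


def describe(changes: Dict[str, int]) -> str:
    """Format the changes with an explicit sign, e.g. "works: +5"."""
    return ", ".join(f"{t}: {d:+d}" for t, d in changes.items())


changes = net_change({"works": 100, "holdings": 50}, {"works": 105, "holdings": 48})
print(describe(changes))  # holdings: -2, works: +5
```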

I also wonder if you can borrow an approach from the MST logs. The MST shows:

- incoming record counts for the current processing batch
- total incoming counts (all time)
- outgoing record counts for the current processing batch of records
- outgoing record counts (all time)

This keeps a running history of each process. This screen shows the result of just the last harvest, right? Is there more data recorded elsewhere?
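The four MST-style counters, plus a running history, could be kept per process roughly as follows. This is a sketch, not the MST's actual implementation, and the class and field names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProcessCounts:
    """Per-batch and all-time record counts for one processing step."""
    batch_in: int = 0
    batch_out: int = 0
    total_in: int = 0
    total_out: int = 0
    history: List[Tuple[int, int]] = field(default_factory=list)

    def record_batch(self, incoming: int, outgoing: int) -> None:
        # The batch counts are replaced; the all-time counts accumulate.
        self.batch_in, self.batch_out = incoming, outgoing
        self.total_in += incoming
        self.total_out += outgoing
        # Keep every batch so a running history can be displayed later.
        self.history.append((incoming, outgoing))


counts = ProcessCounts()
counts.record_batch(incoming=10, outgoing=8)
counts.record_batch(incoming=3, outgoing=3)
print(counts.batch_in, counts.total_in, counts.total_out)  # 3 13 11
```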

Notify Dave for input/specs/requirements.

patrickzurek commented 7 years ago

JIRA Comment by user: pkiraly JIRA Timestamp: 2012-04-13 04:12 PM

Comment body:

Yes, this table shows information only about the current harvest. I think it is too early in this phase of the software to create a full list of all harvests or cumulative numbers. The problem is that during development we use not only the tools the software provides, but also external tools to remove records (like issuing SQL commands). The numbers produced by these external tools (if any are produced at all) are not recorded inside the software. For example, this week, when I worked on the metadata issue, I ran a full harvest at least 6 times, and each harvested 10+ million records. If these numbers were displayed on this screen, we would report something like having harvested 100 million records while having only 10 million. Once the software is stable in this area, and we have experience maintaining a long-running daily process, we can build this function. An alternative approach would be a "reset" button, which would reset the cumulative numbers when we want to start harvesting from zero.
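The proposed "reset" button can be sketched as a cumulative counter with an explicit reset. The class name is hypothetical. The usage figures (six full harvests of 10 million records each) are illustrative round numbers based on the example above:

```python
class CumulativeCounter:
    """Cumulative harvest totals that can be reset to a new zero point."""

    def __init__(self) -> None:
        self.total = 0
        self.harvests = 0

    def add_harvest(self, harvested: int) -> None:
        self.total += harvested
        self.harvests += 1

    def reset(self) -> None:
        # The "reset" button: start counting again from zero.
        self.total = 0
        self.harvests = 0


counter = CumulativeCounter()
for _ in range(6):  # six full re-harvests during development
    counter.add_harvest(10_000_000)
print(counter.total)  # 60000000
counter.reset()
counter.add_harvest(10_000_000)
print(counter.total)  # 10000000
```

Without the reset, the cumulative total would keep growing with every full re-harvest, even though the number of records in the database stays the same.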

"A few thoughts...you are showing "0"s in the harvested column, even though the "before harvest" and the "current" columns are different." The problem is that deleted records have only the OAI header, which doesn't contain information about the type of the record. Getting the type involves some additional steps in the process. It can be done; I just didn't do it the first time.
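In OAI-PMH, a deleted record is returned as a `<header status="deleted">` that carries only header fields such as the identifier and datestamp, with no metadata. One way to recover the type is to look it up locally. This sketch assumes the toolkit stored a mapping from OAI identifier to record type when the record was first harvested. The `known_types` mapping and the sample identifier are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def deleted_record_type(record_xml: str, known_types: Dict[str, str]) -> Optional[str]:
    """For a deleted OAI-PMH record, return its type from a local lookup.

    Returns None if the record is not deleted or its identifier is unknown.
    """
    record = ET.fromstring(record_xml)
    header = record.find(f"{OAI_NS}header")
    if header is None or header.get("status") != "deleted":
        return None
    identifier = header.findtext(f"{OAI_NS}identifier")
    return known_types.get(identifier)


sample = (
    '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<header status="deleted">'
    "<identifier>oai:example.org:123</identifier>"
    "<datestamp>2012-03-08</datestamp>"
    "</header></record>"
)
print(deleted_record_type(sample, {"oai:example.org:123": "work"}))  # work
```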

There is another issue to solve: the current "per harvest report" doesn't contain these statistics. They should be included.

patrickzurek commented 7 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2012-09-12 03:41 PM

Comment body:

Peter, is this the work you were describing on our call? Can you expand on it?