cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Improve logging to help determine quality of harvested metadata #91

Closed cessda-bitbucket-importer closed 4 years ago

cessda-bitbucket-importer commented 5 years ago

Original report on BitBucket by John Shepherdson (GitHub: john-shepherdson).


Check for presence of required fields, consistency of field values etc. Report issues back to SPs (via Metadata Office).

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Need to do more work to understand how to use Kibana

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Check for presence of required fields, consistency of field values etc. Report issues back to SPs (via Metadata Office).

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Any obvious deficiencies are being reported to the Metadata Office via the issue tracker (https://github.com/cessda/cessda.metadata.officeissues)

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Better logging. Source, field, problem. Compatibility with Graylog. Use CDC DDI profile as pre-check?
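As an illustration of the "source, field, problem" idea, here is a minimal sketch of a helper that renders one metadata problem as a single key=value line, a shape that most log aggregators can split into fields without application-side knowledge of the aggregator. The class name `MetadataIssueFormatter`, the `record` key and the example values are assumptions for illustration; none of them come from the CDC codebase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the CDC code): renders a metadata problem as one log line. */
final class MetadataIssueFormatter {

    private MetadataIssueFormatter() {
    }

    /** Builds a single-line message naming the source, the record, the field and the problem. */
    static String format(String source, String recordId, String field, String problem) {
        // LinkedHashMap keeps the keys in a fixed, predictable order.
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("source", source);
        parts.put("record", recordId);
        parts.put("field", field);
        parts.put("problem", problem);

        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, String> part : parts.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            // Quote each value so spaces inside it do not break key=value parsing;
            // embedded double quotes are replaced with single quotes.
            line.append(part.getKey())
                .append("=\"")
                .append(String.valueOf(part.getValue()).replace("\"", "'"))
                .append('"');
        }
        return line.toString();
    }
}
```

A call such as `MetadataIssueFormatter.format("https://sp.example.org/oai", "study-1", "title", "missing")` would then be passed to whatever logger the application already uses.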

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


See also #11

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Improve logging output. Estimate of effort required to diagnose and fix: 1 day (CONTRACTOR)

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


@john-shepherdson your first comment on this ticket is:

Need to do more work to understand how to use Kibana

However, I see you have now mentioned Graylog, a log aggregator/dashboard similar to Kibana. Is it correct to say that Graylog supersedes Kibana? Note that I have not used Graylog before, so I would need to research it to meet the compatibility you mention here. That said, I would expect any log aggregator or service to handle most logs out of the box, and to provide custom tools on its platform for interpreting logs it cannot parse automatically, rather than requiring knowledge of Graylog inside one's application (vendor lock-in!).

Better logging. Source, field, problem. Compatibility with Graylog. Use CDC DDI profile as pre-check?

Check for presence of required fields, consistency of field values etc. Report issues back to SPs (via Metadata Office).

I’m interpreting the above two comments as:

  1. You want me to log every study document that is being passed by the Indexer (this would include all fields)

  2. Run the document against a custom JsonSchema or logic to “Check for presence of required fields” and log any errors found

    1. Please confirm the required fields.
  3. “consistency of field values”: this is the job of a log aggregator/dashboard and cannot be done by the Indexer

Note: please confirm that my understanding of this ticket, as set out above, is correct.
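To make point 2 concrete, here is a minimal sketch of a presence check over a study document held as a map of field names to values. The class name `RequiredFieldCheck` and the field names in `REQUIRED_FIELDS` are placeholders only; the actual list of required fields had not been confirmed at this point in the ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical check (not part of the CDC code): lists required fields that are absent or blank. */
final class RequiredFieldCheck {

    // Assumed field names, for illustration only; the real list was still to be confirmed.
    static final List<String> REQUIRED_FIELDS = List.of("id", "title", "abstract");

    private RequiredFieldCheck() {
    }

    /** Returns the required fields that are missing, null or blank, in declaration order. */
    static List<String> missingFields(Map<String, ?> study) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            Object value = study.get(field);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

Each entry in the returned list could then be logged together with the source endpoint and the record identifier, so that the problem can be reported back to the relevant SP.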

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


@john-shepherdson Awaiting your response. Also, if we are to use the proposed external study validator service we discussed, this work would be redundant.

Assigning to you.

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Graylog is now being used instead of Kibana, but to some extent that is irrelevant here.

Agree that there are potential performance issues, so may need to make this modal.

When I look at Spring Boot logs and see error messages, it can be difficult to know which endpoint and/or which record is causing the error. Some examples from https://datacatalogue-dev.cessda.eu/admin/#/applications/e3783374/logfile, which may not be covered by the external validator:

2019-12-23 16:19:22.792 WARN DefaultHarvesterConsumerService.java:87) - Exception msg[Unsuccessful response from remote repository.]. External system response body[{"message":"InternalSystemException: Unable to parse xml :Error on line 19033: Attribute name \"w\" associated with an element type \"location\" must be followed by the ' = ' character."}]

2019-12-23 16:22:31.746 ERROR (LogHelper.java:49) - RemoteResponse(logLevel=ERROR, responseCode=406, responseMessage=Not Acceptable, occurredAt=2019-12-23T16:22:31.746242)

2019-12-23 16:41:36.346 WARN DefaultHarvesterConsumerService.java:87) - Exception msg[Unsuccessful response from remote repository.]. External system response body[{"message":"InternalSystemException: Unable to parse xml :Error on line 3213: The processing instruction target matching \"[xX][mM][lL]\" is not allowed."}]

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Agree. I will

  1. Improve the messages listed here, and
  2. Review some of the logs I see in dev and improve any that do not make the study record number, SP endpoint or configuration easily identifiable
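One way to attach the endpoint and record identifier to an otherwise anonymous error is to wrap the failing call and rethrow with that context added. The sketch below is only an assumption about how this could be done; `HarvestContext`, `withContext` and the message layout are hypothetical names and are not taken from the PRs on this ticket.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper (not part of the CDC code): adds endpoint and record details to failures. */
final class HarvestContext {

    private HarvestContext() {
    }

    /**
     * Runs the action and, if it fails, rethrows with the SP endpoint and record identifier
     * in the message. The original exception is kept as the cause so the stack trace survives.
     */
    static <T> T withContext(String endpoint, String recordId, Supplier<T> action) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException(
                "endpoint=" + endpoint + " record=" + recordId + " error=" + e.getMessage(), e);
        }
    }
}
```

A caller would wrap a parsing step, for example `HarvestContext.withContext(endpointUrl, studyId, () -> parse(xml))`, where `endpointUrl`, `studyId` and `parse` stand in for whatever the handler already has in scope.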

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Improved logs; see the PRs here, @john-shepherdson:

  1. [link to pull request removed](link to pull request removed)
  2. [link to pull request removed](link to pull request removed)
  3. [link to pull request removed](link to pull request removed)

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Assigning to you, @john-shepherdson. Please review the logs on a full re-index and let me know if you need more or less verbose log information.

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


First-cut inspection of an incremental reharvest shows more info re the source of errors (endpoint - SP and type).

I need to check that record numbers are also present when parsing errors occur.

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Agree. By the way, are you running the apps locally, or do you have future deployments working? The PRs here are still to be merged to the master branch, though I have updated some of the logging alongside other tickets. The changes in these PRs significantly improve the detail given about the Service Provider endpoint URLs and study identifiers involved in errors.

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


@john-shepherdson

I’ve reworked the tests to increase coverage around logging activities and merged all branches to master for faster feedback. Feel free to re-run a full re-ingestion.

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Please include endpoint details in the following messages:

pasc-osmh-handler-oai-pmh:

2019-12-30 11:56:34.017 ERROR (getDocument) (ListRecordHeadersServiceImpl.java:193) - Unable to parse repo RecordHeader response bytes.

pasc-osmh-handler-nesstar:

2019-12-30 12:04:00.844 ERROR (ListRecordHeadersServiceImpl.java:105) - Unable to parse repo RecordHeader response bytes.

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Done

PRs that add fullListRecordUrlPath logging, along with the modifications needed to allow it.

@john-shepherdson I’ll be merging this next, so that I know the full state of play in Sonar before ending the day and this phase of iteration fixes and improvements.

cessda-bitbucket-importer commented 4 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Please include endpoint details in the following messages:

Fix verified and working on DEV

handler-oai-pmh

handler Nesstar

To be helpful, here is the actual response page if I try to access that URL manually.

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Signed-off as completed