NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0

Enhance provenance sweeper with more graceful failure when registry contains zero documents #25

Closed alexdunnjpl closed 7 months ago

alexdunnjpl commented 1 year ago

Update to safeguard against DivideByZero
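
A minimal sketch of the kind of safeguard meant here. The ticket does not say where the division happens, so the function name and the progress-percentage calculation below are hypothetical, not code from the sweepers:

```python
import logging

log = logging.getLogger(__name__)


def percent_complete(processed: int, total: int) -> float:
    """Return progress as a percentage, treating an empty registry as complete.

    Hypothetical helper for illustration only; not part of registry-sweepers.
    """
    if total == 0:
        # Guard: with zero documents there is nothing to process,
        # so skip the division instead of raising ZeroDivisionError.
        log.warning("Registry contains zero documents; nothing to do")
        return 100.0
    return 100.0 * processed / total
```

With a guard like this, an empty registry produces a warning and a normal exit rather than a ZeroDivisionError.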

al-niessner commented 7 months ago

@alexdunnjpl @jordanpadams

Is there a stack trace or something to go with this? I am not seeing any division problems.

alexdunnjpl commented 7 months ago

@al-niessner if you've run sweepers against a completely-empty registry and aren't able to replicate, possibly this was fixed and the issue not closed, or fixed as a side-effect of something else, in which case this issue can be closed without further action.

al-niessner commented 7 months ago

@alexdunnjpl @jordanpadams

On a completely empty OpenSearch (no registry index), I get a bunch of SSL warnings because of the self-signed cert (who cares, it is local testing) and then a good error telling you that there is no registry:

$ PYTHONPATH=/home/niessner/Projects/PDS/registry-sweepers/src python3 src/pds/registrysweepers/provenance/__init__.py -b https://localhost:9200 -p admin -u admin --insecure
/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/connection/http_urllib3.py:199: UserWarning: Connecting to https://localhost:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
Traceback (most recent call last):
  File "/home/niessner/Projects/PDS/registry-sweepers/src/pds/registrysweepers/provenance/__init__.py", line 149, in <module>
    run(
  File "/home/niessner/Projects/PDS/registry-sweepers/src/pds/registrysweepers/provenance/__init__.py", line 75, in run
    successors = get_successors_by_lidvid(extant_lidvids)
  File "/home/niessner/Projects/PDS/registry-sweepers/src/pds/registrysweepers/provenance/__init__.py", line 90, in get_successors_by_lidvid
    extant_lidvids = list(extant_lidvids)  # ensure against consumable iterator
  File "/home/niessner/Projects/PDS/registry-sweepers/src/pds/registrysweepers/utils/db/__init__.py", line 67, in query_registry_db
    results = retry_call(
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/home/niessner/Projects/PDS/registry-sweepers/src/pds/registrysweepers/utils/db/__init__.py", line 52, in fetch_func
    return client.search(
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/client/utils.py", line 179, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/client/__init__.py", line 1553, in search
    return self.transport.perform_request(
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/transport.py", line 409, in perform_request
    raise e
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/transport.py", line 370, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/connection/http_urllib3.py", line 266, in perform_request
    self._raise_error(
  File "/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/connection/base.py", line 301, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [registry]', registry, index_or_alias)

Did we want to make that error more ambiguous somehow?
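
If a friendlier failure were wanted, one possible shape is sketched below. It is only an illustration: run_sweeper and its parameters are hypothetical names, and LookupError is a stand-in default so the snippet runs without opensearch-py installed. In real use the tuple would hold opensearchpy.exceptions.NotFoundError, the exception shown in the traceback above.

```python
import logging

log = logging.getLogger("sweepers")


def run_sweeper(sweep, not_found_errors=(LookupError,)):
    """Run a zero-argument callable and return a process-style exit code.

    Hypothetical wrapper for illustration. `not_found_errors` is the tuple of
    exception types that mean "the registry index does not exist"; pass
    (opensearchpy.exceptions.NotFoundError,) when using the real client.
    """
    try:
        sweep()
    except not_found_errors as err:
        # Replace the long traceback with a single actionable log line.
        log.error("Registry index not found; has the registry been created? (%s)", err)
        return 1
    return 0
```

The caller could then pass the return value to sys.exit(), so a missing index still yields a non-zero exit status without the full traceback.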

al-niessner commented 7 months ago

Then with an OpenSearch that has no data (created by running docker compose --profile=dev-api up in registry/docker):

$ PYTHONPATH=/home/niessner/Projects/PDS/registry-sweepers/src python3 src/pds/registrysweepers/provenance/__init__.py -b https://localhost:9200 -p admin -u admin --insecure
/home/niessner/.venv/pds/lib/python3.10/site-packages/opensearchpy/connection/http_urllib3.py:199: UserWarning: Connecting to https://localhost:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
/home/niessner/.venv/pds/lib/python3.10/site-packages/urllib3/connectionpool.py:1095: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
(pds) niessner@elysium:~/Projects/PDS/registry-sweepers$ 

Same unimportant self-signed warnings but no other errors. If this is sufficient for you two, then I will let you close it without further ado.

alexdunnjpl commented 7 months ago

@al-niessner looks like you're just running provenance there, not the entire set of sweepers?

Should be this'n: https://github.com/NASA-PDS/registry-sweepers/blob/main/docker/sweepers_driver.py

al-niessner commented 7 months ago

@alexdunnjpl

Yes, because the subject specifically says provenance sweeper. No need to run/test others.

alexdunnjpl commented 7 months ago

@al-niessner my mistake - I'm so used to mentally translating between the two because people still refer to the sweepers suite as "provenance". Probably I wasn't doing that in the ticket title, but I can't be absolutely certain, so it may be best to run the full suite for completeness' sake given that it's not replicable via just the provenance script. Your call though.

al-niessner commented 7 months ago

@alexdunnjpl

Not my call. I am just working on what was stated in the ticket. If the requirements need to be changed, then change them (fix the subject). You can also just state that the subject is wrong and that they all need to be done ("may be best for completeness' sake" is not stating it). I am used to requirements changing, but it costs money when they change, so whoever changes them has to take clear responsibility for the added costs.

alexdunnjpl commented 7 months ago

@al-niessner this ticket was basically a quick note-to-self from a few months ago that I'd forgotten the empty-registry corner case, so that I'd remember to loop back to it (hence the flippant original text). Something something good intentions...

Because of that, I can only speculate whether I made a mistake in the initial ticket subject. I'll take a look now and see whether I can replicate it.

alexdunnjpl commented 7 months ago

I couldn't replicate it with the full-suite driver, so safe to say it's no longer a valid issue.