NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0
0 stars 1 forks source link

Synchronization of Legacy Solr Registry in the new #68

Closed tloubrieu-jpl closed 9 months ago

tloubrieu-jpl commented 10 months ago

🗒️ Summary

A new sweeper is added. It can be optionally launched in addition to the default sweepers with option --legacy-sync

⚙️ Test Data and/or Report

Build the docker image.

Run on en-delta:

docker run -e CCS_CONN='atm-delta'  -e PROV_ENDPOINT='https://search-en-delta-cxbnln2tudbdrclt4nr7scdmpu.us-west-2.es.amazonaws.com' -e PROV_CREDENTIALS='{"*******": "*******"}' registry-sweepers /usr/local/bin/sweepers_driver.py --legacy-sync

Run in production:

docker run -e CCS_CONN='atm-prod-ccs,geo-prod-ccs,img-prod-ccs,naif-prod-ccs,ppi-prod-ccs,psa-prod,rms-prod,sbnpsi-prod-ccs,sbnumd-prod-ccs' -e PROV_ENDPOINT='https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com' -e PROV_CREDENTIALS='{"*****": "*********"}' registry-sweepers /usr/local/bin/sweepers_driver.py --legacy-sync

♻️ Related Issues

Fixes #58

nutjob4life commented 10 months ago

@tloubrieu-jpl in the pull request description the password ****** appears. Is this the actual password?

If so, it should be changed as soon as possible!

tloubrieu-jpl commented 10 months ago

@alexdunnjpl @nutjob4life I made updates to the PR following your comments.

I applied the linter on the code and had to add comments to make the linter ignore 3 lines.

I also noted that the unit tests are broken but I don't see how this pull request would have an impact on them.

alexdunnjpl commented 10 months ago

@tloubrieu-jpl I haven't looked closely at the rest yet, but L520 through L531 show the from the test details/log show the issue

https://github.com/NASA-PDS/registry-sweepers/actions/runs/6103467062/job/16563918148?pr=68#step:5:521

There should not be any changes required to the two utilities files - I'd suggest reverting those (given that they happen to be the files referred to in the failing test stage)

tloubrieu-jpl commented 9 months ago

Hi @alexdunnjpl , those changes were made to go through the linter (tox -e lint). which work but the unit tests (pytest) don't.

Have you had the precommit configuration deployed in your repository (see pre-coomit commands in https://github.com/NASA-PDS/template-repo-python#installation) ?

Another question, how do you run the unit tests, I mean with which command ?

alexdunnjpl commented 9 months ago

@tloubrieu-jpl both the tests and the linter are/should-be run through tox, rather than pytest.

The tests may be run via the (somewhat unintuitively-named) py310 tox profile, so tox -e py310, and are not currently failing, as can be confirmed from examination of the github action logs.

The changes under registrysweepers.db.* are not required to pass linting and should be reverted (I have just tested this to confirm).

The outstanding (linting) errors are as follows, and will require either installation of the library stubs, or appropriate #ignores

src/pds/registrysweepers/legacy_registry_sync/opensearch_loaded_product.py:3: error: Cannot find implementation or library stub for module named "elasticsearch"  [import]
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: error: Cannot find implementation or library stub for module named "elasticsearch.helpers"  [import]
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
src/pds/registrysweepers/legacy_registry_sync/legacy_registry_sync.py:4: error: Cannot find implementation or library stub for module named "elasticsearch"  [import]

My precommit hook (slightly modified so that it re-runs on autofixable failure) is as follows:

#!/bin/sh
#
# An example hook script to verify what is about to be committed.
# Called by "git commit" with no arguments.  The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# To enable this hook, rename this file to "pre-commit".

if git rev-parse --verify HEAD >/dev/null 2>&1
then
        against=HEAD
else
        # Initial commit: diff against an empty tree object
        against=$(git hash-object -t tree /dev/null)
fi

# If you want to allow non-ASCII filenames set this variable to true.
allownonascii=$(git config --bool hooks.allownonascii)

# Redirect output to stderr.
exec 1>&2

# Cross platform projects tend to avoid non-ASCII filenames; prevent
# them from being added to the repository. We exploit the fact that the
# printable range starts at the space character and ends with tilde.
if [ "$allownonascii" != "true" ] &&
        # Note that the use of brackets around a tr range is ok here, (it's
        # even required, for portability to Solaris 10's /usr/bin/tr), since
        # the square bracket bytes happen to fall in the designated range.
        test $(git diff --cached --name-only --diff-filter=A -z $against |
          LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0
then
        cat <<\EOF
Error: Attempt to add a non-ASCII file name.

This can cause problems if you want to work with people on other platforms.

To be portable it is advisable to rename the file.

If you know what you are doing you can disable this check using:

  git config hooks.allownonascii true
EOF
        exit 1
fi

STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep ".py$")

echo "Linting files"
tox -e lint
git add $STAGED_FILES
if [ $? -ne 0 ]
then
    echo "Initial lint detected errors, re-linting to determine whether errors remain"
    tox -e lint
    if [ $? -ne 0 ]
    then
      exit 1
    fi
fi

exit 0
tloubrieu-jpl commented 9 months ago

@alexdunnjpl @nutjob4life I made some changes to make tox work.

  1. namespace_packages is deprecated in setup.cfg: I renamed the package pds.registry-sweepers and removed the line in the setup.cfg.
  2. I changed the option to allow pytest as an external tool for tox.

Can you review these changes and validate if you are happy with them ?

tloubrieu-jpl commented 9 months ago

Hi @nutjob4life , can you validate that your requested changes are done ? Thanks

nutjob4life commented 9 months ago

Good morning @tloubrieu-jpl, this all checks out 🎉