tloubrieu-jpl closed this 1 year ago
@tloubrieu-jpl or anyone: sorry for the late start. 1) Am I really installing on pdscloud-prod2? Shouldn’t I do this on pdscloud-gamma? 2) On ~pds4 on both machines, java -version says 1.8. Should I install my own JDK 11 in my local dir? Thanks
Hi @rchenatjpl , you don't look late to me.
1) I think we said you should install on pdscloud-prod1, but 2 works as well. You don't need any local access to databases: access to OpenSearch goes through HTTP, and OpenSearch is a service managed by AWS and hosted on a different system, so harvest and registry-manager could be installed anywhere. You could install on pds-gamma, but then you would need to configure the deployment to use a staging registry. I am not sure one is accessible (@jimmie, can you answer that?). I am not sure we need to do that right now, so to me we can keep the initial plan to deploy on the production venue.
As a reminder, the different common EN venues on the diagram:
2) @c-suh do you know if jdk11 is available on the pdscloud-* machines? Which version are you using for validate? In case it is not available yet, you could deploy it in the PDS4 home folder. I don't feel like we need to ask the SAs for that. For AWS deployments, I am leaning toward: anything that the SAs are letting us do, let's do it ourselves. Does that make sense?
Thanks Richard,
Thomas
@tloubrieu-jpl jdk11 wasn't available on the pdscloud-* machines, so I've installed it on gamma to start and will have it on the others by the end of today. A note that installing it where the existing java lives was not possible because that required root access, so I installed it in the pds4 home directory as you suggested above. Another note that I've also installed jenv to manage multiple java versions (leaving the existing jdk8 as global; I will make jdk11 the default in the directories that Richard will create/install), since I imagine that a few of our older tools might not play well with this java version.
OK, I shoved harvest and registry into pdscloud-prod1:/usr/local/build11/. In the Description at the top of this page, I don't understand what this means: "in /usr/local/build11/ --> /usr/local/applications". Should I build a softlink in the latter? Hopefully, it doesn't matter.
@jimmie or @tloubrieu-jpl What's the URL for Kibana on pdscloud-prod1? I did get the password. How does Kibana know where the registry sits?
The Opensearch dashboard (fka Kibana) is associated with the particular Opensearch domain. For the EN registry, this is at: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/_dashboards
Thanks. So I assume something's not configured correctly:
[pds4@pdscloud-prod1 test]$ ls -l directories.xml
-rw-r--r--. 1 pds4 pds 1866 Jun 16 15:07 directories.xml
[pds4@pdscloud-prod1 test]$ pwd
/home/pds4/test
[pds4@pdscloud-prod1 test]$ /usr/local/build11/harvest-3.6.0/bin/harvest -c directories.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/directories.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: http://localhost:9200, index: registry
[INFO] Connecting to Elasticsearch
[ERROR] Connection refused
@tloubrieu-jpl Wait, should I create the registry first?
[pds4@pdscloud-prod1 ~]$ /usr/local/build11/registry-manager-4.4.0/bin/registry-manager create-registry
Elasticsearch URL: http://localhost:9200
Creating index...
Index: registry
Schema: /usr/local/build11/registry-manager-4.4.0/elastic/registry.json
Shards: 1
Replicas: 0
[ERROR] Connection refused
localhost:9200 is some elastic search thing? How do I start that, presumably on pdscloud-prod1? Thanks
No, the registry is already created. Just need to load the data.
The Opensearch endpoint (i.e. the 'Elasticsearch URL') for the production EN registry is https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443
@jimmie @tloubrieu-jpl Ah. So in the harvest config file, I should change
<registry url="http://localhost:9200" index="registry" />
to
<registry url="https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443" index="registry" />
I hope that's right. Now I'm getting
[pds4@pdscloud-prod1 test]$ /usr/local/build11/harvest-3.6.0/bin/harvest -c directories.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/directories.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443, index: registry
[INFO] Connecting to Elasticsearch
[ERROR] method [GET], host [https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443], URI [/registry/_mappings], status line [HTTP/1.1 403 Forbidden]
{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet because no resource-based policy allows the es:ESHttpGet action"}
So I need to set the user? I don't see an option on the harvest command line or in the config file
The Harvest documentation is here (at least for v1.0.3): https://nasa-pds.github.io/pds-registry-app/operate/harvest.html
In the config:
The last element is the auth file, which has the form of:
user=
Put in the credentials for your Opensearch login that I LFT'd to you (and you changed password). Enter the full path to this file in the Harvest config.
@rchenatjpl the link @jimmie sent is on the old documentation, the new one is here https://nasa-pds.github.io/registry/
For the authentication management in harvest job configuration file you can refer to https://nasa-pds.github.io/registry/user/harvest_job_configuration.html#registry-integration
@jimmie I redirected every page of the obsolete repository to the new documentation (see https://nasa-pds.github.io/pds-registry-app/operate/harvest.html), so that anyone who kept these pages in their bookmarks lands on the new pages. Previously the redirect worked from the landing page of the documentation but not from every page.
@tloubrieu-jpl @jimmie
[pds4@pdscloud-prod1 test]$ /usr/local/build11/harvest-3.6.0/bin/harvest -c directories.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/directories.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443, index: registry
[INFO] Connecting to Elasticsearch
[ERROR] method [GET], host [https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443], URI [/registry/_mappings], status line [HTTP/1.1 403 Forbidden]
{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet because no resource-based policy allows the es:ESHttpGet action"}
[pds4@pdscloud-prod1 test]$ grep auth directories.xml
[pds4@pdscloud-prod1 test]$ cat /home/pds4/test/auth.txt
trust.self-signed = true
user = xxx
password = xxx
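For reference, a minimal sketch of laying out the auth file with one property per line and restricting its permissions. It writes to a temporary directory rather than /home/pds4/test, the xxx values are placeholders, and the property names are simply the ones shown in this thread.

```shell
# Sketch only: writes to a temp dir; swap in your real path and credentials.
workdir="$(mktemp -d)"
auth_file="$workdir/auth.txt"

# One property per line, using the property names shown in this thread.
cat > "$auth_file" <<'EOF'
trust.self-signed = true
user = xxx
password = xxx
EOF

# The file holds a password, so make it readable by the owner only.
chmod 600 "$auth_file"

# Quick sanity check: both credential properties are present.
if grep -q '^user *=' "$auth_file" && grep -q '^password *=' "$auth_file"; then
  auth_ok=yes
else
  auth_ok=no
fi
echo "auth file: $auth_file (complete: $auth_ok)"
```

The full path to this file is what goes in the auth attribute of the registry element in the harvest config.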
We need to have pdscloud-prod1 and pdscloud-prod2 added to the OpenSearch whitelist. I will file a ticket to have that done.
oh, thank goodness, i assumed it was user error
@rchenatjpl, @tloubrieu-jpl, and @viviant100, it sounds like the EN-specific documentation wasn't looked at, which means that (1) the java version isn't set and (2) the installation isn't in the new deployment directory we had the SAs set up. Should I address these or work with @rchenatjpl on these? Also confirming that this should be on only one of the production machines and not both?
Did I miss steps? That's entirely possible. It's hard for me to chase down links and remember where I am.
@rchenatjpl I think you need to:
@c-suh how popular is 'jenv'? Is it widely used in Joel's environments? I never used it and was happy with setting JAVA_HOME. If we use it, which I am not against, we need to document that and make it a standard in our deployments, because I would worry if we sometimes set JAVA_HOME manually and sometimes use jenv. That can become messy.
Thanks
@tloubrieu-jpl pardon, "Joel's environments"? jenv seems to be used fairly commonly to manage multiple java versions, and I think it's much friendlier than alternatives or sdkman, because you can set local environments once (like with pyenv) and forget about it, as opposed to having to manually switch between versions. However, I had done this (installed a version manager rather than switching entirely to version 11) because I assumed that upgrading the entire system to jdk11 would have broken some of our older tools. Apologies for not posing this as a question or making it more obvious at the end of my comment above. So, should we upgrade entirely to jdk11 or use a java version manager?
Additionally, I did not see anything in the public documentation about configuring any endpoints, but is there anything we need to tell the SAs so that the new versions of these tools get used? There was the related question in the Slack channel regarding the new deployment directory (and a possible standardized procedure), because currently any upgrade to these tools requires asking the SAs to re-point to the new versions.
As for documentation, regarding setting the java version locally, I included this as step 2 of "Installation" for both Standalone Harvest and Registry Manager. Regarding the initial installation and setup of jenv, I kept personal notes but did not post them in the internal wiki. Should I create a page for now under the Software page, so it can be moved later to a page or section that is not for PDS-specific software?
@rchenatjpl yup, I linked the documentation in Slack channel, but I see now that it's hardly visible amidst all the other blue of the mentions. Too many conveniences cluttered together. Here is the internal documentation for Standalone Harvest and Registry Manager. These can also be gotten to from the Registry section of Software Installation and Deployment Guides
@tloubrieu-jpl and @viviant100 as suggested then agreed upon, I have removed jenv and instead inserted a line to call another script which has the jdk11 path. This has been tested on gamma and the documentation has been updated for when Richard installs this on production. I've noticed that the pds account on gamma has a pds-registry-app-1.0.2 which contains harvest-3.5.1 and registry-manager-4.3.0. Should this be left alone or deleted?
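For anyone following along, a minimal sketch of what a jdk11 path script of the kind mentioned above could look like. The install location ($HOME/jdk-11) and the JDK11_HOME variable are assumptions for illustration, not the actual paths used on gamma.

```shell
# Hypothetical location of the locally installed JDK 11; override by setting JDK11_HOME.
JDK11_HOME="${JDK11_HOME:-$HOME/jdk-11}"

# Point Java tooling at JDK 11 for this shell only; the system jdk8 is left untouched.
export JAVA_HOME="$JDK11_HOME"
export PATH="$JAVA_HOME/bin:$PATH"

echo "JAVA_HOME=$JAVA_HOME"
```

A launcher script would source this file before invoking java, so only the tools that need JDK 11 pick it up.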
I would suggest to delete it if it's not in use.
@rchenatjpl I saw your comment on the internal Registry Manager wiki page, and it sounds like it installed successfully! Please confirm if
@c-suh @tloubrieu-jpl @jimmie I installed registry-manager and harvest, but the next question of course is how to hook them up. Is there a way to check if registry-manager is already running? If I should run registry-manager on prod1 myself, is there a way to check if OpenSearch is running? So I tried:
[pds4@pdscloud-prod1 test]$ registry-manager create-registry
Elasticsearch URL: http://localhost:9200
Creating index...
Index: registry
Schema: /usr/local/applications/registry-manager/elastic/registry.json
Shards: 1
Replicas: 0
[ERROR] Connection refused
Hopefully that means registry-manager is running already on the machine I'm supposed to connect to. Is that localhost:9200 or the long amazonaws URL that Jimmie sent a while ago? I tried the latter. In ~pds4/test/directories.xml:
and that auth file does exist and does hold what Jimmie sent. I did not change the password - I just want stuff to work once before doing so. Anyway,
[pds4@pdscloud-prod1 test]$ harvest -c directories.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/directories.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443, index: registry
[INFO] Connecting to Elasticsearch
[ERROR] method [GET], host [https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443], URI [/registry/_mappings], status line [HTTP/1.1 403 Forbidden]
{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet because no resource-based policy allows the es:ESHttpGet action"}
And for good measure I tried localhost:9200
[pds4@pdscloud-prod1 test]$ diff directories.xml dir2.xml
22,23c22,23
< <!--registry url="http://localhost:9200" index="registry" /-->
< <registry url="https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443" index="registry" auth="/home/pds4/test/auth.txt" />
---
> <registry url="http://localhost:9200" index="registry" />
> <!--registry url="https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443" index="registry" auth="/home/pds4/test/auth.txt" /-->
[pds4@pdscloud-prod1 test]$ harvest -c dir2.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/dir2.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: http://localhost:9200, index: registry
[INFO] Connecting to Elasticsearch
[ERROR] Connection refused
I tried yet another version with the auth file attached to localhost, but same thing. I'll keep plugging away, but hopefully someone can spot the problem easily. Thanks
@rchenatjpl
Is there a way to check if registry-manager is already running?
As a general idea, registry-manager is just a command-line tool for interacting with OpenSearch (aka the Registry), similar to validate or the legacy search-core, so it only runs when you tell it to run. That being said, if you want to check whether you or someone else is running it right now:
ps aux | egrep registry-manager
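One caveat with the pipeline above: the egrep process itself has registry-manager on its command line, so it can show up as a match. A sketch of a check that avoids this, assuming pgrep (from procps) is installed; is_running is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if any process command line matches the pattern.
# Wrapping the first letter in brackets ([r]egistry-manager) stops the pattern
# from matching a command line that merely contains the pattern text itself.
is_running() {
  pgrep -f "$1" > /dev/null
}

if is_running '[r]egistry-manager'; then
  echo "registry-manager is running"
else
  echo "registry-manager is not running"
fi
```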
Is that localhost:9200 or the long amazonaws URL that Jimmie sent a while ago?
I believe it is the long URL, but I will leave that to @jimmie
@rchenatjpl - the admins have enabled access to en-prod. I need to test. If that works, they will enable it for all of the other OpenSearch domains.
To answer your question regarding accessing Opensearch, you will need to use the URI: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/
I have verified that I can access en-prod from pdscloud-prod1 and pdscloud-prod2. @rchenatjpl - you should now be unblocked in this respect.
Thanks, @jimmie. OK, am I doing something wrong? I think I ingested, but I see nothing in OpenSearch. Here's the URL I used (and I did type in my username/password): https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=*&pretty The web page says "message: null". I thought populating the registry had worked, but maybe not. Here's that output. Thanks,
% harvest -c directories.xml
[SUMMARY] Reading configuration from /data/home/pds4/test/directories.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443, index: registry
[INFO] Connecting to Elasticsearch
[INFO] Loading PDS to ES data type mapping from /usr/local/applications/harvest/elastic/data-dic-types.cfg
[INFO] Processing directory: /home/rchen/testdata
[INFO] Processing /home/rchen/testdata/mission.apollo_11_1.0.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] Downloading https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.JSON to /tmp/LDD-1122365850083766232.JSON
Jul 07, 2022 11:13:47 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: AWSALB=jxy0hDiJzAnA6jOK6Ms79duG82lRLHHM1yJp9atrIYZGatIWZj79kEae5PgMPoZdJSMZtD0CCNljnniiCOKIyw29nsGdPPv8P/jt9MwoQN+TaNSacoLEVuEIGeTi; Expires=Thu, 14 Jul 2022 18:13:47 GMT; Path=/". Invalid 'expires' attribute: Thu, 14 Jul 2022 18:13:47 GMT
Jul 07, 2022 11:13:47 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: AWSALBCORS=jxy0hDiJzAnA6jOK6Ms79duG82lRLHHM1yJp9atrIYZGatIWZj79kEae5PgMPoZdJSMZtD0CCNljnniiCOKIyw29nsGdPPv8P/jt9MwoQN+TaNSacoLEVuEIGeTi; Expires=Thu, 14 Jul 2022 18:13:47 GMT; Path=/; SameSite=None; Secure". Invalid 'expires' attribute: Thu, 14 Jul 2022 18:13:47 GMT
[INFO] Creating temporary ES data file /tmp/es-10993609303684127597.json
[INFO] Loading ES data file: /tmp/es-10993609303684127597.json
[INFO] Loaded 500 document(s)
[INFO] Loaded 1000 document(s)
[INFO] Loaded 1326 document(s)
[INFO] Updating Elasticsearch schema.
[INFO] Updated 8 fields
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/collection_radioscience_naf018_sff.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[INFO] Updated 4 fields
[INFO] Wrote 1 collection inventory document(s)
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160928_161002_v01.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'orex' LDD. Schema location: https://pds.nasa.gov/pds4/mission/orex/v1/orex_ldd_OREX_1400.xsd
[INFO] Downloading https://pds.nasa.gov/pds4/mission/orex/v1/orex_ldd_OREX_1400.JSON to /tmp/LDD-658827965114614068.JSON
Jul 07, 2022 11:14:02 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: AWSALB=pfsB5Ur8aknV2/3HqQM8HNTdJq3I51F49smNs0IhitpX/n8NHnYBguI1eZumc2zOMWQypX3FWUjUfxzrORfzvShJklE7UpWg51+mlgVSv95P8SxEex8UuGbd+UOQ; Expires=Thu, 14 Jul 2022 18:14:02 GMT; Path=/". Invalid 'expires' attribute: Thu, 14 Jul 2022 18:14:02 GMT
Jul 07, 2022 11:14:02 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Invalid cookie header: "Set-Cookie: AWSALBCORS=pfsB5Ur8aknV2/3HqQM8HNTdJq3I51F49smNs0IhitpX/n8NHnYBguI1eZumc2zOMWQypX3FWUjUfxzrORfzvShJklE7UpWg51+mlgVSv95P8SxEex8UuGbd+UOQ; Expires=Thu, 14 Jul 2022 18:14:02 GMT; Path=/; SameSite=None; Secure". Invalid 'expires' attribute: Thu, 14 Jul 2022 18:14:02 GMT
[INFO] Creating temporary ES data file /tmp/es-9623659808871018744.json
[INFO] Loading ES data file: /tmp/es-9623659808871018744.json
[INFO] Loaded 285 document(s)
[INFO] Updating Elasticsearch schema.
[INFO] Updated 15 fields
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160919_160922_v01.xml
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160909_160913_v01.xml
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160929_161006_v01.xml
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160922_160929_v01.xml
[INFO] Processing /home/rchen/testdata/orexsmall/naf018_sff/cruise/orx_r_160915_160919_v01.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2021_114_2021_118_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2018_306_2018_334_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/collection_trk223_ion_vlbi.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[INFO] Updated 2 fields
[INFO] Wrote 1 collection inventory document(s)
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2019_060_2019_064_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2018_244_2018_273_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2019_259_2019_264_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2020_129_2020_131_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2018_336_2018_364_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2019_059_2019_059_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2020_030_2020_030_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2020_032_2020_055_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2018_274_2018_304_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2018_231_2018_242_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2020_259_2020_270_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2019_001_2019_002_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2019_159_2019_162_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2020_214_2020_214_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_vlbi/orex_beno_2021_121_2021_150_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_214_2020_245_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_060_2019_091_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_152_2019_182_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2021_032_2021_060_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_336_2021_001_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_244_2019_274_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2021_001_2021_032_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_305_2019_335_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_335_2020_001_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_306_2020_336_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_213_2019_244_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_274_2019_305_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2021_060_2021_091_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_032_2019_060_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_001_2019_032_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2018_274_2018_305_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_091_2019_121_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_182_2019_213_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_122_2020_153_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_092_2020_122_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2021_121_2021_152_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_061_2020_092_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2018_335_2019_001_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2018_213_2018_244_ion.xml
[INFO] Wrote 50 product(s)
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/collection_trk223_ion_dopr.xml
[INFO] Wrote 1 collection inventory document(s)
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2018_305_2018_335_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_183_2020_214_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_032_2020_061_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_245_2020_275_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2021_091_2021_121_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_275_2020_306_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2019_121_2019_152_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_001_2020_032_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2018_244_2018_274_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk223_ion_dopr/orex_beno_2020_153_2020_183_ion.xml
[INFO] Processing /home/rchen/testdata/orexsmall/document/SIS_NAF018_ORX-SFF_CCv0001.xml
[ERROR] Data file /home/rchen/testdata/orexsmall/document/SIS_NAF018_ORX-SFF_CCv0001.pdf doesn't exist
[INFO] Processing /home/rchen/testdata/orexsmall/document/collection_radioscience_document.xml
[INFO] Wrote 2 collection inventory document(s)
[INFO] Processing /home/rchen/testdata/orexsmall/document/spacecraft_mass_history.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[INFO] Updated 8 fields
[INFO] Processing /home/rchen/testdata/orexsmall/document/antenna_swap_history.xml
[INFO] Processing /home/rchen/testdata/orexsmall/document/radioscience_bundle_information.xml
[ERROR] Data file /home/rchen/testdata/orexsmall/document/radioscience_bundle_information.pdf doesn't exist
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/collection_radioscience_trk234_traknav.xml
[INFO] Wrote 1 collection inventory document(s)
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_259_134529_2016_259_221501_25.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'orex' LDD. Schema location: https://pds.nasa.gov/pds4/mission/orex/v1/orex_ldd_OREX_1400.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[INFO] Updated 17 fields
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_258_061550_2016_258_151500_65.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_255_224030_2016_256_075000_35.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_256_140107_2016_256_220001_25.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_254_205513_2016_255_080000_35.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_257_134558_2016_257_215501_25.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_258_135552_2016_258_213000_26.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_259_062111_2016_259_151000_55.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_255_141049_2016_256_001000_26.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_256_204012_2016_257_075500_35.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_257_063552_2016_257_150500_65.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_258_201047_2016_259_074000_45.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_255_063030_2016_255_154000_54.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_257_203533_2016_258_072500_35.xml
[INFO] Processing /home/rchen/testdata/orexsmall/trk234_trknav/cruise/orex_beno_2016_256_063028_2016_256_152452_54.xml
[INFO] Processing /home/rchen/testdata/orexsmall/bundle_orex_radioscience.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[INFO] Updated 2 fields
[INFO] Wrote 81 product(s)
[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 81
[SUMMARY] Product_Bundle: 1
[SUMMARY] Product_Collection: 5
[SUMMARY] Product_Context: 1
[SUMMARY] Product_Document: 2
[SUMMARY] Product_Observational: 72
[SUMMARY] Failed files: 2
[SUMMARY] Package ID: 5586893d-14aa-45c2-982e-ec257bff89ee
%
%
%
% cat directories.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
* !!! 'nodeName' is a required attribute. !!!
* Use one of the following values:
* PDS_ATM - Planetary Data System: Atmospheres Node
* PDS_ENG - Planetary Data System: Engineering Node
* PDS_GEO - Planetary Data System: Geosciences Node
* PDS_IMG - Planetary Data System: Imaging Node
* PDS_NAIF - Planetary Data System: NAIF Node
* PDS_PPI - Planetary Data System: Planetary Plasma Interactions Node
* PDS_RMS - Planetary Data System: Rings Node
* PDS_SBN - Planetary Data System: Small Bodies Node at University of Maryland
* PSA - Planetary Science Archive
* JAXA - Japan Aerospace Exploration Agency
-->
<harvest nodeName="PDS_ENG">
<!-- Registry configuration -->
<!-- UPDATE with your registry information -->
<!--registry url="http://localhost:9200" index="registry" auth="/home/pds4/test/auth.txt" /-->
<!--registry url="http://localhost:9200" index="registry" /-->
<registry url="https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443" index="registry" auth="/home/pds4/test/auth.txt" />
<directories>
<!-- Path to one or more directories with PDS4 labels -->
<path>/home/rchen/testdata</path>
</directories>
<!--
NOTE: By default only lid, vid, lidvid, title and product class are exported.
autogenFields should also be enabled for operational ingestion.
See documentation for more configuration options: https://nasa-pds.github.io/pds-registry-app/operate/harvest.html
-->
<fileInfo processDataFiles="true" storeLabels="true">
<!-- UPDATE with your own local path and base url where pds4 archive are published -->
<fileRef replacePrefix="/path/to/archive" with="https://url/to/archive/" />
</fileInfo>
<!--
Extract all fields. Field names: <namespace>:<class_name>/<namespace>:<attribute_name>
NOTE: This should only be disabled for testing purposes
-->
<autogenFields/>
</harvest>
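In a run this long the failures are easy to miss, so filtering the log can help. A sketch, using a few lines copied from the output above as sample input; in practice the log file would come from redirecting harvest's output, and the variable names here are only illustrative.

```shell
# Sample lines copied from the harvest output in this comment.
log="$(mktemp)"
cat > "$log" <<'EOF'
[INFO] Processing /home/rchen/testdata/orexsmall/document/SIS_NAF018_ORX-SFF_CCv0001.xml
[ERROR] Data file /home/rchen/testdata/orexsmall/document/SIS_NAF018_ORX-SFF_CCv0001.pdf doesn't exist
[INFO] Processing /home/rchen/testdata/orexsmall/document/radioscience_bundle_information.xml
[ERROR] Data file /home/rchen/testdata/orexsmall/document/radioscience_bundle_information.pdf doesn't exist
[SUMMARY] Loaded files: 81
[SUMMARY] Failed files: 2
EOF

# Count error lines and pull the failure count out of the summary.
error_count="$(grep -c '^\[ERROR\]' "$log")"
failed_files="$(sed -n 's/^\[SUMMARY\] Failed files: //p' "$log")"

echo "errors=$error_count failed=$failed_files"
grep '^\[ERROR\]' "$log"
```

Here the two [ERROR] lines (missing PDF data files) appear to account for the "Failed files: 2" in the summary.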
@jimmie @tloubrieu-jpl @c-suh Regarding the last message, I'm hoping a 2-minute glance will show something obvious to one of you. If not, I'll muck around. Thanks
@rchenatjpl - how are you querying Opensearch? I'm seeing documents in there.
curl -u pdsadmin -X GET "https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_count"
{"count":106,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
I'm using a browser with URL https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=*&pretty
Hmm, I guess I have the tail end of that URL wrong. I'll search. Thanks
No, that works for me. What login are you using?
No login. Oh, I replaced the escaped &amp; with &, and now I see "hits". Thanks, @jimmie
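The same thing can bite on the command line: a URL copied from a rendered page can carry an HTML-escaped ampersand where a plain & belongs, and an unquoted & in a shell sends the command to the background. A small sketch; unescape_amp is a hypothetical helper name, and the curl line is left commented out because it needs your own credentials.

```shell
# Hypothetical helper: turn HTML-escaped ampersands back into plain ones.
unescape_amp() {
  printf '%s\n' "$1" | sed 's/&amp;/\&/g'
}

copied='https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q=*&amp;pretty'
url="$(unescape_amp "$copied")"
echo "$url"

# Quote the URL so the shell does not treat '&' as "run in background"
# or expand '*' as a glob:
# curl -u "<username>" "$url"
```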
@tloubrieu-jpl I'm getting back into this, and the configuration confuses me again. From pdscloud-prod1, harvest's config file points to
registry url="https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443" index="registry" auth="/home/pds4/test/auth.txt"
When I re-harvest something I harvested a month ago, I correctly get many 'Skipping registered product...' warnings, e.g.
[WARN] Skipping registered product urn:nasa:pds:orex.radioscience::1.0
Now, how do I manipulate the registry? And where? I tried
% curl --get 'https://pds.nasa.gov/api/search/1.0/products/urn:nasa:pds:orex.radioscience::1.0' --header 'Accept: application/json'
{"request":"/products/urn:nasa:pds:orex.radioscience::1.0","message":"The lidvid urn:nasa:pds:orex.radioscience was not found"}
So I replaced https://pds.nasa.gov with the URL in the harvest config file
% curl --get 'https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443/api/search/1.0/products/urn:nasa:pds:orex.radioscience::1.0' --header 'Accept: application/json'
{"error":"no handler found for uri [/api/search/1.0/products/urn:nasa:pds:orex.radioscience::1.0] and method [GET]"}
I'm going to want to view and delete entries. Should I be using these curl commands? Should I also be able to do so using registry-manager commands? If so, how do I point registry-manager to the OpenSearch that harvest uses? Thanks
redirect above inquiry to @jimmie
You were correct in sending the api/search/1.0 request to pds.nasa.gov. OpenSearch itself (search-en-prod*) doesn't understand that format. I imagine the LIDVID is not found because the archive_status has not been promoted to 'archived', which you need to do using registry-manager.
Registry-manager should use the same endpoint that you used in Harvest; it is specified on the command line using the -es switch. Run registry-manager -help to see the available command line options.
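To keep the two request styles apart, here is a rough Python sketch: the `/api/search/1.0/products/...` path goes to pds.nasa.gov, while OpenSearch only understands its own `_search` endpoint. Both hosts are the ones quoted in this thread. The `lidvid` field name used in the OpenSearch query body is an assumption about the index mapping, not something confirmed here, and the helper names are hypothetical.

```python
from urllib.parse import quote

API_BASE = "https://pds.nasa.gov/api/search/1.0"  # quoted in this thread
OPENSEARCH_BASE = (
    "https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com"
)


def split_lidvid(lidvid):
    """Split 'LID::VID' into (lid, vid); vid is None when no version is given."""
    lid, sep, vid = lidvid.partition("::")
    return lid, (vid if sep else None)


def api_product_url(lidvid):
    """URL for the API on pds.nasa.gov (colons are left unescaped)."""
    return f"{API_BASE}/products/{quote(lidvid, safe=':')}"


def opensearch_lookup(lidvid, index="registry", field="lidvid"):
    """URL and JSON body for a direct OpenSearch lookup.

    'field' is an assumed field name; adjust it to the real mapping.
    """
    url = f"{OPENSEARCH_BASE}/{index}/_search"
    body = {"query": {"term": {field: lidvid}}}
    return url, body


if __name__ == "__main__":
    print(split_lidvid("urn:nasa:pds:orex.radioscience::1.0"))
    print(api_product_url("urn:nasa:pds:orex.radioscience::1.0"))
    print(opensearch_lookup("urn:nasa:pds:orex.radioscience::1.0"))
```

The body returned by `opensearch_lookup` would be sent as JSON with the same credentials used for the `_count` call above.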
@jimmie Is https://pds.nasa.gov/api/search/ accessing other registries as well as the one I'm mucking with? I ingested LIDs urn:nasa:pds:asteroid_polarimetric_database*, and before promoting them to 'archived', they already showed up in the URL above. And if I delete all entries from my registry, those LIDs still show, even in a private browser.
@rchenatjpl - yes, that endpoint accesses all of the registries via OpenSearch cross-cluster search (CCS). Unfortunately, with this architecture there is no way to isolate EN, since EN's OpenSearch is the one with CCS enabled.
No problem, thanks, I'm just trying to account for what I'm seeing. @jordanpadams I'm going to create 1 more fake bundle with slightly weird stuff, then I'll call my stuff done unless you want more. ETA tonight. I would use real bundles, but the ones that are workable are already registered.
@rchenatjpl that sounds great! once you get that test data, can you hand it over to @c-suh so she can upload it here: https://pds.nasa.gov/data/pds4/test-data/custom-datasets/
@c-suh Sorry to stick you with this. pdscloud-prod1://home/pds4/contextSubset.tgz has 2 similar toy bundles, both with PDS4 context products, a bundle, collections, and documents, as requested at the top of this issue. The tester should harvest the first bundle, check that OpenSearch does not contain the LIDs that appear only in the second bundle, then harvest the second bundle and confirm that those LIDs now appear.
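The two-step check described above boils down to a set difference. A minimal Python sketch, assuming the tester already has the LIDs of each bundle and the LIDs currently in the registry as plain lists; the LID values and function names below are made up for illustration and are not the contents of contextSubset.tgz.

```python
def lids_unique_to_second(first_lids, second_lids):
    """LIDs that appear in the second bundle but not in the first."""
    return set(second_lids) - set(first_lids)


def check_step(registered, expected_absent=(), expected_present=()):
    """Compare registry contents against expectations for one test step."""
    registered = set(registered)
    return {
        # LIDs that are registered but should not be yet.
        "unexpected": sorted(registered & set(expected_absent)),
        # LIDs that should be registered but are not.
        "missing": sorted(set(expected_present) - registered),
    }


if __name__ == "__main__":
    # Hypothetical LIDs, for illustration only.
    first = ["urn:nasa:pds:toy_a", "urn:nasa:pds:toy_a:document"]
    second = first + ["urn:nasa:pds:toy_a:extra"]
    new_lids = lids_unique_to_second(first, second)

    # Step 1: after harvesting the first bundle, the new LIDs must be absent.
    print(check_step(first, expected_absent=new_lids))
    # Step 2: after harvesting the second bundle, they must be present.
    print(check_step(second, expected_present=new_lids))
```

Both steps pass when the returned `unexpected` and `missing` lists are empty.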
I just noticed the PDS3 request. I don't know the innards of how a PDS3 catalog file gets into the current database. I run 'catalog -mingest' to do that. Those calls operationally are buried within ~pds4/catalog/*/update-procedure.txt, which are scripts despite the file extension.
I did successfully ingest /data/pds4/context-pds3, the context products equivalent to the PDS3 catalog files, but that may not be what you want.
@rchenatjpl, @jordanpadams, and @viviant100 sorry in turn, but it's unclear to me what I'm supposed to do.
@rchenatjpl @c-suh sorry for the confusion here. I lost track of this ticket and it seems like we are headed down a rabbit hole we really didn't want to go down.
the premise of this ticket and its parent was to deploy all the tools and actually ingest all of the EN-managed data into the registry, not to test the registry beyond a very simple initial smoke test of "can I communicate with and ingest something". From the comments above, it looks like the smoke test is completed, so the final steps should be for us to get all this data ingested and update our procedures in the future to include this ingestion.
@rchenatjpl can we please:
Ignore this now, as Jordan and I were typing simultaneously
@c-suh oh my god, I named it something else and then renamed it to contextSubset.tgz. Sorry about that. I don't know what's planned for that. Is there value in doing it again now?
@rchenatjpl sorry. just realized you said this:
successfully ingest /data/pds4/context-pds3
great! that is good enough for now. we definitely need to figure out where all that catalog ingestion data goes. especially for updates to the data sets / PDS3 context products. i will create another ticket to investigate there.
also, per @jimmie's comment above for searching the registry for EN data, we should be able to query via our node_id, but I have had some trouble figuring out how to search that field (or any field names really) via a URL endpoint. will get back to you.
@jimmie any ideas on how we could query the registry by node id?
node is given by the term ops:Harvest_Info/ops:node_name. I thought the API provides the ability to add a 'query' as a URI query parameter that could include a specific value for node_name
@jimmie copy. the PDS API definitely supports this, I just haven't quite been able to figure out the appropriate syntax/escaping in order to make the query work via curl. I will keep poking.
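On the escaping problem: one workaround, sketched here in Python under stated assumptions, is to query OpenSearch directly with a URI search and let the code do the escaping. This is not the PDS API `query` syntax, which remains unresolved above. Field names containing `:` and `/` need backslash escaping in the query-string syntax, and the whole expression then needs URL encoding. The field name comes from @jimmie's comment; the node value `PDS_ENG` is a placeholder, and the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Host as quoted earlier in this thread.
BASE = "https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com"

# Characters with special meaning in the query-string syntax.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_term(text):
    """Backslash-escape reserved query-string characters."""
    return _RESERVED.sub(r"\\\1", text)


def node_search_url(node_name,
                    field="ops:Harvest_Info/ops:node_name",
                    index="registry"):
    """Build a URI-search URL that matches documents for one node."""
    q = f"{escape_query_term(field)}:{escape_query_term(node_name)}"
    return f"{BASE}/{index}/_search?{urlencode({'q': q, 'pretty': 'true'})}"


if __name__ == "__main__":
    # 'PDS_ENG' is a placeholder value, not a confirmed node name.
    print(node_search_url("PDS_ENG"))
```

When passing the printed URL to curl, wrapping it in single quotes keeps the shell from interpreting `&` and `%`.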
@jordanpadams @viviant100 Regarding "PDS3 Context Products" in the Description at the very top of this ticket, do you want the PDS3 context products ingested? Currently, they're hidden outside EN presumably because we don't want anyone to reference any of them. These are the ones that begin urn:nasa:pds:context_pds3:..., and they're sitting in /data/pds4/1700/PDS3_context_bundle_20161220/. Or do you have something else in mind for "PDS3 Context Products"? I'll assume I shouldn't ingest them, but if you want me to, I'll do it, though I'll have to create a bundle.xml and collection*.xml
💡 Description
List of products to ingest: