jeanetteclark opened 1 month ago
Update:
The `rsync` + `parallel` process to copy the contents of `/var/metacat/hashstore` to `/mnt/tdg-repos/dev/metacat/hashstore` has been completed. Metacat itself still points to the original `/var/metacat/hashstore` folder.
Next Steps:
To Do List:
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Deploy `metadig-engine` to the test cluster
- Add a dataset to `dev.nceas` (via metacatUI or any other client)

For reference:
# How to produce a text file with just the first level of hashstore folders to rsync
mok@dev:~/testing$ sudo find /var/metacat/hashstore -mindepth 1 -maxdepth 1 > mc_hs_dir_list.txt
mok@dev:~/testing$ cat mc_hs_dir_list.txt
/var/metacat/hashstore/objects
/var/metacat/hashstore/metadata
/var/metacat/hashstore/refs
/var/metacat/hashstore/hashstore.yaml
# How to use rsync with a list of folders
mok@dev:~/testing$ cat mc_hs_dir_list.txt | parallel --eta sudo rsync -aHAX {} /mnt/tdg-repos/dev/metacat/hashstore/
# First get the list of files found under `/hashstore`
mok@dev:~/testing$ sudo find /var/metacat/hashstore -type f -printf '%P\n' > mc_obj_list.txt
# How to feed a single command at a time for a file to rsync
# The /./ between `metacat` and `hashstore` instructs rsync to copy folders from hashstore (omitting the preceding directories) into the desired folder
mok@dev:~/testing$ parallel --eta sudo rsync -aHAXR /var/metacat/./hashstore/{} /mnt/tdg-repos/dev/metacat :::: mc_obj_list.txt
(With `-j 30`, `parallel` was limited to 30 concurrent jobs.)
Metacat on `dev.nceas.ucsb.edu` has been moved over to write to the ceph fs mount point - a symlink has been created between `/var/metacat/hashstore` and `/mnt/tdg-repos/dev/metacat/hashstore`.
We also ran into a read-only file system issue that was caused by how `tomcat` sets up its access control rules (the actual write path above needed to be added to its configuration settings). `rsync` was then re-run; syncing with a list of the direct subfolders under `/var/metacat/hashstore` was the fastest approach. I tested feeding rsync individual commands (e.g. via `:::: list_of_files.txt`), but this was very slow. The re-sync process took approximately 5 minutes.
Current Status:
It appears the 'Assessment Reports' (Metadig) for datasets at `dev.nceas.ucsb.edu` are not working as expected:
There was an error generating the assessment report.
The Assessment Server reported this error:
Unable to run quality suite for pid urn:uuid:313d899d-dc77-435d-9638-abd09faf7143, suite FAIR-suite-0.4.0
Failed : HTTP error code : 403
Next Steps:
1) Restoring expected Metadig functionality @ dev.nceas.ucsb.edu
`metadig-controller`, `metadig-scorer` and `metadig-scheduler` are all on image `v3.0.2` - except for `metadig-worker`, which is using the `feature-hashstore-support` image. Before attempting to deploy the `feature-hashstore-support` image to the scorer, scheduler and controller per Jeanette's instructions, I will restore `metadig-worker` to image `v3.0.2` to try and resolve the issue on the test site.
2) Obtaining the last missing feature-hashstore-support image for metadig-controller
`metadig-controller` also does not have a `feature-hashstore-support` image. Producing one will require running `mvn publish` while on the correct branch of `metadig-engine`. I likely do not have the appropriate permissions and will seek assistance from Jing to move forward here.
3) Deploying feature-hashstore-support for Metadig in full on the dev cluster
Deploy `feature-hashstore-support` after updating the `image.tag` in the respective `values.yaml` files (four total, one for each Metadig-engine piece):
helm upgrade metadig-scheduler ./metadig-scheduler --namespace metadig --set image.pullPolicy=Always --recreate-pods=true --set k8s.cluster=dev
helm upgrade metadig-scorer ./metadig-scorer --namespace metadig --set image.pullPolicy=Always --recreate-pods=true --set k8s.cluster=dev
helm upgrade metadig-worker ./metadig-worker --namespace metadig --set image.pullPolicy=Always --set replicaCount=1 --recreate-pods=true --set k8s.cluster=dev
helm upgrade metadig-controller ./metadig-controller --namespace metadig --set image.pullPolicy=Always --recreate-pods=true --set k8s.cluster=dev
To Do List & Follow-up Questions
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Restore expected Metadig functionality at `dev.nceas.ucsb.edu`
- Deploy `metadig-engine` to the test cluster
- Add a dataset to `dev.nceas` (via metacatUI or any other client)
Update:
- Reverted `metadig-controller` back to the `v.3.0.2` image
- Deployed `metadig-scheduler`, `metadig-scorer` and `metadig-worker` with the `feature-hashstore-support` images - however, the Assessment Reports did not work.
- After reviewing the `feature-hashstore-support` changes, and the logs from the `metadig-controller` & `metadig-worker`, it looks like the engine is unable to communicate with `solr` to get the list of data pids. Currently debugging.
20241022-16:43:28: [ERROR]: Unable to run quality suite. [edu.ucsb.nceas.mdqengine.Worker:224]
edu.ucsb.nceas.mdqengine.exception.MetadigException: Unable to run quality suite for pid urn:uuid:761e9125-f775-4bf8-9a80-8cc970a52353, suite FAIR-suite-0.4.0Failed : HTTP error code : 403
at edu.ucsb.nceas.mdqengine.Worker.processReport(Worker.java:568)
at edu.ucsb.nceas.mdqengine.Worker$1.handleDelivery(Worker.java:212)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:111)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Failed : HTTP error code : 403
at edu.ucsb.nceas.mdqengine.MDQEngine.findDataPids(MDQEngine.java:271)
at edu.ucsb.nceas.mdqengine.MDQEngine.runSuite(MDQEngine.java:120)
at edu.ucsb.nceas.mdqengine.Worker.processReport(Worker.java:564)
... 6 more
The `feature-hashstore-support` code changes do not involve the `metadig-controller`, so I've paused here for now.
To Do List & Follow-up Questions
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Restore expected Metadig functionality at `dev.nceas.ucsb.edu`
- Deploy the `feature-hashstore-support` image to the `metadig-worker`, `metadig-scheduler` and `metadig-scorer` pods in the dev test cluster
- Add a dataset to `dev.nceas` (via metacatUI or any other client)
Update:
The Metadig Assessment Reports still cannot be generated.
The URL that appears to be causing the issue should be, and is, publicly accessible:
https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/q=isDocumentedBy:%22urn:uuid:c559c233-8bf9-42b4-98df-8558f4a4776a%22
Dataset: https://dev.nceas.ucsb.edu/view/urn%3Auuid%3Ac559c233-8bf9-42b4-98df-8558f4a4776a
The relevant excerpt from `MDQEngine.findDataPids` (the 403 is thrown at line 271):
try {
String nodeEndpoint = D1Client.getMN(nodeId).getNodeBaseServiceUrl();
String encodedId = URLEncoder.encode(identifier, "UTF-8");
String queryUrl = nodeEndpoint + "/query/solr/?q=isDocumentedBy:" + "\"" + encodedId + "\"" + "&fl=id";
URL url = new URL(queryUrl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");
if (dataOneAuthToken != null) {
connection.setRequestProperty("Authorization", "Bearer " + dataOneAuthToken);
}
if (connection.getResponseCode() != 200) {
// Line 271
throw new RuntimeException("Failed : HTTP error code : " + connection.getResponseCode());
}
...
Adjusting the `metadig-worker` deployment's environment variable to make use of the `dataone-secret` does not appear to have any effect (see below for quick reference).
In the `findDataPids` method, it appears that we want to include a token when making the request to get the data objects (in case we are searching for private datasets?). If it cannot find the environment variable, it will default to the config - which states that the token is not set in the config.
# /metadig-worker/templates/deployment.yaml
...
env:
  - name: DATAONE_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: dataone-token
        key: DataONEauthToken
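For context, a minimal sketch of the token lookup order described above (hypothetical variable and config-key names - the actual logic lives in metadig-engine):

```java
// Prefer the DATAONE_AUTH_TOKEN environment variable; fall back to the
// metadig config when it is absent (matches the "Got token from env." debug log).
String dataOneAuthToken = System.getenv("DATAONE_AUTH_TOKEN");
if (dataOneAuthToken == null || dataOneAuthToken.isEmpty()) {
    dataOneAuthToken = config.getString("DataONE.authToken"); // assumed config key
    log.debug("Got token from config.");
} else {
    log.debug("Got token from env.");
}
```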
Update:
Even after fixing the connection URL (below), I am still experiencing an HTTP 403 Forbidden error.
String encodedId = URLEncoder.encode(identifier, "UTF-8");
// This is necessary for metacat's solr to process the requested queryUrl
String encodedQuotes = URLEncoder.encode("\"", "UTF-8");
String queryUrl = nodeEndpoint + "/query/solr/?q=isDocumentedBy:" + encodedQuotes + encodedId + encodedQuotes + "&fl=id";
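As a sanity check on the encoding fix, this illustrative standalone snippet reproduces the exact query URL that shows up in the worker logs further below:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Encode the pid and the surrounding quotes the same way the fixed code does.
        String encodedId = URLEncoder.encode("urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d", "UTF-8");
        String encodedQuotes = URLEncoder.encode("\"", "UTF-8");
        System.out.println("https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=isDocumentedBy:"
                + encodedQuotes + encodedId + encodedQuotes + "&fl=id");
        // -> ...?q=isDocumentedBy:%22urn%3Auuid%3Aae970e0a-3a26-4af7-8a84-235c9a8e3a5d%22&fl=id
    }
}
```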
The endpoint shown in the logging message is accessible both via the browser and from within the `metadig-worker` pod itself. Metacat's `solr` index does not have specific access control rules, so this GET request from the `metadig-worker` should be able to be processed.
doumok@Dou-NCEAS-MBP14.local:~/Code/testing/metadig $ kubectl exec -it metadig-worker-75c5689d69-4tt4v /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# curl "https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=isDocumentedBy:%22urn%3Auuid%3Aae970e0a-3a26-4af7-8a84-235c9a8e3a5d%22&fl=id"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">19</int>
<lst name="params">
<str name="q">isDocumentedBy:"urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d"</str>
<str name="fl">id</str>
<str name="fq">(readPermission:"public")OR(writePermission:"public")OR(changePermission:"public")OR(isPublic:true)</str>
<str name="wt">javabin</str>
<str name="version">2</str>
</lst>
</lst>
<result name="response" numFound="5" start="0" numFoundExact="true">
<doc>
<str name="id">urn:uuid:9ebcadac-b015-48fb-a2c5-1ff7db692f19</str></doc>
<doc>
<str name="id">urn:uuid:75db2307-4b78-4a8b-bc59-5b2ce318519f</str></doc>
<doc>
<str name="id">urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d</str></doc>
<doc>
<str name="id">urn:uuid:52106ea7-f24b-4247-a697-272023fb158e</str></doc>
<doc>
<str name="id">urn:uuid:b3dd42d8-7489-4d95-bcba-81940bdefbe2</str></doc>
</result>
</response>
The `DATAONE_AUTH_TOKEN` does not seem to make any difference (I confirmed that it's been set in the environment both in the logs and with the command `kubectl exec -t metadig-worker-75c5689d69-4tt4v -- env`).
# Error log
20241025-21:43:14: [DEBUG]: Running suite: FAIR-suite-0.4.0 [edu.ucsb.nceas.mdqengine.MDQEngine:97]
20241025-21:43:14: [DEBUG]: Got token from env. [edu.ucsb.nceas.mdqengine.MDQEngine:241]
20241025-21:43:16: [DEBUG]: queryURL: https://dev.nceas.ucsb.edu/knb/d1/mn/v2/query/solr/?q=isDocumentedBy:%22urn%3Auuid%3Aae970e0a-3a26-4af7-8a84-235c9a8e3a5d%22&fl=id [edu.ucsb.nceas.mdqengine.MDQEngine:264]
20241025-21:43:16: [ERROR]: Unable to run quality suite. [edu.ucsb.nceas.mdqengine.Worker:224]
edu.ucsb.nceas.mdqengine.exception.MetadigException: Unable to run quality suite for pid urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d, suite FAIR-suite-0.4.0Failed : HTTP error code : 403
at edu.ucsb.nceas.mdqengine.Worker.processReport(Worker.java:568)
at edu.ucsb.nceas.mdqengine.Worker$1.handleDelivery(Worker.java:212)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:111)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Failed : HTTP error code : 403
at edu.ucsb.nceas.mdqengine.MDQEngine.findDataPids(MDQEngine.java:275)
at edu.ucsb.nceas.mdqengine.MDQEngine.runSuite(MDQEngine.java:120)
at edu.ucsb.nceas.mdqengine.Worker.processReport(Worker.java:564)
... 6 more
20241025-21:43:16: [DEBUG]: Saving quality run status after error [edu.ucsb.nceas.mdqengine.Worker:240]
20241025-21:43:16: [DEBUG]: Saving to persistent storage: metadata PID: urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d, suite id: FAIR-suite-0.4.0 [edu.ucsb.nceas.mdqengine.model.Run:272]
20241025-21:43:16: [DEBUG]: Done saving to persistent storage: metadata PID: urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d, suite id: FAIR-suite-0.4.0 [edu.ucsb.nceas.mdqengine.model.Run:277]
20241025-21:43:16: [DEBUG]: Saved quality run status after error [edu.ucsb.nceas.mdqengine.Worker:249]
20241025-21:43:16: [DEBUG]: Sending report info back to controller... [edu.ucsb.nceas.mdqengine.Worker:390]
20241025-21:43:16: [INFO]: Elapsed time processing (seconds): 0 for metadataPid: urn:uuid:ae970e0a-3a26-4af7-8a84-235c9a8e3a5d, suiteId: FAIR-suite-0.4.0
[edu.ucsb.nceas.mdqengine.Worker:422]
I have a feeling that this is related to how k8s allows external REST API calls to be made (or not). The specific Java code making the GET request appears to be fine (since it can communicate and receive a 403 error). Investigation continues.
> I have a feeling that this is related to how k8s allows external REST API calls to be made (or not).
k8s does not restrict pods from originating web connections to external hosts in any way unless it is configured to do so. MetaDIG is not configured to restrict anything afaik. You and I should touch base on this because I think you are following a red herring and the problem originates elsewhere. Your curl command from the pod shows that the connection is not blocked. So it's something else about how you deployed. Let's chat.
@mbjones I think so too - I can't find anything related to that. I just pushed a commit to test whether the request is getting rejected because it's missing a User-Agent
property. I'll send you a PM via Slack and/or send you a calendar invite.
Deployment code for quick reference (taken from hand-off notes):
helm upgrade metadig-worker ./metadig-worker --namespace metadig --set image.pullPolicy=Always --set replicaCount=1 --recreate-pods=true --set k8s.cluster=dev
With the following changes in the respective `metadig-worker` deployment files:
values.yaml:
image:
  repository: ghcr.io/nceas/metadig-worker
  pullPolicy: Always
  tag: "feature-hashstore-support"
deployment.yaml, under env:
- name: DATAONE_AUTH_TOKEN
  valueFrom:
    secretKeyRef:
      name: dataone-token
      key: DataONEauthToken
@mbjones The Assessment Report generated after adding the `User-Agent` property to the Java code!
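For reference, a minimal sketch of the change (assuming the same `HttpURLConnection` setup as the `findDataPids` excerpt above; the header value shown is the one that ultimately worked, per the check-in further below):

```java
// Set a User-Agent before issuing the GET; without one, metacat's solr
// endpoint rejects the request with HTTP 403.
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");
connection.setRequestProperty("User-Agent", "Mozilla/MetadigEngine (feature-hashstore-support)");
if (dataOneAuthToken != null) {
    connection.setRequestProperty("Authorization", "Bearer " + dataOneAuthToken);
}
```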
To Do List & Follow-up Questions
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Restore expected Metadig functionality at `dev.nceas.ucsb.edu`
- Deploy the `feature-hashstore-support` image to the `metadig-worker`, `metadig-scheduler` and `metadig-scorer` pods in the dev test cluster
- Add a `User-Agent` as part of the get request (can I simply use something like `java/17.0.1-temurin` as the value - will it be accepted by `solr`?)
- Add a dataset to `dev.nceas` (via metacatUI or any other client)
Update:
After adding the `User-Agent`, it appears that the RabbitMQ queue is no longer being populated when new datasets are added to `dev.nceas.ucsb.edu`. I set `metadig-postgres`'s `last_harvest_datetime` for the `urn:node:mnTestKNB` nodes to a date in the past (ex. `2024-10-24T00:00:00.000Z`) per the operations manual, which produced an uptick in RabbitMQ, but this only affects datasets up to a specific date (not the new ones I added). Nothing for the new datasets appears in the `metadig-postgres` runs table. Currently investigating where the breakdown in communication is occurring.

@doulikecookiedough regarding your question on how to directly communicate with metadig, that would be via the API. Most operations require authentication, but you can, for example, access completed run reports with a request like:
https://api.test.dataone.org/quality/runs/FAIR-suite-0.4.0/urn:uuid:0b44a2d5-dcd5-4798-8072-4030b14e8936
This one doesn't work, as it appears the `FAIR-suite-0.4.0` suite was not run for the PID listed. You can get an overview of the whole API at https://api.test.dataone.org/quality/ -- but note that only a portion of the planned methods were implemented - others are still TBD, and some were disabled for security reasons. A useful one is getting the list of current suites, which is at https://api.test.dataone.org/quality/suites/.
If the API doesn't provide what you need, you can query the database itself via `psql`.
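A quick illustration of hitting that runs endpoint from Java (a sketch using the JDK 11+ HTTP client; the `Accept` header is an assumption):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QualityRunFetch {
    public static void main(String[] args) throws Exception {
        // Fetch a completed run report from the metadig quality API.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.test.dataone.org/quality/runs/"
                        + "FAIR-suite-0.4.0/urn:uuid:0b44a2d5-dcd5-4798-8072-4030b14e8936"))
                .header("Accept", "application/json")
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```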
Thank you for the clarification/direction @mbjones. Currently it looks like there's an issue with the scheduler - after restarting the pods (making sure the chart and app versions were both updated), some `NullPointerException`s are being thrown. This may explain why the `FAIR-suite-0.4.0` check isn't being run for the new PIDs that are being added in the `urn:node:mnTestKNB` nodes.
20241028-18:02:10: [ERROR]: quality-test-dataone-fair: error creating rest client: Cannot assign field "after" because "link.before" is null [edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob:190]
20241028-18:02:10: [INFO]: Job metadig.quality-test-dataone-fair threw a JobExecutionException: [org.quartz.core.JobRunShell:218]
org.quartz.JobExecutionException: java.lang.NullPointerException: Cannot assign field "after" because "link.before" is null [See nested exception: java.lang.NullPointerException: Cannot assign field "after" because "link.before" is null]
at edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob.execute(RequestReportJob.java:191)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.lang.NullPointerException: Cannot assign field "after" because "link.before" is null
at org.apache.commons.collections.map.AbstractLinkedMap.removeEntry(AbstractLinkedMap.java:293)
at org.apache.commons.collections.map.AbstractHashedMap.removeMapping(AbstractHashedMap.java:543)
at org.apache.commons.collections.map.AbstractHashedMap.remove(AbstractHashedMap.java:325)
at org.apache.commons.configuration.BaseConfiguration.clearPropertyDirect(BaseConfiguration.java:133)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.CompositeConfiguration.clearPropertyDirect(CompositeConfiguration.java:269)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.AbstractConfiguration.setProperty(AbstractConfiguration.java:483)
at org.dataone.client.rest.HttpMultipartRestClient.setDefaultTimeout(HttpMultipartRestClient.java:588)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:222)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:199)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:184)
at edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob.execute(RequestReportJob.java:188)
... 2 more
20241028-18:30:00: [ERROR]: Job metadig.downloads threw an unhandled Exception: [org.quartz.core.JobRunShell:222]
java.lang.NullPointerException
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileReader.<init>(Unknown Source)
at edu.ucsb.nceas.mdqengine.scheduler.AcquireWebResourcesJob.execute(AcquireWebResourcesJob.java:97)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
20241028-18:30:00: [ERROR]: Job (metadig.downloads threw an exception. [org.quartz.core.ErrorLogger:2360]
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.NullPointerException]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.lang.NullPointerException
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileReader.<init>(Unknown Source)
at edu.ucsb.nceas.mdqengine.scheduler.AcquireWebResourcesJob.execute(AcquireWebResourcesJob.java:97)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more
Check-in:
The newest datasets did not have their Assessment Reports generated because `cn-stage` was not harvesting from the `urn:node:mnTestKNB` node. So when I set back the `last_harvest_datetime` in `metadig-postgres`, it was unable to catch the latest datasets.
https://cn-stage.test.dataone.org/cn/v2/node
/etc/init.d/d1-index-task-processor start
/etc/init.d/d1-index-task-generator start
/etc/init.d/d1-processing
AcquireWebResourcesJob Exception
The `downloadsList` configured in `metadig.properties` maps https://cn.dataone.org/cn/v2/formats ~> `/opt/local/metadig/data/all-dataone-formats.xml`, but the file under `/opt/local/metadig/data/` is not saving even though the download appears to be part of the process. (A sketch of the likely missing null check follows the log below.)
20241027-23:30:00: [ERROR]: Job metadig.downloads threw an unhandled Exception: [org.quartz.core.JobRunShell:222]
java.lang.NullPointerException
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileReader.<init>(Unknown Source)
at edu.ucsb.nceas.mdqengine.scheduler.AcquireWebResourcesJob.execute(AcquireWebResourcesJob.java:97)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
20241027-23:30:00: [ERROR]: Job (metadig.downloads threw an exception. [org.quartz.core.ErrorLogger:2360]
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.NullPointerException]
at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.lang.NullPointerException
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at java.base/java.io.FileReader.<init>(Unknown Source)
at edu.ucsb.nceas.mdqengine.scheduler.AcquireWebResourcesJob.execute(AcquireWebResourcesJob.java:97)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 more
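The stack trace points at `FileReader` being handed a null path in `AcquireWebResourcesJob.execute` (line 97). A minimal sketch of the kind of guard that appears to be missing (the property key and surrounding logic are assumptions, not the actual implementation):

```java
// The NPE originates in new FileReader(path) when the configured path is null,
// so validate the property before opening the file.
String downloadsList = config.getString("downloadsList"); // assumed property key
if (downloadsList == null || downloadsList.isEmpty()) {
    log.error("downloadsList is not set in metadig.properties; skipping AcquireWebResourcesJob.");
    return;
}
try (FileReader reader = new FileReader(downloadsList)) {
    // ... read the list of URL ~> local-file mappings and download each one
} catch (IOException e) {
    log.error("Unable to read downloadsList file: " + e.getMessage());
}
```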
RequestScorerJob Class (...and RequestReportJob)
`RequestReportJob` is no longer experiencing the `NullPointerException` relating to the "before"/"after" link fields - but now `RequestScorerJob` is. This bug likely affects both classes and needs to be investigated.
20241029-21:16:10: [ERROR]: Error creating rest client: Cannot assign field "before" because "link.after" is null [edu.ucsb.nceas.mdqengine.DataONE:74]
20241029-21:16:10: [ERROR]: portal-test-arctic-FAIR: unable to create connection to service URL https://test.arcticdata.io/metacat/d1/mn [edu.ucsb.nceas.mdqengine.scheduler.RequestScorerJob:187]
edu.ucsb.nceas.mdqengine.exception.MetadigProcessException: Unable to get collection pids
at edu.ucsb.nceas.mdqengine.DataONE.getMultipartD1Node(DataONE.java:75)
at edu.ucsb.nceas.mdqengine.scheduler.RequestScorerJob.execute(RequestScorerJob.java:185)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: java.lang.NullPointerException: Cannot assign field "before" because "link.after" is null
at org.apache.commons.collections.map.AbstractLinkedMap.removeEntry(AbstractLinkedMap.java:294)
at org.apache.commons.collections.map.AbstractHashedMap.removeMapping(AbstractHashedMap.java:543)
at org.apache.commons.collections.map.AbstractHashedMap.remove(AbstractHashedMap.java:325)
at org.apache.commons.configuration.BaseConfiguration.clearPropertyDirect(BaseConfiguration.java:133)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.CompositeConfiguration.clearPropertyDirect(CompositeConfiguration.java:269)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.AbstractConfiguration.setProperty(AbstractConfiguration.java:483)
at org.dataone.client.rest.HttpMultipartRestClient.setDefaultTimeout(HttpMultipartRestClient.java:588)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:222)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:199)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:184)
at edu.ucsb.nceas.mdqengine.DataONE.getMultipartD1Node(DataONE.java:72)
... 3 more
20241029-21:16:10: [INFO]: Job metadig.portal-test-arctic-FAIR threw a JobExecutionException: [org.quartz.core.JobRunShell:218]
org.quartz.JobExecutionException: portal-test-arctic-FAIR: unable to create connection to service URL https://test.arcticdata.io/metacat/d1/mn [See nested exception: edu.ucsb.nceas.mdqengine.exception.MetadigProcessException: Unable to get collection pids]
at edu.ucsb.nceas.mdqengine.scheduler.RequestScorerJob.execute(RequestScorerJob.java:188)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Caused by: edu.ucsb.nceas.mdqengine.exception.MetadigProcessException: Unable to get collection pids
at edu.ucsb.nceas.mdqengine.DataONE.getMultipartD1Node(DataONE.java:75)
at edu.ucsb.nceas.mdqengine.scheduler.RequestScorerJob.execute(RequestScorerJob.java:185)
... 2 more
Caused by: java.lang.NullPointerException: Cannot assign field "before" because "link.after" is null
at org.apache.commons.collections.map.AbstractLinkedMap.removeEntry(AbstractLinkedMap.java:294)
at org.apache.commons.collections.map.AbstractHashedMap.removeMapping(AbstractHashedMap.java:543)
at org.apache.commons.collections.map.AbstractHashedMap.remove(AbstractHashedMap.java:325)
at org.apache.commons.configuration.BaseConfiguration.clearPropertyDirect(BaseConfiguration.java:133)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.CompositeConfiguration.clearPropertyDirect(CompositeConfiguration.java:269)
at org.apache.commons.configuration.AbstractConfiguration.clearProperty(AbstractConfiguration.java:503)
at org.apache.commons.configuration.AbstractConfiguration.setProperty(AbstractConfiguration.java:483)
at org.dataone.client.rest.HttpMultipartRestClient.setDefaultTimeout(HttpMultipartRestClient.java:588)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:222)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:199)
at org.dataone.client.rest.HttpMultipartRestClient.<init>(HttpMultipartRestClient.java:184)
at edu.ucsb.nceas.mdqengine.DataONE.getMultipartD1Node(DataONE.java:72)
To Do
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Restore expected Metadig functionality at `dev.nceas.ucsb.edu`
- Deploy the `feature-hashstore-support` image to the `metadig-worker`, `metadig-scheduler` and `metadig-scorer` pods in the dev test cluster
- Add a `User-Agent` as part of the get request to retrieve data objects from metacat-solr - `java/17.0.1-temurin` does not appear to be acceptable
- Investigate `/opt/local/metadig/data/all-dataone-formats.xml` (and why it is not currently saving as expected)
- Fix the bug in `AcquireWebResources` where we do not check for a null value after retrieving a path, which causes a `NullPointerException`
- Investigate the bug in `metacat-scheduler` where `Caused by: java.lang.NullPointerException: Cannot assign field "after" because "link.before" is null`
  - `RequestScorerJob`, relating to the configuration of URLs to collect pids
  - `RequestReportJob`, which now appears ok but observed the issue previously
- Add a dataset to `dev.nceas` (via metacatUI or any other client)
Check in:
- `User-Agent` value: a value of just `Chrome` or `Mozilla` is rejected by `solr`; the worker now sends `Mozilla/MetadigEngine (feature-hashstore-support)`.
- `AcquireWebResourcesJob`: the file under `/opt/local/metadig/data` is now present.
- `RequestReportJob` & `RequestScorerJob`: the exceptions pointed at `test.arcticdata.io` - however, after submitting datasets through test.adc's respective GUI, the exception could not be reproduced.

`metacatui` was adjusted at `test.arcticdata.io` to allow submitters to set datasets to private/public (this was turned off before: all datasets were private by default and had to be approved by an admin). My gut feeling is that this setting prevented `metadig-scheduler` from loading up its respective quality suites to run (`quality-test-dataone-fair`, `portal-test-arctic-FAIR`), leading to this exception every time a dataset was submitted to this node. Submitting new datasets does not trigger any exceptions after the `metacatui` setting change:
showDatasetPublicToggle: true
showDatasetPublicToggleForSubjects: []
`HttpMultipartRestClient` process (where the exception above is thrown):
try {
    mrc = new HttpMultipartRestClient();
} catch (Exception ex) {
    log.error("Error creating rest client: " + ex.getMessage());
    metadigException = new MetadigProcessException("Unable to get collection pids");
    metadigException.initCause(ex);
    throw metadigException;
}
`metadig` is having trouble accessing private datasets despite having the `DATAONE_AUTH_TOKEN` present. Should quality checks be run on private datasets, or only once they are made public?
To Do
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `/mnt/tdg-repos/dev/metacat/hashstore`
- Restore expected Metadig functionality at `dev.nceas.ucsb.edu`
- Deploy the `feature-hashstore-support` image to the `metadig-worker`, `metadig-scheduler` and `metadig-scorer` pods in the dev test cluster
- Add a `User-Agent` as part of the get request to retrieve data objects from metacat-solr - `java/17.0.1-temurin` does not appear to be acceptable
- Investigate `/opt/local/metadig/data/all-dataone-formats.xml` (and why it is not currently saving as expected)
- Fix the bug in `AcquireWebResources` where we do not check for a null value after retrieving a path, which causes a `NullPointerException`
- Investigate the bug in `metacat-scheduler` where `Caused by: java.lang.NullPointerException: Cannot assign field "after" because "link.before" is null`
  - `RequestScorerJob`, relating to the configuration of URLs to collect pids
  - `RequestReportJob`, which now appears ok but observed the issue previously
- The `quality` and `downloads` tasks appear to be executing, but it is unclear which aspect of the Assessment Report represents the new data quality check.
- Add a dataset to `dev.nceas` (via metacatUI or any other client)
Testing locally has gone well, but it would be nice to test the engine against a hashstore on the dev cluster. To that end I've mounted the `tdg` subvolume on metadig-worker, and that subvolume was mounted on `dev.nceas`, where there is a hashstore metacat running. See `helm/metadig-worker/pv.yaml` and `helm/metadig-worker/pvc.yaml` for details on the existing mounts. In order to actually test, though, the following steps are needed:
- Copy `metacat/hashstore` to `/mnt/tdg-repos/dev` via parallel Rsync
- Update the `metacat.properties` `store.store_path` field to be `var/data/respos/dev/hashstore`
- Deploy `metadig-engine` to the test cluster
- Add a dataset to `dev.nceas` (via metacatUI or any other client)