NationalSecurityAgency / datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
https://code.nsa.gov/datawave
Apache License 2.0

quickstart fail ingest error during install #1119

Open abazabaaa opened 3 years ago

abazabaaa commented 3 years ago

Hi,

I have tried a few things and believe my machine is configured the way the quickstart guide specifies, but I am running into a few issues. The install completes "successfully", but once the web test starts, Wildfly comes up and then I am stuck with the following until it times out:

[DW-INFO] - Polling for EAR deployment status every 4 seconds (15 attempts max)
    -- Wildfly process not found (1/15)
    +- Wildfly up (132683). EAR deployment pending (2/15)

CentOS 7, 64 cores, 64 GB RAM

$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)

$ ulimit -u
32768

$ ulimit -n
32768

$ sudo sysctl vm.swappiness=0

$ git clone https://github.com/NationalSecurityAgency/datawave.git
$ echo "source my_path/contrib/datawave-quickstart/bin/env.sh" >> ~/.bashrc
$ source ~/.bashrc
$ allInstall

Potentially relevant error:

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/datawave/ingest/work/jobCacheB/datawave-ws-common-util-4.0.0-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1587)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1580)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1595)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:325)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:236)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:111)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:69)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:220)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at datawave.ingest.mapreduce.job.IngestJob.run(IngestJob.java:377)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at datawave.ingest.mapreduce.job.IngestJob.main(IngestJob.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:232)

[DW-WARN] - The IngestJob class encountered errors (exit status: 1). See job log above for details

Post install:

====== Hadoop Status ======
pids: 103749 104126 104760 105121 105737
[DW-INFO] - NodeManager => 105121
[DW-INFO] - NameNode => 103749
[DW-INFO] - JobHistoryServer => 105737
[DW-INFO] - ResourceManager => 104760
[DW-INFO] - DataNode => 104126
[DW-WARN] - SecondaryNameNode is not running
====== Accumulo Status ======
pids: 106905 107002 107283 107361 107415
[DW-INFO] - gc => 107361
[DW-INFO] - master => 107283
[DW-INFO] - tracer => 107415
[DW-INFO] - monitor => 106905
[DW-INFO] - tserver => 107002
====== ZooKeeper Status ======
[DW-INFO] - ZooKeeper => 106797
====== DataWave Ingest Status ======
[DW-INFO] - No ingest processes are running
====== DataWave Web Status ======
[DW-WARN] - Wildfly is not running
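
For what it's worth, a quick way to check whether the job cache directory from the exception was ever populated should just be an HDFS listing (same URI as in the stack trace):

$ hdfs dfs -ls hdfs://localhost:9000/datawave/ingest/work/jobCacheB/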

abazabaaa commented 3 years ago

I think I resolved my own issue above.

It took me quite a while to figure out how to permanently change ulimit -n and -u.

Maybe it is worth adding this to the quickstart guide for those who use CentOS 7 but aren't seasoned in its use?

$ sudo bash
$ nano /etc/sysctl.conf

add the following:

fs.file-max = 100000

$ nano /etc/security/limits.conf

add the following:
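
For example, entries along these lines (the values here are only illustrative, matching the 65535 I used for ulimit below; adjust as needed):

# raise the open-file and process limits for all users (illustrative values)
*    soft    nofile    65535
*    hard    nofile    65535
*    soft    nproc     65535
*    hard    nproc     65535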

Then log out and back in.

$ git clone https://github.com/NationalSecurityAgency/datawave.git --branch version/3.2
$ echo "ulimit -u 65535" >> ~/.bashrc
$ echo "source my_path/contrib/datawave-quickstart/bin/env.sh" >> ~/.bashrc
$ source ~/.bashrc
$ allInstall

This proceeded with no warnings or errors except for the last one:

[DW-WARN] - The IngestJob class encountered errors (exit status: 251). See job log above for details

[DW-INFO] - You may view M/R job UI here: http://localhost:8088/cluster

[DW-INFO] - NOTE: Regarding the [DW-WARN] message above and associated errors...

By design, the test file 'my.csv' should have generated 1 EVENT_FATAL_ERROR and 1
MISSING_DATA_ERROR, both of which should be reflected in the 'counters' section of the job
log above. Both are due to one 'bad' CSV record having a null SECURITY_MARKING field. For
demonstration purposes, we've forced the missing-data error via the class configured to be
our IngestPolicyEnforcer, which handles record-level validations...

In 'mycsv-ingest-config.xml', note our policy enforcer: datawave.policy.ExampleIngestPolicyEnforcer.
As a result, CSV records found to be flagged with MISSING_DATA_ERROR will be evicted and
will not be written to DataWave's primary data schema, i.e., the 'shard*' tables.
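
For anyone else following along, that enforcer is just a property in the ingest config; the relevant entry in 'mycsv-ingest-config.xml' should look roughly like the following (the property name below is my best guess from memory, so double-check it against the actual file):

                <property>
                    <!-- record-level validation; flags the 'bad' CSV record (null SECURITY_MARKING) as MISSING_DATA_ERROR -->
                    <!-- NOTE: the property name may differ; verify against mycsv-ingest-config.xml -->
                    <name>ingest.policy.enforcer.class</name>
                    <value>datawave.policy.ExampleIngestPolicyEnforcer</value>
                </property>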

abazabaaa commented 3 years ago

Unfortunately, I am still stuck at this point:

$ datawaveWebStart && datawaveWebTest
[DW-INFO] - Starting Wildfly
[DW-INFO] - Polling for EAR deployment status every 4 seconds (15 attempts max)
    -- Wildfly process not found (1/15)
    +- Wildfly up (230073). EAR deployment pending (2/15)
    +- Wildfly up (230073). EAR deployment pending (3/15)
    +- Wildfly up (230073). EAR deployment pending (4/15)
    +- Wildfly up (230073). EAR deployment pending (5/15)
    +- Wildfly up (230073). EAR deployment pending (6/15)
    +- Wildfly up (230073). EAR deployment pending (7/15)
    +- Wildfly up (230073). EAR deployment pending (8/15)
    +- Wildfly up (230073). EAR deployment pending (9/15)
    +- Wildfly up (230073). EAR deployment pending (10/15)
    +- Wildfly up (230073). EAR deployment pending (11/15)
    +- Wildfly up (230073). EAR deployment pending (12/15)
    +- Wildfly up (230073). EAR deployment pending (13/15)
    +- Wildfly up (230073). EAR deployment pending (14/15)
    +- Wildfly up (230073). EAR deployment pending (15/15)

abazabaaa commented 3 years ago

Sorry if I am spamming. I am a chemist by training and quite intrigued by this tool's ability to ingest XML/JSON. A lot of my work is focused on doing this with chemical databases, and it is rather painful.

I am trying to sort out how to get the correct certs into the browser, which I suspect is the reason for the errors below.
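
For the browser side, one option should be to bundle the quickstart's test PEM/key into a PKCS#12 file and import it through the browser's certificate manager, e.g. something like this (same pki-temp files the test script uses below; I have not verified this end to end):

# build a PKCS#12 bundle from the quickstart's test user materials
$ cd /home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp
$ openssl pkcs12 -export -in testUser.pem -inkey testUser.key.rsa -certfile testUser.ca -out testUser.p12 -name "datawave-test-user"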

$ datawaveWebTest --verbose --pretty-print
[DW-INFO] - Converting client certificate into more portable PKI materials. Should work no matter which versions you have of CURL, OpenSSL, NSS, etc


Test UID: DiscoveryQuery/CreateQuery
Test Description: Creates DiscoveryQuery for wikipedia articles containing the word 'anarchy' (i.e., hit count by date)

Test Command: /usr/bin/curl --silent --write-out 'HTTP_STATUS_CODE:%{http_code};TOTAL_TIME:%{time_total};CONTENT_TYPE:%{content_type}' --insecure --cert '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.pem' --key '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.key.rsa' --cacert '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.ca' --header 'Content-Type: application/x-www-form-urlencoded' -d query=anarchy -d queryName=DiscoveryQueryTest001 -d begin=20130301 -d end=20130401 -d pagesize=10 -d auths=PUBLIC -d columnVisibility=PRIVATE -d query.syntax=LUCENE -X POST https://localhost:8443/DataWave/Query/DiscoveryQuery/create

HTTP Response Status Code: 404
HTTP Response ContentType: text/html

HTTP Response Body:

Error 404 - Not Found

Test Finished: DiscoveryQuery/CreateQuery
Test Total Time: 0.544
Test Status:
  [X] FAILED - Expected Code: '200' Actual: '404'
  [X] FAILED - Expected Content-Type: 'application/xml;charset=UTF-8' Actual: 'text/html'

keith-ratcliffe commented 3 years ago

It appears that, for some reason, the DW web services are not deploying successfully. Please check $WILDFLY_HOME/standalone/log/server.log, and share the first occurrences of errors/stacktraces that appear there. Subsequent errors appearing later in the file are less relevant
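
For example, something like this will pull the first few ERROR lines (with their line numbers) so you can then grab the surrounding stacktraces:

$ grep -n -m 5 ERROR $WILDFLY_HOME/standalone/log/server.log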

abazabaaa commented 3 years ago

It appears that, for some reason, the DW web services are not deploying successfully. Please check $WILDFLY_HOME/standalone/log/server.log, and share the first occurrences of errors/stacktraces that appear there. Subsequent errors appearing later in the file are less relevant

This appears to be the first error:

2021-03-21 16:44:45,950 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 48) WFLYCTL0013: Operation ("add") failed - address: ([ ("subsystem" => "ee"), ("managed-executor-service" => "default") ]) - failure description: "WFLYEE0113: The max-threads value 48 cannot be less than the core-threads value 128."

abazabaaa commented 3 years ago

This might have already been dealt with under https://github.com/NationalSecurityAgency/datawave/issues/672

Trying this..

keith-ratcliffe commented 3 years ago

Ok, I believe there are DW build properties that you can override to change those settings. For now, you can fix this by either lowering the specified core-threads value or by raising the max-threads value in your standalone-full.xml under $WILDFLY_HOME/standalone.

abazabaaa commented 3 years ago

Thanks. In $WILDFLY_HOME/standalone/configuration/standalone-full.xml (line 549), I changed max-threads="48" to max-threads="128":

                <managed-executor-services>
                    <managed-executor-service name="default" jndi-name="java:jboss/ee/concurrency/executor/default" context-service="default" hung-task-threshold="60000" max-threads="128" keepalive-time="5000"/>

I then ran:

$ allStop
$ datawaveWebStart && datawaveWebTest

Tail of the readout:


Overall Summary


Test Count: 39

Tests Passed: 39
  DiscoveryQuery/CreateQuery
  DiscoveryQuery/GetPagedResults
  DiscoveryQuery/QueryCloseTest
  EdgeQuery/CreateAndNext
  EdgeQuery/QueryCloseTest
  ErrorEventQuery/QueryMissingDataError
  ErrorEventQuery/QueryCloseTest
  EventQuery400BadRequest/IntentionalError
  EventQueryJexlSyntax/CreateJexlUnfielded
  EventQueryJexlSyntax/JexlUnfieldedNext
  EventQueryJexlSyntax/QueryCloseTest
  EventQueryJexlSyntax/CreateJexlFielded
  EventQueryJexlSyntax/JexlFieldedPage1
  EventQueryJexlSyntax/JexlFieldedPage2
  EventQueryJexlSyntax/204OnJexlFieldedPage3
  EventQueryJsonGrouped/GroupedWithSameGreatgrandparent
  EventQueryJsonGrouped/GroupedWithSameGreatgrandparentPage1
  EventQueryJsonGrouped/QueryCloseTest
  EventQueryJsonGrouped/GroupedWithSameParent
  EventQueryJsonGrouped/GroupedWithSameParentPage1
  EventQueryJsonGrouped/QueryCloseTest
  EventQueryJsonGrouped/CreateNoMatchForGroupedSiblings
  EventQueryJsonGrouped/NoMatchForGroupedSiblings204
  EventQueryLuceneSyntax/CreateLuceneUnfieldedQuery
  EventQueryLuceneSyntax/LuceneUnfieldedQueryPage1
  EventQueryLuceneSyntax/QueryCloseTest
  EventQueryLuceneSyntax/CreateLuceneFieldedQuery
  EventQueryLuceneSyntax/LuceneFieldedQueryPage1
  EventQueryLuceneSyntax/LuceneFieldedQueryPage2
  EventQueryLuceneSyntax/204LuceneFieldedQueryPage3
  GetDeployedQueryLogics/ListQueryLogicGET
  GetEffectiveAuthorizations/ListEffectiveAuthsGET
  LookupUUID/LookupWikipediaByPageId
  LookupUUID/LookupWikipediaByPageTitle
  LookupUUID/LookupWikipediaByPageTitleDNE204
  LookupUUID/LookupUnregisteredUidType
  QueryMetrics/CreateAndNext
  QueryMetrics/QueryCloseTest
  QueryMetrics/NoMetricsResults204

Failed Tests: 0


[DW-INFO] - Cleaning up temporary files

Success!

abazabaaa commented 3 years ago

For step 4 on: https://code.nsa.gov/datawave/docs/2.9/tour/ingest-basics#step-1-define-the-data-type

$ ./ingest-tv-shows.sh --download-only --outfile ~/more-tv-shows.json
[DW-INFO] - Writing json records to /home/schrogpu/more-tv-shows.json
[DW-INFO] - Downloading show data: 'Veep'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Game of Thrones'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'I Love Lucy'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Breaking Bad'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Malcom in the Middle'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'The Simpsons'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Sneaky Pete'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'King of the Hill'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Three's Company'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'The Andy Griffith Show'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'Matlock'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'North and South'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Downloading show data: 'MASH'
  File "", line 1
    import sys,json;data=json.loads(sys.stdin.read()); print json.dumps(data, indent=2, sort_keys=True)
                                                              ^
SyntaxError: invalid syntax
[DW-INFO] - Data download is complete

I think the issue here is that the print statement is written with Python 2 in mind?
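
If it helps anyone else who hits this, the one-liner itself can be made to run under both Python 2 and Python 3 just by calling print as a function; this is only a sketch of the fix (I have not patched the script):

# reads JSON on stdin and pretty-prints it; print(...) is valid in both Python 2 and 3
$ echo '{"b": 1, "a": 2}' | python -c 'import sys,json;data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2, sort_keys=True))'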

hlgp commented 9 months ago

Yes, we are currently using Python 2.7.5.