abazabaaa opened 3 years ago
I think I resolved my own issue above.
It took me quite a while to figure out how to permanently change ulimit -n and -u.
Maybe worth adding this to the quickstart guide for those who use CentOS 7 but aren't seasoned in its use?
$ sudo bash
$ nano /etc/sysctl.conf
    fs.file-max = 100000
$ nano /etc/security/limits.conf
Then log out and back in.
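For anyone following along, here is a sketch of what the limits.conf entries typically look like on CentOS 7, plus a quick way to verify them after re-login. The values below are illustrative guesses on my part; the original post does not show the actual lines used.

```shell
# Illustrative /etc/security/limits.conf entries (NOT the poster's actual values):
#   *    soft    nofile    100000
#   *    hard    nofile    100000
#   *    soft    nproc     65535
#   *    hard    nproc     65535
# After logging back in, confirm the limits took effect:
ulimit -n    # open-file limit (nofile)
ulimit -u    # max user processes (nproc)
```

Note that limits.conf only applies to new login sessions (via pam_limits), which is why the logout/login step matters.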
$ git clone https://github.com/NationalSecurityAgency/datawave.git --branch version/3.2
$ echo "ulimit -u 65535" >> ~/.bashrc
$ echo "source my_path/contrib/datawave-quickstart/bin/env.sh" >> ~/.bashrc
$ source ~/.bashrc
$ allInstall
This proceeded with no warnings or errors except for the last one:
[DW-WARN] - The IngestJob class encountered errors (exit status: 251). See job log above for details
[DW-INFO] - You may view M/R job UI here: http://localhost:8088/cluster
[DW-INFO] - NOTE: Regarding the [DW-WARN] message above and associated errors...
By design, the test file 'my.csv' should have generated 1 EVENT_FATAL_ERROR and 1
MISSING_DATA_ERROR, both of which should be reflected in the 'counters' section of the job
log above. Both are due to one 'bad' CSV record having a null SECURITY_MARKING field. For
demonstration purposes, we've forced the missing-data error via the class configured to be
our IngestPolicyEnforcer, which handles record-level validations...
In 'mycsv-ingest-config.xml', note our policy enforcer: datawave.policy.ExampleIngestPolicyEnforcer.
As a result, CSV records found to be flagged with MISSING_DATA_ERROR will be evicted and
will not be written to DataWave's primary data schema, i.e., the 'shard*' tables
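As an aside, the record-level validation the message describes can be pictured with a trivial sketch. The field position and the idea of flagging an empty SECURITY_MARKING are assumptions for illustration only, not DataWave's actual enforcer logic:

```shell
# Two sample CSV records; the second has an empty 3rd field (standing in for
# a null SECURITY_MARKING). Flag it the way a record-level policy check might:
printf 'a,b,PUBLIC\nc,d,\n' \
  | awk -F, '$3 == "" { print "MISSING_DATA_ERROR: " $0 }'
```

The real enforcer is the Java class named in mycsv-ingest-config.xml (datawave.policy.ExampleIngestPolicyEnforcer); flagged records are evicted rather than written to the shard tables.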
Unfortunately, I am still stuck at this point:
$ datawaveWebStart && datawaveWebTest
[DW-INFO] - Starting Wildfly
[DW-INFO] - Polling for EAR deployment status every 4 seconds (15 attempts max)
-- Wildfly process not found (1/15)
+- Wildfly up (230073). EAR deployment pending (2/15)
+- Wildfly up (230073). EAR deployment pending (3/15)
+- Wildfly up (230073). EAR deployment pending (4/15)
+- Wildfly up (230073). EAR deployment pending (5/15)
+- Wildfly up (230073). EAR deployment pending (6/15)
+- Wildfly up (230073). EAR deployment pending (7/15)
+- Wildfly up (230073). EAR deployment pending (8/15)
+- Wildfly up (230073). EAR deployment pending (9/15)
+- Wildfly up (230073). EAR deployment pending (10/15)
+- Wildfly up (230073). EAR deployment pending (11/15)
+- Wildfly up (230073). EAR deployment pending (12/15)
+- Wildfly up (230073). EAR deployment pending (13/15)
+- Wildfly up (230073). EAR deployment pending (14/15)
+- Wildfly up (230073). EAR deployment pending (15/15)
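For what it's worth, when an EAR hangs like this, WildFly's deployment scanner leaves marker files next to the archive under standalone/deployments (.isdeploying, .pending, .deployed, .failed) that tell you the state. Simulated below with a temp directory and a made-up archive name; on a real install, list $WILDFLY_HOME/standalone/deployments instead:

```shell
# Simulate a deployments directory stuck mid-deploy (archive name is illustrative):
d=$(mktemp -d)
touch "$d/datawave-ws-deploy-application.ear.isdeploying"
# List any scanner marker files to see the deployment state:
ls "$d" | grep -E '\.(isdeploying|pending|deployed|failed)$'
```

A lingering .isdeploying or a .failed marker narrows down whether the EAR ever finished deploying.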
Sorry if I am spamming. I am a chemist by training and quite intrigued by this tool's ability to ingest XML/JSON. A lot of my work is focused on doing this with chemical databases, and it is rather painful.
I am trying to sort out how to get the correct certs into the browser, which I suspect is the reason for the errors below.
$ datawaveWebTest --verbose --pretty-print
[DW-INFO] - Converting client certificate into more portable PKI materials. Should work no matter which versions you have of CURL, OpenSSL, NSS, etc
Test UID: DiscoveryQuery/CreateQuery
Test Description: Creates DiscoveryQuery for wikipedia articles containing the word 'anarchy' (i.e., hit count by date)
Test Command: /usr/bin/curl --silent --write-out 'HTTP_STATUS_CODE:%{http_code};TOTAL_TIME:%{time_total};CONTENT_TYPE:%{content_type}' --insecure --cert '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.pem' --key '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.key.rsa' --cacert '/home/schrogpu/datawave_test/datawave/contrib/datawave-quickstart/data/datawave/pki-temp/testUser.ca' --header 'Content-Type: application/x-www-form-urlencoded' -d query=anarchy -d queryName=DiscoveryQueryTest001 -d begin=20130301 -d end=20130401 -d pagesize=10 -d auths=PUBLIC -d columnVisibility=PRIVATE -d query.syntax=LUCENE -X POST https://localhost:8443/DataWave/Query/DiscoveryQuery/create
HTTP Response Status Code: 404
HTTP Response ContentType: text/html
HTTP Response Body:
Test Finished: DiscoveryQuery/CreateQuery
Test Total Time: 0.544
Test Status:
[X] FAILED - Expected Code: '200' Actual: '404'
[X] FAILED - Expected Content-Type: 'application/xml;charset=UTF-8' Actual: 'text/html'
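On the browser-cert question: one common approach (my assumption, not something from the quickstart docs) is to bundle the test user's PEM cert and key as PKCS#12, which browsers can import directly. Sketched here with a throwaway self-signed cert standing in for the quickstart's pki-temp files:

```shell
# Throwaway self-signed cert stands in for the quickstart's testUser.pem /
# testUser.key.rsa (found under contrib/datawave-quickstart/data/datawave/pki-temp):
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=testUser' \
  -keyout testUser.key -out testUser.pem -days 1
# Bundle cert + key into a .p12 the browser can import.
# The export password ('changeit') is illustrative.
openssl pkcs12 -export -in testUser.pem -inkey testUser.key \
  -out testUser.p12 -passout pass:changeit
```

In the real case you would point -in/-inkey at the pki-temp files. That said, the 404 with a text/html body suggests the web services themselves are not up, rather than a cert problem.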
It appears that, for some reason, the DW web services are not deploying successfully. Please check $WILDFLY_HOME/standalone/log/server.log, and share the first occurrences of errors/stacktraces that appear there. Subsequent errors appearing later in the file are less relevant
This appears to be the first error:
2021-03-21 16:44:45,950 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 48) WFLYCTL0013: Operation ("add") failed - address: ([ ("subsystem" => "ee"), ("managed-executor-service" => "default") ]) - failure description: "WFLYEE0113: The max-threads value 48 cannot be less than the core-threads value 128."
This might have already been dealt with under https://github.com/NationalSecurityAgency/datawave/issues/672
Trying this..
OK, I believe there are DW build properties that you can override to change those settings. For now, you can fix it by either lowering the specified core-threads value or raising the max-threads value in your standalone-full.xml under $WILDFLY_HOME/standalone.
Thanks. I changed line 549 of $WILDFLY_HOME/standalone/configuration/standalone-full.xml from max-threads="48" to max-threads="128":
<managed-executor-services>
<managed-executor-service name="default" jndi-name="java:jboss/ee/concurrency/executor/default" context-service="default" hung-task-threshold="60000" max-threads="128" keepalive-time="5000"/>
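The same edit could be scripted. A minimal sketch, using a sample snippet in a temp file to stand in for the real standalone-full.xml (the core-threads value is taken from the error message, not from the actual file):

```shell
# Sample snippet standing in for standalone-full.xml:
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<managed-executor-service name="default" core-threads="128" max-threads="48"/>
EOF
# WFLYEE0113 requires max-threads >= core-threads, so raise 48 to 128:
sed -i 's/max-threads="48"/max-threads="128"/' "$cfg"
grep -o 'max-threads="[0-9]*"' "$cfg"
```

For a running server, the jboss-cli write-attribute operation on /subsystem=ee/managed-executor-service=default would be the more idiomatic route, but hand-editing the XML while WildFly is stopped (as done here) works fine.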
I then ran:
$ allStop
$ datawaveWebStart && datawaveWebTest
Tail of the readout:
Overall Summary
Test Count: 39
Tests Passed: 39
DiscoveryQuery/CreateQuery
DiscoveryQuery/GetPagedResults
DiscoveryQuery/QueryCloseTest
EdgeQuery/CreateAndNext
EdgeQuery/QueryCloseTest
ErrorEventQuery/QueryMissingDataError
ErrorEventQuery/QueryCloseTest
EventQuery400BadRequest/IntentionalError
EventQueryJexlSyntax/CreateJexlUnfielded
EventQueryJexlSyntax/JexlUnfieldedNext
EventQueryJexlSyntax/QueryCloseTest
EventQueryJexlSyntax/CreateJexlFielded
EventQueryJexlSyntax/JexlFieldedPage1
EventQueryJexlSyntax/JexlFieldedPage2
EventQueryJexlSyntax/204OnJexlFieldedPage3
EventQueryJsonGrouped/GroupedWithSameGreatgrandparent
EventQueryJsonGrouped/GroupedWithSameGreatgrandparentPage1
EventQueryJsonGrouped/QueryCloseTest
EventQueryJsonGrouped/GroupedWithSameParent
EventQueryJsonGrouped/GroupedWithSameParentPage1
EventQueryJsonGrouped/QueryCloseTest
EventQueryJsonGrouped/CreateNoMatchForGroupedSiblings
EventQueryJsonGrouped/NoMatchForGroupedSiblings204
EventQueryLuceneSyntax/CreateLuceneUnfieldedQuery
EventQueryLuceneSyntax/LuceneUnfieldedQueryPage1
EventQueryLuceneSyntax/QueryCloseTest
EventQueryLuceneSyntax/CreateLuceneFieldedQuery
EventQueryLuceneSyntax/LuceneFieldedQueryPage1
EventQueryLuceneSyntax/LuceneFieldedQueryPage2
EventQueryLuceneSyntax/204LuceneFieldedQueryPage3
GetDeployedQueryLogics/ListQueryLogicGET
GetEffectiveAuthorizations/ListEffectiveAuthsGET
LookupUUID/LookupWikipediaByPageId
LookupUUID/LookupWikipediaByPageTitle
LookupUUID/LookupWikipediaByPageTitleDNE204
LookupUUID/LookupUnregisteredUidType
QueryMetrics/CreateAndNext
QueryMetrics/QueryCloseTest
QueryMetrics/NoMetricsResults204
Failed Tests: 0
[DW-INFO] - Cleaning up temporary files
Success!
For step 4 on: https://code.nsa.gov/datawave/docs/2.9/tour/ingest-basics#step-1-define-the-data-type
$ ./ingest-tv-shows.sh --download-only --outfile ~/more-tv-shows.json
[DW-INFO] - Writing json records to /home/schrogpu/more-tv-shows.json
[DW-INFO] - Downloading show data: 'Veep'
File "
I think the issue here is that the print statement is written with Python 2 in mind?
Yes, we are currently using Python 2.7.5.
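For anyone hitting this on a Python 3-only machine, the incompatibility is easy to demonstrate from the shell (assumes python3 is on PATH):

```shell
# The parenthesized form is valid on both Python 2 and 3:
python3 -c 'print("hello")'
# The bare Python 2 print statement is a SyntaxError on Python 3;
# count the SyntaxError lines it produces:
python3 -c 'print "hello"' 2>&1 | grep -c SyntaxError
```

So a script with bare print statements dies immediately under Python 3, which would match the truncated 'File "' traceback above.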
Hi,
I have tried a few things and feel I have configured my machine in a way that reflects what is specified in the quickstart guide, but I am running into a few issues. I can complete the install with "success," but once the web test starts, Wildfly comes up and then I am stuck with the following until timeout:
[DW-INFO] - Polling for EAR deployment status every 4 seconds (15 attempts max)
-- Wildfly process not found (1/15)
+- Wildfly up (132683). EAR deployment pending (2/15)
CentOS 7, 64 cores, 64 GB RAM
Potentially relevant error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/datawave/ingest/work/jobCacheB/datawave-ws-common-util-4.0.0-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1587)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1580)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1595)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:325)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:236)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:111)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:69)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:220)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at datawave.ingest.mapreduce.job.IngestJob.run(IngestJob.java:377)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at datawave.ingest.mapreduce.job.IngestJob.main(IngestJob.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
[DW-WARN] - The IngestJob class encountered errors (exit status: 1). See job log above for details
Post install:
====== Hadoop Status ======
pids: 103749 104126 104760 105121 105737
[DW-INFO] - NodeManager => 105121
[DW-INFO] - NameNode => 103749
[DW-INFO] - JobHistoryServer => 105737
[DW-INFO] - ResourceManager => 104760
[DW-INFO] - DataNode => 104126
[DW-WARN] - SecondaryNameNode is not running
====== Accumulo Status ======
pids: 106905 107002 107283 107361 107415
[DW-INFO] - gc => 107361
[DW-INFO] - master => 107283
[DW-INFO] - tracer => 107415
[DW-INFO] - monitor => 106905
[DW-INFO] - tserver => 107002
====== ZooKeeper Status ======
[DW-INFO] - ZooKeeper => 106797
====== DataWave Ingest Status ======
[DW-INFO] - No ingest processes are running
====== DataWave Web Status ======
[DW-WARN] - Wildfly is not running