Open lindonm opened 4 days ago
@lindonm unfortunately I don't have Splunk Cloud, but testing with the versions of the other apps you listed, I cannot recreate the issue so far. I am wondering whether there are any strange characters in the events that it may be choking on. Does this only occur with sourcetype="o365:management:activity"? Can you try a search like this with the built-in lookups to create a result set of over 10,000 events?
| inputlookup moby_dick.csv
| append
[ inputlookup peter_pan.csv]
| cleantext textfield=sentence keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
Thanks @geekusa ,
Ran that query and it succeeds with no errors in the UI
This search has completed and has returned 12,750 results by scanning 0 events in 25.786 seconds
The following messages were returned by the search subsystem:
info : [subsearch]: Successfully read lookup file '/opt/splunk/etc/apps/nlp-text-analytics/lookups/peter_pan.csv'.
I did note some of the same/similar errors in the search log, so maybe those are a red herring.
10-31-2024 23:21:59.314 INFO SearchParser [3300457 searchOrchestrator] - PARSING: | inputlookup moby_dick.csv\n| append \n [ inputlookup peter_pan.csv]\n| cleantext textfield=sentence keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
10-31-2024 23:21:59.318 INFO ServerConfig [3300457 searchOrchestrator] - Will add app jailing prefix /opt/splunk/bin/nsjail-wrapper for nlp-text-analytics
10-31-2024 23:21:59.318 INFO ChunkedExternProcessor [3300457 searchOrchestrator] - Running process: /opt/splunk/bin/nsjail-wrapper /opt/splunk/bin/python3.7m /opt/splunk/etc/apps/nlp-text-analytics/bin/cleantext.py
10-31-2024 23:21:59.382 ERROR ChunkedExternProcessor [3300462 ChunkedExternProcessorStderrLogger] - stderr: Failed to run splunk as SPLUNK_OS_USER. This command can only be run by bootstart user.
10-31-2024 23:21:59.382 ERROR ChunkedExternProcessor [3300462 ChunkedExternProcessorStderrLogger] - stderr: /opt/splunk/etc/apps/Splunk_SA_Scientific_Python_linux_x86_64/bin/linux_x86_64/bin/python: line 5: [: ==: unary operator expected
10-31-2024 23:22:00.690 INFO SearchParser [3300457 searchOrchestrator] - PARSING: inputlookup peter_pan.csv
10-31-2024 23:22:00.690 INFO AstOptimizer [3300457 searchOrchestrator] - SrchOptMetrics optimize_toJson=1.373341992
10-31-2024 23:22:00.690 INFO SearchParser [3300457 searchOrchestrator] - PARSING: | inputlookup "moby_dick.csv" | append [| inputlookup "peter_pan.csv"] | cleantext textfield=sentence keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
10-31-2024 23:22:00.690 INFO SearchParser [3300457 searchOrchestrator] - PARSING: | inputlookup "moby_dick.csv" | append [| inputlookup "peter_pan.csv"] | cleantext textfield=sentence keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
10-31-2024 23:22:00.690 INFO ServerConfig [3300457 searchOrchestrator] - Will add app jailing prefix /opt/splunk/bin/nsjail-wrapper for nlp-text-analytics
10-31-2024 23:22:00.690 INFO ChunkedExternProcessor [3300457 searchOrchestrator] - Running process: /opt/splunk/bin/nsjail-wrapper /opt/splunk/bin/python3.7m /opt/splunk/etc/apps/nlp-text-analytics/bin/cleantext.py
10-31-2024 23:22:00.746 ERROR ChunkedExternProcessor [3300527 ChunkedExternProcessorStderrLogger] - stderr: Failed to run splunk as SPLUNK_OS_USER. This command can only be run by bootstart user.
10-31-2024 23:22:00.746 ERROR ChunkedExternProcessor [3300527 ChunkedExternProcessorStderrLogger] - stderr: /opt/splunk/etc/apps/Splunk_SA_Scientific_Python_linux_x86_64/bin/linux_x86_64/bin/python: line 5: [: ==: unary operator expected
10-31-2024 23:22:01.406 INFO SearchParser [3300457 searchOrchestrator] - PARSING: | inputlookup "peter_pan.csv"
10-31-2024 23:22:01.410 INFO AstOptimizer [3300457 searchOrchestrator] - SrchOptMetrics optimize_toJson=0.717763582
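For context on the second stderr line: "[: ==: unary operator expected" is the message bash prints when a `[ ... ]` test contains an unquoted variable that expands to nothing. The wrapper's actual line 5 is not shown in this thread, so the sketch below only reproduces the message under that assumption; `EXAMPLE_USER` is a hypothetical variable name, not one taken from the real script.

```shell
#!/usr/bin/env bash
# Sketch only: EXAMPLE_USER is a hypothetical name, not taken from the real wrapper.
unset EXAMPLE_USER

# Unquoted expansion: the empty variable disappears, leaving `[ == "splunk" ]`.
# bash writes "[: ==: unary operator expected" to stderr and the test returns 2.
unquoted_status=0
[ $EXAMPLE_USER == "splunk" ] || unquoted_status=$?

# Quoted expansion: the test compares an empty string and is simply false (status 1),
# with nothing written to stderr.
quoted_status=0
[ "$EXAMPLE_USER" == "splunk" ] || quoted_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status"
```

If the wrapper does something like this, the message would only mean that a variable was empty when the test ran. Since the same two stderr lines also appear in the run that succeeded, that would be consistent with them being unrelated to the failure.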
Further to this, I experimented to determine how much impact the actual source data has:
Works:
| makeresults
| eval textcheck="My Text Here"
| fields textcheck
| cleantext textfield=textcheck keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
| append
[search sourcetype="o365:management:activity" (Operation="New-InboxRule" OR Operation="Set-InboxRule")
| head 1]
Works:
| makeresults
| eval textcheck="My Text Here"
| fields textcheck
| cleantext textfield=textcheck keep_orig=true base_word=true remove_stopwords=false force_nltk_tokenize=true base_type="lemma_pos" term_min_len=1 ngram_mix=false
| append
[search sourcetype="o365:management:activity" (Operation="New-InboxRule" OR Operation="Set-InboxRule")
]
Potentially related to a recent update to Splunk_SA_Scientific_Python_linux_x86_64. We are attempting to downgrade that app, but because we are on Splunk Cloud and the app is larger than 500 MB, we cannot do so ourselves and are waiting on the support team.
The following search fails with an error:
Error log details
If, however, I run this search, it runs as expected with no errors:
This search also works, by limiting the results?
I have experimented with various limits, from "| head 1" to "| head 10000". They all work, but as soon as I remove the head command the search fails. Note that there are only 24 entries in my selected time period, so even with "| head 1000" it works fine, yet as soon as I remove head it fails with the error.
Splunk Cloud Version: 9.2.2406.107 (Victoria)
nlp-text-analytics v1.2.0
Splunk_SA_Scientific_Python_linux_x86_64 v4.2.1
Splunk_ML_Toolkit v5.4.2