Open rushirajnenuji opened 3 years ago
@rushirajnenuji - To help investigate what look to be low metrics numbers in Elastic Search for the ADC node, I queried the access log table for January and February 2021 with:
SELECT count(docid) FROM access_log al
WHERE al.date_logged >= '2021-01-01'
AND date_logged < '2021-03-01'
AND lower(al.event) = 'read';
Which gives a result of 162,301 raw read events (which includes bots, etc. - no filtering).
Doing the same query in Kibana against the eventlog-* indices, we only get 17,807 hits:
http://localhost:5601/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:'2021-01-01T00:00:00.000Z',mode:absolute,to:'2021-03-01T00:00:00.000Z'))&_a=(columns:!(userAgent,pid,event),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'0576af90-5c30-11e8-acac-67c9290041c8',key:nodeId,negate:!f,params:(query:'urn:node:ARCTIC',type:phrase),type:phrase,value:'urn:node:ARCTIC'),query:(match:(nodeId:(query:'urn:node:ARCTIC',type:phrase)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'0576af90-5c30-11e8-acac-67c9290041c8',key:event,negate:!f,params:(query:read,type:phrase),type:phrase,value:read),query:(match:(event:(query:read,type:phrase))))),index:'0576af90-5c30-11e8-acac-67c9290041c8',interval:auto,query:(language:lucene,query:''),sort:!('@timestamp',desc))
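For anyone who wants to repeat this count outside of Kibana, here is a minimal sketch of an equivalent Elasticsearch query body. The field names (nodeId, event, @timestamp) are taken from the Kibana URL above; the function name is hypothetical, and the assumption is that the body would be POSTed to the `_count` endpoint of the eventlog-* indices. It uses an exclusive upper bound to mirror the SQL query, which may differ slightly from how Kibana treats the end of its time range.

```python
import json


def build_read_count_query(node_id, start, end):
    """Build a query body that counts raw read events for one node.

    Hypothetical helper: field names come from the Kibana URL in this thread.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"nodeId": node_id}},
                    {"match_phrase": {"event": "read"}},
                    # Exclusive upper bound, matching date_logged < '2021-03-01' in the SQL.
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = build_read_count_query(
    "urn:node:ARCTIC",
    "2021-01-01T00:00:00.000Z",
    "2021-03-01T00:00:00.000Z",
)
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent with any HTTP client; no request is made here so the sketch stays independent of a particular host or port.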
Looking at the logsolr index, I'm seeing 161,314:
curl "http://localhost:8983/solr/event_core/select?q=nodeId:urn\%3Anode\%3AARCTIC%20AND%20event:read&start=0&rows=1&wt=json&fq=dateLogged%3A%5B2021-01-01T00%3A00%3A00.000000Z+TO+2021-03-01T00%3A00%3A00.000000Z%5D&sort=dateLogged+ASC" | jq .response.numFound
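The Solr request above is hard to read in its encoded form, so here is a small sketch that builds the same parameters with the Python standard library and pulls numFound out of a response. The helper names are hypothetical, and the sample response at the end is made up purely to show the parsing step. Note that square brackets in a Solr range make both bounds inclusive, unlike the `<` in the SQL query.

```python
import json
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/event_core/select"


def build_solr_count_url(node_id, start, end):
    """Build a select URL whose numFound is the read count for one node."""
    # Colons inside the node identifier are escaped for the Solr query parser.
    escaped_node = node_id.replace(":", "\\:")
    params = {
        "q": f"nodeId:{escaped_node} AND event:read",
        "fq": f"dateLogged:[{start} TO {end}]",
        "start": 0,
        "rows": 1,
        "wt": "json",
        "sort": "dateLogged ASC",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def extract_num_found(response_text):
    """Return response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


url = build_solr_count_url(
    "urn:node:ARCTIC",
    "2021-01-01T00:00:00.000000Z",
    "2021-03-01T00:00:00.000000Z",
)
print(url)

# Illustrative response body only, not real output from the server.
sample = '{"response": {"numFound": 3, "start": 0, "docs": []}}'
print(extract_num_found(sample))
```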
So it looks like the CN log aggregator has missed about 1000 events, but Elastic Search has missed 144,494. I will roll the last harvest date back for the ADC repository to try to pick up those missing 1000, and will roll back the Filebeat/Logstash date to the beginning of the year to see if we pick up the missing 144K in the index. If so, all is good, and we just had network or scheduled forwarding job issues. If not, we likely have a bug in the metrics service code. I will ask Val to do the same SQL query for ESS-DIVE because they are also seeing low numbers. Thanks!
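As a quick check of the arithmetic behind those two gaps, here is a small sketch using only the three counts reported above (variable names are just for illustration):

```python
# Raw read counts for ADC, 2021-01-01 up to (not including) 2021-03-01.
metacat_reads = 162_301  # Metacat access_log (SQL)
solr_reads = 161_314     # CN log aggregator (logsolr)
es_reads = 17_807        # Elastic Search eventlog-* indices (Kibana)

missing_from_solr = metacat_reads - solr_reads
missing_from_es = metacat_reads - es_reads

print(f"missing from logsolr: {missing_from_solr}")       # 987
print(f"missing from Elastic Search: {missing_from_es}")  # 144494
```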
This relates to #83
Update: Val ran the same query against the Metacat access_log and got:
SELECT count(docid) FROM access_log al
WHERE al.date_logged >= '2021-01-01'
AND date_logged < '2021-03-01'
AND lower(al.event) = 'read';
count
--------
154866
(1 row)
The logsolr core has 154866 documents, so the issue is definitely in the filebeat leg of the pipeline for ESS_DIVE.
Looking at the same query in Elastic Search, there are 0 events. I still need to roll back the filebeat forwarder.
@csjx Any update on this?
Hi @vchendrix - Yes, I rolled back the Elastic Search filebeat forwarder files to 2018-01-01 because we also had missing event content from other MNs back to 2018 due to harvesting issues (certs, network, etc.). Looking at Kibana for the ADC from 2021-01-01 to 2021-03-01, the indexed value of raw read events is still 17,807. We have millions of events to process, so I think it will take some time.
For ESS-DIVE, there are 5,194 raw read events in ES for that 3 month time frame, but I expect more to be picked up. While we are importing the events via filebeat and logstash, the raw events then get processed into sessions with bots and double-clicks filtered out, etc., and that process is what takes time.
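To give a rough idea of what the double-click part of that processing involves, here is a minimal sketch. It is not the metrics service code: the function name, the 30-second window, the choice to keep the first event of a burst, and the (session id, pid, timestamp) tuple layout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def drop_double_clicks(events, window_seconds=30):
    """Drop repeat reads of the same pid in the same session within the window.

    events: iterable of (session_id, pid, timestamp) tuples (assumed layout).
    Keeps the first event of each burst; each repeat is compared against the
    previous event for the same key, so a chain of quick clicks collapses to one.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}
    kept = []
    for session_id, pid, ts in sorted(events, key=lambda e: e[2]):
        key = (session_id, pid)
        previous = last_seen.get(key)
        last_seen[key] = ts
        if previous is not None and ts - previous <= window:
            continue  # treated as a double-click
        kept.append((session_id, pid, ts))
    return kept


t0 = datetime(2021, 1, 1, 12, 0, 0)
sample_events = [
    ("s1", "doc.1", t0),
    ("s1", "doc.1", t0 + timedelta(seconds=10)),  # double-click, dropped
    ("s1", "doc.2", t0 + timedelta(seconds=5)),   # different pid, kept
    ("s1", "doc.1", t0 + timedelta(seconds=50)),  # 40 s after the last click, kept
]
filtered = drop_double_clicks(sample_events)
print(len(filtered))  # 3
```

Because every raw event has to be grouped and compared like this (plus bot filtering), the session step is much slower than the raw import.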
@rushirajnenuji - Can you give an estimate of the number of events that still need processing? I don't recall the ES search to do that.
Hi @csjx, I'm seeing 2464 raw events for ESS_DIVE nodeId in ES that still need processing. After filtering out the d1_admin_subject tags, we are left with 706 events (380 metadata, 326 data reads) for the range 2021-01-01 to 2021-04-01.
For date range 2021-01-01 to 2021-03-01 those counts are:
raw: 2290 unprocessed events (945 data read, 1345 metadata read events)
After filtering out d1_admin_subject events: 671 unprocessed events (318 data read, 353 metadata read events)
The identifiers-* index had about 30K missing identifiers. Given that this index is primarily used to populate and index portal metrics, it is important that this index stays in sync with DataONE CN.
Current status:
Next steps:
es_sysmeta_sync.py to address the issue of missing identifiers (separate issue)
Ensure Elastic Search indexes eventlog-* and identifiers-* stay up to date.