GridProtectionAlliance / openPDC

Open Source Phasor Data Concentrator
MIT License

Missing Frame stats corrupt? #37

Closed bnubile007 closed 7 years ago

bnubile007 commented 7 years ago

With version 2.2.63.0 I have noticed that the input stream statistics for missing frames seem to be corrupted about once a minute. Normally it reports zero or some small single-digit number for missing frames, but about once a minute it reports a very large value, and the value increases the next minute. This spike looks like it may be a calculation gone bad. It happens for all 7 of our input streams; it is not unique to one input. The high value then rolls over at some point. See attached screenshot. The identical value also shows up for 'missing data'. I can't explain the spiking, nor why the value increases over time and then rolls over.

missing frames1
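One way to characterize the pattern described above (a small baseline, periodic large readings that grow, then a rollover) is to separate the outliers from the baseline in an exported series of readings. The following is only a rough sketch: the function names, the threshold rule, and the idea of exporting the statistic as a plain list of integers are assumptions for illustration, not part of openPDC.

```python
from statistics import median


def split_spikes(samples, factor=100, floor=10):
    """Split readings into (baseline, spikes) as lists of (index, value).

    A reading counts as a spike when it exceeds a threshold derived from
    the median of the series. The rule is a hypothetical heuristic.
    """
    threshold = max(floor, factor * max(median(samples), 1))
    baseline = [(i, v) for i, v in enumerate(samples) if v <= threshold]
    spikes = [(i, v) for i, v in enumerate(samples) if v > threshold]
    return baseline, spikes


def count_rollovers(spikes):
    """Count how often a spike value is lower than the previous spike value."""
    values = [v for _, v in spikes]
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)
```

If the spike values rise steadily and `count_rollovers` is nonzero, that would be consistent with a running counter leaking into the statistic rather than genuine frame loss, though this alone does not confirm the cause.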

ritchiecarroll commented 7 years ago

First, please verify if protocol is IEEE C37.118, second is connection to a PDC or a direct connection to PMU, third, is connection only TCP, only UDP or UDP with a TCP control channel?

Also, if connection is PDC, how many devices are in the stream? Is there any device that tends to give you trouble?

What about the data? Everything look fine? Does anomaly happen all the time or just sporadically?

What I find highly suspect is that missing data and missing frames match.

Is it possible we can get log files for the same period of the event?

We will certainly get to the bottom of this...

Thanks, Ritchie

bnubile007 commented 7 years ago

Sorry, apparently replies via email don't work. Pasting my response in here manually.

> First, please verify if protocol is IEEE C37.118, second is connection to a PDC or a direct connection to PMU, third, is connection only TCP, only UDP or UDP with a TCP control channel?

C37.118, PDC, TCP only.

> Also, if connection is PDC, how many devices are in the stream? Is there any device that tends to give you trouble?

It's being reported for all 7 PDC input streams, which range from 3-34 PMU devices. So it's not unique to any one device.

> What about the data? Everything look fine? Does anomaly happen all the time or just sporadically?

The data looks fine, and most everything else looks fine. I only stumbled upon this because I turned on the 'data quality reports' and it was showing one of the sites as having a low completeness %, yet there were no errors. So I tried looking at the other stats more closely and noticed this missing frames value jumping around. Overall I have no idea if I'm really missing data or not. In general I don't think I am; otherwise the different departments that actually use the data would be complaining to me about holes in the data. According to these stats I would have holes all over the place from every single input stream, which seems unlikely. Below is a snip of the server(A) completeness report:

stat1_servera

The level 2 stats make no sense. The other levels seem to line up with expectations, but all the level 2 data seems skewed.

> What I find highly suspect is that missing data and missing frames match.

Agreed. Some other info:

• This server(A) is pushing all of its data to another v2.2 server(B) and its completeness report looks normal and these stats are correct:

stat1_serverb

• The server(A) with the skewed missing frame numbers is running v2.2, and its database was upgraded from the previous v1.5 SP1. The v2.2 server(B) with the correct stats was built with a new database. (Starting over with the server(A) database is not an option.)

> Is it possible we can get log files for the same period of the event?

The event is continuous and the issue appears immediately after startup, so I'm not sure what logs you may want. Statuslog has basically nothing in it. Errorlog has nothing but connect attempts. I don't think there are any errors.

ritchiecarroll commented 7 years ago

This is looking like a configuration issue of some kind. It might be corrected simply by refreshing the source configuration. The simplest way to refresh an IEEE C37.118 source configuration is by clicking the "Update Configuration" button on the device list:

image

Let me know if that helps. If not, we may need to manually check your configuration.

bnubile007 commented 7 years ago

Some more info: we have just installed another brand-new server with a fresh install of openPDC v2.2. It has a new connection for incoming data, and it too is registering the strange missing frame count.
new_connection_stats

I've tried the "Update Configuration" button for this new stream, and I'll update another stream on one of our other servers as well and monitor. I'll check these early next week, see what the stats look like, and get back to you.

bnubile007 commented 7 years ago

No change. The value still increases, then rolls over and repeats. However, my new server was sitting doing nothing. It seems openPDC crashed with only the following message:

12/12/2016 12:31am (utc)
Application: openPDC.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFCD6A00973 (00007FFCD6920000) with exit code 80131506.

Not sure what to do with that.

ritchiecarroll commented 7 years ago

That's an issue we have seen with .NET 4.5 that can only be corrected via patch. It doesn't happen in later versions of openPDC (2.3+), where we changed the base .NET code to depend on .NET 4.6. Don't suppose it would be possible to try a later version, e.g., https://github.com/GridProtectionAlliance/openPDC/releases/tag/v2.3 ? There have been many bug fixes: https://github.com/GridProtectionAlliance/gsf/wiki/GSF-v2.1.346-Release-Notes

The next version that we hope to release before the end of the year also includes the following: https://github.com/GridProtectionAlliance/gsf/wiki/GSF-v2.1.380-Release-Notes

bnubile007 commented 7 years ago

Yes, we should be able to upgrade. Looks like we already have .NET 4.6.2 installed, so we just need to upgrade to v2.3.
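For anyone wanting to confirm the installed .NET Framework version before upgrading, one option is to read the `Release` value that Microsoft documents under the `NDP\v4\Full` registry key. The thread does not say how the version was checked, so this is only a sketch of one approach; the function names are made up for illustration, and the table covers only the versions discussed here.

```python
import sys

# Minimum "Release" DWORD values per Microsoft's documentation for
# .NET Framework 4.5 through 4.6.2 (newer releases have higher values).
RELEASE_MINIMUMS = [
    (394802, "4.6.2 or later"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def framework_version(release):
    """Map a Release value to a version label, or None if below 4.5."""
    for minimum, label in RELEASE_MINIMUMS:
        if release >= minimum:
            return label
    return None


def read_release():
    """Read the Release value from the registry; returns None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _ = winreg.QueryValueEx(key, "Release")
    return value
```

On the server, `framework_version(read_release())` should report at least 4.6 before moving to openPDC 2.3, given the .NET 4.6 dependency mentioned above.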

bnubile007 commented 7 years ago

That "Update Configuration" button has made a bigger mess. I did not realize it when I made the change last week, but it broke all those devices in the output streams, and now I have no stats on the PMUs under the PDC that I updated. Not sure what to do now other than start over: destroy the PDC and PMUs and re-create them. That will involve re-modifying all the output streams again.

busted stats

ritchiecarroll commented 7 years ago

Can we set up a WebEx? I think that this is still a configuration issue we can help you resolve.

The openPDC Manager will only show device statistics that are associated with a device. If the statistics disappeared, that means they were either deleted or disassociated from the devices. BTW, we cannot replicate your issue here, so there is something going on that's a little outside the usual. Wondering if the statistics are properly defined at this point.

Do you have my e-mail address for correspondence?

Thanks, Ritchie

ritchiecarroll commented 7 years ago

Closing for now. Feel free to re-open if the issue remains after updating to openPDC 2.4.