Hugovdberg / PIconnect

A python connector to the OSISoft PI and PI-AF databases
MIT License
91 stars 41 forks source link

Export series in blocks #488

Open Hugovdberg opened 5 years ago

Hugovdberg commented 5 years ago

Feature request - Blockwise export

Abstract

PI systems allow a maximum number of events to be exported at once. This feature request proposes an optional blockwise export of events.

Motivation and summary

PI Systems have a maximum number of events that can be exported at once (150k for PI versions before 2012, 1.5M for later versions). To get more than that amount of events the data needs to be exported in blocks at most as large as the limit.

Suggested solution

Add a new argument blocksize to the data extraction methods, that defaults to None (meaning the data will be exported in a single request). When the blocksize is set, we can implement an algorithm as explained in this deep dive to extract all values in the requested interval.

In summary it exports the data in blocks of at most blocksize events. Each new block starts at the last timestamp in the previous block. To remove duplicates, the as many events at the top of the new block are removed as there were with that timestamp at the bottom of the previous block. The full motivation for this algorithm is explained in the deep dive.

Rejected options

No alternative options were considered so far.

Additional context

Special thanks to Rick Davin, who explains the problem and solution much clearer than I could have done.

ldariva commented 4 years ago

Interresting. I have faced a similar issue with another application. I created a windows form application to collect data from both OSI PI and Aspen IP.21 databases. In IP21 there is a adjusted limit of how many rows can be retrieved in a select statments (the default value is 100.000). If the data interval is too long this limit of 100k is not enough. So to overcome this problem I extrac the data in batches of 100k. I did something similar for OSI databases where I limit the selection interval in 2 months. So if my period is longer than 2 months I select batches of two months and combine them at the end.