GoogleCloudPlatform / dataflow-sample-applications

Apache License 2.0
126 stars 58 forks source link

Indicators.LAST and determinism #57

Closed danp11 closed 3 years ago

danp11 commented 3 years ago

Hi again :-)

With the VWAP indicator we are dependent of the closing price i.e. the last value in the window and it doesn't seems to be able to configure the .json file with timestamps so I get different result when I execute my VWAP test.

In TsBaseCombiner the below method has a comment about beeing non-deterministic. Is there anyway to configure for example TSAccumVWAPTest.json with a timestamp to make sure that the last "trade" in this file is populating the LAST indicator?

Thanks,

/Dan

// If our DataPoints timestamp is == max then use this DataPoint // TODO document that this is non-deterministic if (dataPoint.getTimestamp() == lastTimestamp) { accumStore.setLast(dataPoint.getData()); }

rezarokni commented 3 years ago

@dsdinter Could you please have a look at adding timestamps as a value in the json file directly? @danp11 You can make use of the TSTestStream directly as a stop gap.

Also note the non-deterministic note is about elements which have the same timestamp. So if we have A {time=1} and B{time-1} then which gets picked would be non-deterministic. If your test data does not have this equality then I think it should be good. But it does look like we could do with better doc in that class :-)

rezarokni commented 3 years ago

Interesting question..

When the price is gap filled, it is I guess semantically correct, if the price was x and no updates then the price is still x. However if Qty is y and then there is a gap fill, y will need to be 0 for it to be correct.

However we would want the VWAP value to be continually produced even if it has not 'changed'.

One thing that I think would need to be done in the metrics creation:  When a TSAccumSequence contains gapfilled and non-gapfilled data, discard all gap filled values ( gapfill==true in the TSAccum).

But for the case when the data is all gap filled, should the VWAP be just price? Or does the value need to be the last known VWAP value?

rezarokni commented 3 years ago

PS, found a bug with the CreateCompositeTSAccum fixed in : 08dabbb556e59526a83d8d487b52db162b1be261 Suspect there is more in this code path that will be found as metrics are created with multi input comps.

danp11 commented 3 years ago

Hi, tough question I know :-) but what is the future plan for this repo?

I do understand that its mainly a repo for "showing examples" but I think that what you have built could be a very good lib. for finance and other, real time metric applications.

We have, as an example, https://github.com/Talend/labs-beam-ml that show how to mix Java with python for metrics . A specific java lib built solution, is not really an option as we would just "inventing the wheel".

@Reaz Please let me/us know if this lib is for prod use in the future or "only for examples"

I think you have built something that could be very valuable, as long as we reuse the metrics calculation from another Java lib or Python code via the cross-language-runner.

No problem if this lib is just showing an example, but I'd like to know if your plan is "more serious" before I put more time in it. We're all busy with the normal job :-)

Nice weekend to all of you,

/Dan

rezarokni commented 3 years ago

We are still evaluating the future direction and do not have concrete plans to share at this time. However, as you may know, the open source code can be run on top of Beam runners, like the Dataflow runner which is in production. Any issues with the Dataflow runner, will be supported via support channels. Can you share information on what your need is? Are you looking to contribute or are you a Beam user that is looking to use this library in production?

Once the samples are stable, we do hope to ask the Apache Beam community for interest in moving it to extensions under Apache Beam. But that will of course be up to the Beam community to assess if that is a good fit / desirable.

rezarokni commented 3 years ago

PS: Looking at the VWAP there are a couple of other bits that I am just now adding to the core, which would be needed for the computation. For example knowing which of the composite values are gap filled. Given this has had a few bits added to core, I am happy to carry out the rest of the development for the VWAP metric as it looks like its one which would take more effort than originally expected. Going through these exercise really helps with figuring out whats missing from the core!

danp11 commented 3 years ago

Hi, yes I understand fully that there are no concrete plans for this lib at the moment.

My use case is that I'm having a student doing his master thesis at our company and I
wanted to see if this lib, with more metrics included, could be useful in detecting financial fraud. Metrics can sometimes give an indicator in this area and also be helpful for ML.

It's probably a bit early to use this lib for his thesis (I have recommended a Spring Boot/Kafka/Beam) solution for his job. Nevertheless, I'm always happy to code and contribute if you/the community decides to take "step two" in developing this lib.

Thanks,

/Dan

danp11 commented 3 years ago

A suggestion is also to use double, instead of BigDecimal with rounding, in the vwap calculation.

Good day

rezarokni commented 3 years ago

@dsdinter with regards to the timestamp ask, the test frame work supports timestamps via ADVANCE_WATERMARK_SECONDS I think. It would be good for it to also support event timestamps outside of Time Stream watermark, this should allow for things like late data. Unless I misunderstood the option?

dsdinter commented 3 years ago

Right, one pending feature in our TODO list is the ability to pass hints about processing time, to test late data and triggers, using a hint in the json like "advance_processing_seconds". In whatever case is still possible to implement the test directly using the available method advanceProcessingTime in the TestStream class as per https://beam.apache.org/blog/test-stream/