apache / streampipes

Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
https://streampipes.apache.org
Apache License 2.0
560 stars 172 forks source link

[#1865]: create processor to parse date time strings #2207

Closed dshunter107 closed 5 months ago

dshunter107 commented 6 months ago

Closes #1865

This processor parses a DateTime string. It converts the string into a ZonedDateTime variable, and places the variable in the event variable. Tests are made to ensure that the code works properly.

I am still somewhat new, so any feedback is welcome.

Purpose

Remarks

PR introduces (a) breaking change(s): no

PR introduces (a) deprecation(s): no

dshunter107 commented 6 months ago

Okay. That clears up a few things for me. I have added the resource files.

Thanks for all the feedback. I definitely feel like I am getting better with each pull request for each project. Everyone has been giving great feedback and helping me learn. Thanks!!

bossenti commented 6 months ago

Okay. That clears up a few things for me. I have added the resource files.

Thanks for all the feedback. I definitely feel like I am getting better with each pull request for each project. Everyone has been giving great feedback and helping me learn. Thanks!!

Great to hear, hope you enjoy it 🙂 In case there is anything unclear to you, please never hesitate to reach out

bossenti commented 6 months ago

I have adapted the asset files a bit, hope you are okay with it 🙂

I just tested it locally and have some last remarks:

I think we need to sort them at least alphabetically. In addition, I would suggest to use a dropdown menu as input dialog (I can help you here, in case you are not sure about how to achieve) Lastly, is my understanding correct that the timezone parameter should be optional?

dshunter107 commented 6 months ago

I have adapted the asset files a bit, hope you are okay with it 🙂

I just tested it locally and have some last remarks:

  • You need to register your processor in the class TransformationExtensionModuleExport. Otherwise streampipes won't recognize it
  • Please align the id in ProcessingElementBuilders create() with the name of the resources directory
  • The timezone parameter has a lot of values: image

I think we need to sort them at least alphabetically. In addition, I would suggest to use a dropdown menu as input dialog (I can help you here, in case you are not sure about how to achieve) Lastly, is my understanding correct that the timezone parameter should be optional?

Regarding the Timezone Parameter. I would say the answer to your question is "somewhat"; however, I am not so sure that "optional" is the best word to describe it. It was a little challenging for me to initially wrap my head around java DateTime, so I apologize if the following is not explained well.

There are two types of objects when it comes to DateTimes: (1) LocalDateTime, and (2) ZonedDateTime. Like the name suggests, LocalDateTime has both a date and a time but has no information about the time zone that it corresponds to. As a result, it does not map to a specific instance in the real world. It could be any of the 24 hours corresponding to the times in the world or even any of the myriad of ways different countries and locales recognize their time.

The LocalDateTime variable leads to the problem discussed in the Github issue associated with this pull request: the LocalDateTime will correspond to the timezone of the current user. To expand on this problem further with an example, if data was being pulled at 3PM in Colorado and was being sent to two apachestreams users in California and New York, the user in California would get a reading of 3pm in California, and the user in New York would get a reading of 3PM in New York. All 3 readings (California, Colorado, and New York) would correspond to a different instance in time.

My solution to this was to use the other DateTime variable: ZonedDateTime. In order to place a ZonedDateTime in the event, the time zone must be a part of the ZonedDateTime. Now the California user (once the time zone is entered) would see a reading of 3pm in Colorado, and the New York user would also see a reading of 3pm in Colorado. All 3 locations would correspond to the same instant. With that problem fixed, an interesting wrinkle occurs...

The ISODateTime standard could either be a string formatted for LocalDateTime or a string formatted for ZonedDateTime. In the event that the string is already formatted for ZonedDateTime, we would now have a potential for 2 separate time zones applying for the same datetime. One time zone would be from the user and one from the datetime string. In my solution, I made the assumption that the initial location would have the correct time zone in the string, so I ignored the user time zone in preference of the time zone for the datetime string. As a result, to describe my overall solution, I would say it was to make the time zone input mandatory but ignore it if the datetime string already has the time zone.

I think making the time zone input optional is definitely doable. The issues that would need to be addressed with that way are (1) if the user does not put in the time zone with a LocalDateTime string and (2) if the user puts in the time zone string with a ZonedDateTime string. One solution I thought of for issue 1 would be to have a default time zone like "US/Eastern". Let me know if you prefer the route for making the time zone optional.

As a side note: I am not aware of the relative frequency of LocalDateTime strings and ZonedDateTime strings. I made a guess that LocalDateTime strings would be more likely, so I created a user experience that would be the same if all of the strings were sent as ISODateTime standardized strings formatted as LocalDateTime strings.

bossenti commented 6 months ago

I believe I registered the processor. Let me know if I did that correctly. I aligned the Id with the resource directory I sorted the time zone parameter options. I may need you to point me in the direction of how to do the dropdown menus. Are you saying that there is already code that can easily transform the radio group into a dropdown list? Or are you saying that I would need to create the code to do so?

Yes, that's perfectly done now

Yes there should already be the an element to create a dropdown, I'll have a look

bossenti commented 6 months ago

Thank you very much for your detailed explanation and deliberate thoughts 🙏🏼

I would say it was to make the time zone input mandatory but ignore it if the datetime string already has the time zone.

With the context you provided, I'm totally fine with this solution 🙂 We only need to adapt the Documentation.md and the strings.en file since I described it as optional there 😅

bossenti commented 6 months ago

With respect to the dropdown menu, you can pass an additional boolean parameter to requiredValueSelection. If this is set to true we should get the expected outcome.

requiredSingleValueSelection(Labels.withId(SELECTED_INPUT_TIMEZONE),
            Options.from(getTimeZoneOptions()),
                true)

Can you please verify this?

Please don't ask why this parameter is called horizontalRendering Doesn't make any sense, imho 🙈

dshunter107 commented 6 months ago

I am unable to verify. I am unable to use the docker commands with CLI because none of the Pipeline elements show up. When I start the program with docker-compose up -d in the compose folder, I am able to see all of the pipeline elements except the DateTimeFromStringDataProcessor. I probably did something incorrectly.

I found the tutorial for DataProcessors at the following location: https://streampipes.apache.org/docs/extend-tutorial-data-processors/ I followed all the steps, but I still am not able to find the processor at all. How were you able to test it when you sent me the photos?

bossenti commented 6 months ago

Looks great!

image

However, I ran into an issue when testing the processing element end-to-end. This relates to the propagation of events between pipeline elements and is not directly caused by these processing elements. I'll try to find some time in the upcoming days to debug it - please bear with me 😊

bossenti commented 6 months ago

In case you want to continue here: one possible workaround here would be to pass the timestamp as long value including milliseconds instead of the local date time object diretly

dshunter107 commented 6 months ago

Take your time, No worries.

Sure, whatever works. I included the timestamp as a long with milliseconds in the event as well as the time zone that the user specified. I have a few clarifying questions:

(1) Did you mean to say the "Zoned Date Time" object? (2) Are you saying that I should delete the "Zoned Date Time" object for now? (3) If we have only the milliseconds, how would you like to handle different time zones? As of now, with only the milliseconds, if the data was taken in Florida at 4PM, a Streampipes user in California would have a reading for 1PM. This would result in the same instant, but a different time. Is that okay? I put the time zone provided by the user in the event as well, if we want to be able to specify the same instant but the time zone provided by the user. (4) Could this issue be related to the OutputStrategy that I selected:

.outputStrategy(OutputStrategies.append( EpProperties.timestampProperty(OUTPUT_DATETIME_RUNTIME_NAME)))

I ask because I am not sure if the timestampProperty was designed for ZonedDateTime.

bossenti commented 6 months ago

his would result in the same instant, but a different time. Is that okay?

It is not ideal, but I'm afraid there is no real alternative in StreamPipes so far.

(4) Could this issue be related to the OutputStrategy that I selected:

The output strategy is fine. Since you add the timezone as well, we should include this in the output strategy as well and adapt the documentation accordingly.

The only aspect I'm unsure about is whether having multiple timestamp properties is a problem. @tenthe @dominikriemer What do you think? Will this lead anywhere to problems?

dshunter107 commented 6 months ago

Okay. I added the output strategy. Let me know if there is anything you would like me to do.

bossenti commented 5 months ago

Great, thank you!

I'll give it a test in the upcoming days. Appreciate your patience.

dominikriemer commented 5 months ago

Thanks for your contribution @dshunter107! That a very useful pipeline element.

dshunter107 commented 5 months ago

Thanks for your contribution @dshunter107! That a very useful pipeline element.

You're welcome! Glad it works.