eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0
136 stars, 19 forks

UDT: VideoTimestamp #689

Open da-liii opened 2 years ago

da-liii commented 2 years ago

https://github.com/eto-ai/spark-video/issues/26

scala> Int.MaxValue / (1000 * 60 * 60)
val res5: Int = 596
scala> Long.MaxValue / (1000 * 60 * 60)
val res7: Long = 2562047788015

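Assuming millisecond resolution, an Int offset overflows after about 596 hours, while a Long covers roughly 2.56 trillion hours. We need a UDT (VideoTimestamp) to represent the timestamp used to locate a video frame.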

changhiskhan commented 2 years ago

How is video timestamp different from regular timestamps?

Also, one piece of functionality that would be interesting for videos is using the extracted video metadata to make it easy to switch between timestamps and frame numbers. When it comes to timestamps, videos also have the concept of a time base, which is very awkward to deal with as an ML engineer.
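To make the timestamp/frame-number conversion concrete, here is a minimal sketch in plain Scala. Everything in it is an assumption for illustration: `Rational`, `FrameMath`, and the helper names are hypothetical and not part of Rikai or spark-video, the time base of 1/90000 is just a common example value, and the frame rate is assumed to be constant and no higher than 1000 fps.

```scala
// Hypothetical helpers for illustration only; not a Rikai or spark-video API.
// A time base is the duration of one tick in seconds, expressed as num/den.
final case class Rational(num: Long, den: Long) {
  require(num > 0 && den > 0, "numerator and denominator must be positive")
}

object FrameMath {
  // Presentation timestamp (in time-base ticks) -> milliseconds, rounded down.
  // Assumes a non-negative pts.
  def ptsToMillis(pts: Long, timeBase: Rational): Long =
    pts * timeBase.num * 1000L / timeBase.den

  // Milliseconds -> zero-based frame index, assuming a constant frame rate.
  def millisToFrame(millis: Long, fps: Rational): Long =
    millis * fps.num / (fps.den * 1000L)

  // Zero-based frame index -> first whole millisecond inside that frame.
  // Rounding up keeps millisToFrame(frameToMillis(f)) == f for fps <= 1000.
  def frameToMillis(frame: Long, fps: Rational): Long =
    (frame * fps.den * 1000L + fps.num - 1) / fps.num
}

val timeBase = Rational(1, 90000)  // assumed stream time base
val ntsc = Rational(30000, 1001)   // 29.97 fps

val millis = FrameMath.ptsToMillis(180000L, timeBase) // 180000 ticks = 2000 ms
val frame = FrameMath.millisToFrame(millis, ntsc)     // frame 59
val back = FrameMath.frameToMillis(frame, ntsc)       // 1969 ms
```

Keeping the frame rate as a rational avoids the drift that a floating-point 29.97 would introduce over long videos. Variable-frame-rate videos would need a per-frame index instead of this arithmetic.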

da-liii commented 2 years ago

Timestamp/DateTime types are bound to a time zone. Using a UDT with LongType as the implementation will help us avoid type conversions in PySpark/Arrow/Spark.
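As a rough illustration of the idea, the value behind such a UDT could be a time-zone-free offset from the start of the video, stored as a Long. The sketch below is plain Scala under stated assumptions: the millisecond unit, the field name `millis`, and the `HH:MM:SS.mmm` rendering are hypothetical choices, not the Rikai implementation, and the Spark UDT wiring (mapping this value to `LongType`) is deliberately left out.

```scala
// Hypothetical sketch, not the Rikai API: an offset from the start of a video,
// stored as milliseconds in a Long, with no time zone attached.
final case class VideoTimestamp(millis: Long) extends Ordered[VideoTimestamp] {
  require(millis >= 0, "offset from the start of the video must be non-negative")

  // Order timestamps by their offset.
  def compare(that: VideoTimestamp): Int = millis.compare(that.millis)

  // Render as HH:MM:SS.mmm; hours are not wrapped at 24.
  override def toString: String = {
    val ms = millis % 1000
    val s = (millis / 1000) % 60
    val m = (millis / 60000) % 60
    val h = millis / 3600000
    f"$h%02d:$m%02d:$s%02d.$ms%03d"
  }
}

val ts = VideoTimestamp(3723456L)
println(ts) // 01:02:03.456

// Offsets beyond Int.MaxValue milliseconds (about 596 hours) still fit.
val far = VideoTimestamp(Int.MaxValue.toLong + 1L)
```

Because the value is a plain Long, it can pass through Arrow and PySpark as a 64-bit integer without any time zone adjustment.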