Closed: sfalquier closed this issue 4 years ago
We should remove dynamicColumns from the DataModel. To keep checking that some columns are of type Double (as was done in DataFrameFormatter.withValidDynamicColumnsType), I propose adding a second parameter to ArlasTransformer:
abstract class ArlasTransformer(val requiredCols: Vector[String] = Vector.empty,
val doubleCols: Vector[String] = Vector.empty)
The checkSchema() method would be responsible for checking that these columns are indeed Double. This means the customer application is now in charge of providing the correct format (e.g. replacing "," with "." in string columns so they can be converted to Double), with each transformer checking that its Double columns are as expected.
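To illustrate what "providing the correct format" could look like on the application side, here is a minimal sketch that replaces "," with "." in a set of string columns and casts them to Double. It only uses Spark's regexp_replace and cast; the object name DecimalCommaFix and the method withDoubleColumns are hypothetical and not part of ARLAS, and spark-sql is assumed to be on the classpath.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper, not part of ARLAS: application-side cleanup done
// before the DataFrame is handed to any transformer.
object DecimalCommaFix {

  // For each listed string column, replace "," with "." and cast to Double.
  // Values that still cannot be parsed become null or raise an error,
  // depending on Spark's ANSI mode setting.
  def withDoubleColumns(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, regexp_replace(col(name), ",", ".").cast(DoubleType))
    }
}
```

After this step the transformers would only need to verify the column types, not repair them.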
Thus DataFrameFormatter should check that the DataModel columns are correctly typed (lat/lon are Double, timestamp is Long).
@sfalquier are you OK with it?
Since doubleCols would be used by only a few transformers, there is no need to have it in the parent class. If this check needs to be factored out, provide a utility method in the io.arlas.data.utils package.
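A shared check of this kind could be sketched as follows. This is only an illustration under assumptions: the object name SchemaChecks and both method signatures are hypothetical, and the code relies solely on Spark's StructType and DoubleType.

```scala
import org.apache.spark.sql.types.{DoubleType, StructType}

// Hypothetical utility object; the name and signatures are assumptions,
// not the actual content of io.arlas.data.utils.
object SchemaChecks {

  // Returns the names from `cols` that are missing from the schema
  // or whose type is not DoubleType, in the order they were given.
  def nonDoubleColumns(schema: StructType, cols: Seq[String]): Seq[String] =
    cols.filterNot { name =>
      schema.fields.exists(f => f.name == name && f.dataType == DoubleType)
    }

  // Throws IllegalArgumentException if any column is missing or not Double.
  def requireDoubleColumns(schema: StructType, cols: Seq[String]): Unit = {
    val invalid = nonDoubleColumns(schema, cols)
    val names = invalid.mkString(", ")
    require(invalid.isEmpty, s"Columns missing or not Double: $names")
  }
}
```

A transformer that needs Double inputs could then call SchemaChecks.requireDoubleColumns(schema, Vector("speed")) from its own schema check (the column name is just an example), while the parent class keeps only requiredCols.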
OK. What about validating that the DataModel columns have the expected types (as we used to do by checking that lat/lon were Double)? Or do we assume that the customer's application has already done it?
DataModel may only contain:
- idColumn
- timestampColumn
- timeFormat
- latColumn
- lonColumn
All other existing fields must be removed and passed as argument(s) to the methods/transformers that need them.
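For discussion, the reduced DataModel and the type validation mentioned earlier in the thread (lat/lon are Double, timestamp is Long) might look roughly like this. The field names come from the list above; the String field types, the absence of default values, and the DataModelChecks object are assumptions made for the sketch.

```scala
import org.apache.spark.sql.types.{DataType, DoubleType, LongType, StructType}

// Sketch of the reduced DataModel: field names come from the list above,
// the String types and the absence of defaults are assumptions.
case class DataModel(
    idColumn: String,
    timestampColumn: String,
    timeFormat: String,
    latColumn: String,
    lonColumn: String)

// Hypothetical check that DataFrameFormatter could call.
object DataModelChecks {

  // Returns one message per DataModel column that is missing or mistyped;
  // an empty result means lat/lon are Double and timestamp is Long.
  def typeErrors(schema: StructType, dm: DataModel): Seq[String] = {
    val expected: Seq[(String, DataType)] = Seq(
      dm.latColumn -> DoubleType,
      dm.lonColumn -> DoubleType,
      dm.timestampColumn -> LongType)
    expected.flatMap { case (name, dataType) =>
      schema.fields.find(_.name == name) match {
        case None => Some(s"$name is missing")
        case Some(f) if f.dataType != dataType =>
          Some(s"$name is ${f.dataType.simpleString}, expected ${dataType.simpleString}")
        case _ => None
      }
    }
  }
}
```

Returning the list of problems instead of throwing immediately leaves it to the caller to decide whether a mistyped column is fatal.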