The problem
At the moment, the ExperimentData object consists of input_data, output_data, jobs, and domain. These are all custom objects that are private (except the Domain object):
domain: f3dasm.design.Domain (public!)
input_data: f3dasm._src.design._data._Data
output_data: f3dasm._src.design._data._Data
jobs: f3dasm._src.design._jobqueue._JobQueue
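As a rough orientation only (the real class is not literally a dataclass and carries much more behaviour), the composition described above looks like this:

```python
from dataclasses import dataclass

from f3dasm.design import Domain
from f3dasm._src.design._data import _Data
from f3dasm._src.design._jobqueue import _JobQueue


@dataclass
class ExperimentData:
    # sketch of the composition only, for orientation
    domain: Domain         # public
    input_data: _Data      # private, pandas-backed
    output_data: _Data     # private, pandas-backed
    jobs: _JobQueue        # private
```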
Focusing on the input_data: any data (e.g. a pd.DataFrame, a numpy array, or a csv file) that is given to ExperimentData is converted to the _Data object. The _Data object's back-end is pandas, which means that internally the data is cast to something compatible with pandas data storage: numpy.
For automatic differentiation tools this can be problematic, since the gradient needs to be 'tracked': any cast to numpy breaks the chain.
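A minimal illustration of the broken chain, using jax (the same holds for other tracing-based autodiff frameworks): as soon as the traced array is cast to numpy, jax.grad can no longer differentiate through it.

```python
import jax
import jax.numpy as jnp
import numpy as np


def objective(x):
    # stays inside jax: the gradient can be traced
    return jnp.sum(x ** 2)


def objective_via_numpy(x):
    # casting to numpy (as a pandas/numpy-backed _Data store would do)
    # strips the tracer, so the gradient chain is broken
    x_np = np.asarray(x)
    return jnp.sum(jnp.asarray(x_np) ** 2)


x = jnp.arange(3.0)
print(jax.grad(objective)(x))        # [0. 2. 4.]
# jax.grad(objective_via_numpy)(x)   # raises TracerArrayConversionError
```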
In v1.4.3, we use autograd.numpy to track these gradients, and for tensorflow optimizers a conversion function provides a 'custom gradient' so that optimization still works despite the cast to numpy.
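The tensorflow side of that workaround looks roughly like the sketch below (simplified, not the actual f3dasm conversion function): tf.custom_gradient wraps the numpy-based evaluation and supplies the gradient by hand, because the numpy part is invisible to tensorflow's autodiff.

```python
import numpy as np
import tensorflow as tf


@tf.custom_gradient
def numpy_objective(x):
    # forward pass leaves the TF graph: the evaluation runs in numpy,
    # e.g. because the data lives in a pandas/numpy-backed _Data object
    value = np.sum(x.numpy() ** 2)

    def grad(upstream):
        # the gradient of sum(x**2) has to be provided by hand (2x);
        # in f3dasm the conversion function supplies this, e.g. via autograd
        return upstream * 2.0 * x

    return tf.constant(value, dtype=x.dtype), grad


x = tf.constant([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = numpy_objective(x)
print(tape.gradient(y, x))  # tf.Tensor([2. 4. 6.], shape=(3,), dtype=float32)
```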
Additionally, optimized libraries incur overhead when converting back and forth between e.g. jax arrays and numpy arrays.
Proposal
Because the ExperimentData object depends only on _Data and not directly on a pandas DataFrame, we can create a variant of the _Data object for any underlying datatype (e.g. a dictionary of tensorflow tensors).
We need to implement all the methods of the _Data object for that particular datatype.
Then, upon creation of the ExperimentData object, the user can choose whether to use the 'normal' back-end (pandas/numpy) or a specialized back-end (e.g. tensorflow, pytorch, jax).
This could also be inferred automatically when providing initial input_data.
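One possible shape for this (all names below are hypothetical, not existing f3dasm API): pin down the small interface that ExperimentData actually relies on, register one _Data variant per back-end, and pick a back-end either explicitly or by inspecting the type of the provided input_data.

```python
from typing import Protocol


class DataBackend(Protocol):
    """Hypothetical interface: the subset of _Data methods that
    ExperimentData calls; each back-end variant implements these."""

    def select(self, indices): ...
    def add(self, other): ...
    def to_numpy(self): ...


# hypothetical registry of back-end name -> _Data variant
_BACKENDS: dict = {
    # "pandas": _PandasData,   # current default behaviour
    # "tensorflow": _TFData,   # dictionary of tensorflow tensors
    # "jax": _JaxData,         # dictionary of jax arrays
}


def infer_backend(input_data) -> str:
    """Hypothetical helper: infer the back-end from the type of the
    user-provided input_data instead of asking for it explicitly."""
    module = type(input_data).__module__ or ""
    if module.startswith("jax"):
        return "jax"
    if module.startswith("tensorflow"):
        return "tensorflow"
    return "pandas"  # pd.DataFrame, numpy array, csv path, ...
```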
First steps
This issue will investigate whether we can implement this, starting with a _Data variant that works with a jax data format.
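A first sketch of what such a variant could look like (hypothetical; the method names are illustrative and would have to mirror the real _Data interface): a dictionary of jax arrays keyed by parameter name, so stored values are never cast to numpy and jax.grad/jax.jit can trace straight through them.

```python
import jax.numpy as jnp


class _JaxData:
    """Hypothetical jax-backed _Data variant: a dict of jax arrays."""

    def __init__(self, data: dict):
        self.data = {name: jnp.asarray(values) for name, values in data.items()}

    def select(self, indices):
        # row selection, mirroring the pandas-backed behaviour
        idx = jnp.asarray(indices)
        return _JaxData({name: values[idx] for name, values in self.data.items()})

    def add(self, other: "_JaxData") -> "_JaxData":
        # append rows; jnp.concatenate keeps everything as jax arrays
        return _JaxData({name: jnp.concatenate([values, other.data[name]])
                         for name, values in self.data.items()})

    def to_numpy(self):
        # interoperability escape hatch; calling this re-introduces the cast
        import numpy as np
        return {name: np.asarray(values) for name, values in self.data.items()}
```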