LukasHedegaard / datasetops

Fluent dataset operations, compatible with your favorite libraries
https://datasetops.readthedocs.io
MIT License

Converting to_tensorflow with variable-shape elements #28

Open iliiliiliili opened 4 years ago

iliiliiliili commented 4 years ago

If an element of a dataset has a different shape from one item to the next, the dataset produced by to_tensorflow won't be able to return data.

LukasHedegaard commented 4 years ago

The issue with creating TensorFlow datasets is that they need a type_spec and a shape_spec. Currently, all elements are converted to a tf.Tensor (which requires same-size elements). TensorFlow does have a RaggedTensor type, though, which we might be able to use. The question then is how to identify whether a RaggedTensor is needed:

1) The current implementation infers type_spec and shape_spec by querying a single item. If we want to detect variable-sized elements without additional user effort, we would need to inspect multiple (how many?) items.

2) Alternatively, we could let the user pass a parameter to indicate which items should be converted to RaggedTensors (or SparseTensors for sparse data).

I think we should opt for solution 2.
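To make solution 2 concrete, here is a minimal sketch of how a user-supplied parameter could steer the shape spec. Everything in it is an assumption for illustration: the helper name `build_shape_spec`, the `ragged` parameter, and the `("dense" | "ragged", shape)` return format are hypothetical and not part of datasetops. It only shows the bookkeeping; no TensorFlow objects are created.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def build_shape_spec(
    item_shapes: Sequence[Tuple[int, ...]],
    ragged: Iterable[int] = (),
) -> List[Tuple[str, Shape]]:
    """Build a per-element spec from the shapes of a single queried item.

    Hypothetical helper: positions listed in `ragged` are marked as
    variable-sized, and every dimension is replaced with None (unknown).
    All other positions keep the fixed shape of the queried item.
    """
    ragged_set = set(ragged)
    out_of_range = ragged_set - set(range(len(item_shapes)))
    if out_of_range:
        raise IndexError(f"no element at position(s) {sorted(out_of_range)}")

    spec: List[Tuple[str, Shape]] = []
    for position, shape in enumerate(item_shapes):
        if position in ragged_set:
            # The user flagged this element: keep the rank, drop the sizes.
            spec.append(("ragged", (None,) * len(shape)))
        else:
            spec.append(("dense", tuple(shape)))
    return spec


# Shapes of one queried item: a fixed-size image and a variable-length label list.
spec = build_shape_spec([(32, 32, 3), (5,)], ragged=[1])
```

In an actual conversion, the "ragged" entries would presumably be turned into something like a tf.RaggedTensorSpec and the "dense" entries into a tf.TensorSpec, but that mapping is left out here.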

iliiliiliili commented 4 years ago

Alternatively, we could let the user pass a parameter to indicate which items should be converted to RaggedTensors (or SparseTensors for sparse data).

I think this is better. If users don't know whether they need it, running the code will show whether there is an error. Then they can just add this parameter to the conversion.

In scenario 1, we can't be sure unless we traverse all of the data, which is costly.
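The limitation of scenario 1 can be illustrated with a small sketch. It assumes a hypothetical helper `infer_shape` (not part of datasetops) that merges the shapes of the first `max_items` items and marks any dimension that differs as None. Inspecting only a prefix can miss a later item with a different shape, while a full pass visits every item.

```python
from itertools import islice
from typing import Iterable, Optional, Tuple

Shape = Tuple[Optional[int], ...]


def infer_shape(shapes: Iterable[Tuple[int, ...]], max_items: Optional[int] = None) -> Shape:
    """Merge item shapes; dimensions that differ between items become None.

    Hypothetical helper: `max_items=None` inspects every item (a full pass),
    otherwise only the first `max_items` shapes are looked at.
    """
    merged: Optional[Shape] = None
    for shape in islice(shapes, max_items):
        shape = tuple(shape)
        if merged is None:
            merged = shape
        elif len(shape) != len(merged):
            raise ValueError("items have different ranks")
        else:
            # Keep a dimension only if it agrees with everything seen so far.
            merged = tuple(m if m == s else None for m, s in zip(merged, shape))
    if merged is None:
        raise ValueError("no items to inspect")
    return merged


# 100 items of shape (10, 3) followed by one of shape (7, 3).
shapes = [(10, 3)] * 100 + [(7, 3)]

sampled = infer_shape(shapes, max_items=10)  # looks fixed-size: (10, 3)
full = infer_shape(shapes)                   # full pass finds (None, 3)
```

With the sampled result, the conversion would still fail on the last item, which is the case the explicit parameter avoids.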

LukasHedegaard commented 4 years ago

Postpone until after v0.1.0