Open shabie opened 3 years ago
Well, as an update, I was able to move forward by adding the following line as the first line of the process_image function:
raw_image = tf.squeeze(raw_image)
I skipped the label processing for now to see how far I get but end up with yet another error:
InvalidArgumentError: contents must be scalar, got shape [2]
[[{{node transform/DecodeJpeg}}]]
I believe that is because preprocessing_fn receives a batch of images, but process_image expects a single image encoded as a byte string (see the decode_jpeg docs). You can try this:
img_preprocessed = tf.map_fn(process_image, image_raw, dtype=tf.float32)
@festeh, tried that but this gives the following error:
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("DeserializeSparse:0", shape=(None, 1), dtype=int64), values=Tensor("DeserializeSparse:1", shape=(None,), dtype=string), dense_shape=Tensor("DeserializeSparse:2", shape=(1,), dtype=int64)). Consider casting elements to a supported type.
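For illustration, here is a minimal sketch of why map_fn chokes here: the feature arrives as a SparseTensor, which map_fn cannot consume directly. Densifying and flattening it (the byte-string values below are made up) makes each element a scalar string again. Note this is just a workaround sketch; the schema-level fix appears later in the thread.

```python
import tensorflow as tf

# A batch of byte strings that arrived as a SparseTensor of shape [2, 1],
# which is what Transform produces when the feature shape is not inferred:
sparse = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 0]],
    values=[b"img-bytes-0", b"img-bytes-1"],
    dense_shape=[2, 1])

# Densify, then drop the trailing dimension so each element is a scalar string:
dense = tf.sparse.to_dense(sparse, default_value=b"")
flat = tf.reshape(dense, [-1])
print(flat.numpy())  # [b'img-bytes-0' b'img-bytes-1']
```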
What is the starting point for your transformation? A byte string? Do you load the byte string in your ExampleGen component?
You can call the process_image function as follows:
fn = lambda image: process_image(image)
img_preprocessed = tf.map_fn(fn, inputs['images_raw'], dtype=tf.float32)
Please note that preprocessing_fn gets a batch of records (e.g. images), and the transform function needs to handle this. In this particular case, decode_jpeg can't handle batches, so we need to call the function via the lambda + map_fn combination.
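Putting the suggestion together, a minimal sketch of what the per-image function plus the map_fn call might look like (the feature key 'image_raw', the resize size, and the body of process_image are assumptions for illustration, not the book's exact code):

```python
import tensorflow as tf

# Hypothetical per-image function: decodes ONE JPEG byte string.
def process_image(raw_image):
    raw_image = tf.reshape(raw_image, [])       # decode_jpeg needs a scalar string
    image = tf.io.decode_jpeg(raw_image, channels=3)
    image = tf.image.resize(image, [224, 224])  # example size, adjust to your model
    return tf.cast(image, tf.float32) / 255.0

# preprocessing_fn receives a BATCH of records, so map process_image over it.
def preprocessing_fn(inputs):
    fn = lambda image: process_image(image)
    img_preprocessed = tf.map_fn(fn, inputs["image_raw"], dtype=tf.float32)
    return {"image_xf": img_preprocessed}
```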
Please close the issue if this solves your problem. Thank you!
Hi all. Found the issue. Three things are needed:
1. infer_feature_shape needs to be set to True, otherwise a SparseTensor is created when loading in with the Transform component:
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=True)
2. The map_fn function needs to be called in preprocessing_fn.
3. The raw image needs to be reshaped to a scalar string:
raw_image = tf.reshape(raw_image, [])
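For reference, a small sketch of why the reshape in point 3 is needed: tf.io.decode_jpeg only accepts a scalar byte string, so a record that arrives with shape [1] must be collapsed first (the 4x4 zero image below is just a stand-in):

```python
import tensorflow as tf

# A tiny stand-in JPEG: encode_jpeg returns a scalar byte-string tensor.
jpeg = tf.io.encode_jpeg(tf.zeros([4, 4, 3], dtype=tf.uint8))

# With a dense feature shape of [1], each record arrives with shape [1] ...
record = tf.reshape(jpeg, [1])

# ... but decode_jpeg only accepts a scalar byte string, so collapse it first:
scalar = tf.reshape(record, [])
image = tf.io.decode_jpeg(scalar)
print(image.shape)  # (4, 4, 3)
```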
I've created a PR that should provide you with a working end-to-end example:
https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/pull/46
@shabie were you able to run this part of the beam pipeline as defined under Chapter 5: Standalone Execution of TFT for the Computer Vision Problem?
import tempfile
import apache_beam as beam
import tensorflow_transform.beam.impl as tft_beam

with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        tfrecord_file = "/your/tf_records_file.tfrecord"
        raw_data = (
            pipeline | beam.io.ReadFromTFRecord(tfrecord_file))
        transformed_dataset, transform_fn = (
            (raw_data, raw_data_metadata) | tft_beam.AnalyzeAndTransformDataset(
                preprocessing_fn))
I keep running into the following error:
TypeError: byte indices must be integers or slices, not str
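If it helps: beam.io.ReadFromTFRecord emits the raw serialized bytes of each record, not a feature dict, so indexing an element with a string key inside preprocessing_fn produces exactly this TypeError. One possible fix (a sketch; the feature names "image_raw" and "label" are assumptions about your TFRecord schema) is to parse each serialized tf.train.Example into a dict first, e.g. via an extra beam.Map(parse_example) step before AnalyzeAndTransformDataset:

```python
import tensorflow as tf

# ReadFromTFRecord yields raw bytes; bytes can't be indexed by a string key:
record = b"\x00serialized-example-bytes"
try:
    record["image_raw"]
except TypeError as err:
    print(err)  # byte indices must be integers or slices, not str

# A sketch of a parse step to run before AnalyzeAndTransformDataset:
def parse_example(serialized):
    example = tf.train.Example.FromString(serialized)
    feats = example.features.feature
    return {
        "image_raw": feats["image_raw"].bytes_list.value[0],
        "label": feats["label"].bytes_list.value[0],
    }
```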
The book provides code snippets for the computer vision problem set, but they don't seem to work for the transform. I mean specifically the following code:
I am using it as follows in the preprocessing_fn:
This is being called in the Transform step of the pipeline:
The TFRecordDataset is a two-feature dataset: one feature contains the raw (JPEG) image, and the other contains the label as a string (also stored as bytes). It was generated using pretty much the same code shown earlier in the book under the Data Ingestion chapter.
When I run the above, I get the following traceback: