Open davors72 opened 2 years ago
Hi @davors72,
Can you provide an example file in this format with a self-contained repro script so we can run this on our end?
Most likely this format is not supported, because it uses a different schema from the one DALI's TFRecord reader currently handles.
In the meantime, you can try the external_source operator in parallel mode and read the data with https://github.com/vahidk/tfrecord.
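A rough sketch of that workaround, untested against the Objectron data. The names EncodedFrameSource, load_frames_with_tfrecord and build_pipeline are hypothetical. It assumes tfrecord_loader yields (context, sequence) pairs when sequence_description is given, and that each "image/encoded" entry is a bytes-like buffer holding one encoded frame. It also loads a whole shard into memory in every worker, which is simple but wasteful.

```python
from functools import partial


class EncodedFrameSource:
    """Per-sample callable for external_source(parallel=True, batch=False)."""

    def __init__(self, load_frames):
        # load_frames: picklable zero-argument callable returning a list of
        # encoded image byte strings. It is called lazily, so each worker
        # process reads the file itself instead of receiving pickled frames.
        self.load_frames = load_frames
        self.frames = None

    def frame_bytes(self, index):
        if self.frames is None:
            self.frames = self.load_frames()
        if index >= len(self.frames):
            raise StopIteration  # tells DALI the epoch is over
        return self.frames[index]

    def __call__(self, sample_info):
        import numpy as np  # lazy import; DALI expects an array per sample
        data = self.frame_bytes(sample_info.idx_in_epoch)
        return np.frombuffer(data, dtype=np.uint8)


def load_frames_with_tfrecord(path):
    # Assumption: with sequence_description set, the loader yields
    # (context, sequence) pairs and sequence["image/encoded"] is a list
    # of bytes-like buffers, one per frame.
    import tfrecord  # https://github.com/vahidk/tfrecord
    loader = tfrecord.tfrecord_loader(
        path, None, sequence_description={"image/encoded": "byte"})
    frames = []
    for _context, sequence in loader:
        frames.extend(bytes(frame) for frame in sequence["image/encoded"])
    return frames


def build_pipeline(path, batch_size=8):
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0,
                  py_num_workers=2, py_start_method="spawn")
    def pipe():
        encoded = fn.external_source(
            source=EncodedFrameSource(partial(load_frames_with_tfrecord, path)),
            batch=False, parallel=True)
        # Decode the encoded frames on the GPU.
        return fn.decoders.image(encoded, device="mixed")

    return pipe()
```

With the "spawn" start method the source has to be picklable, so the class and the loader function should live at module level, and the script entry point should be guarded with if __name__ == "__main__".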
Hi @JanuszL, if you have the Google Cloud downloader (gsutil), you can download a sample from the dataset I was looking at with this command:
gsutil cp gs://objectron/v1/sequences/book/book_train-01200-of-01324 .
Trying to load that record with the above command should produce the error.
Alternatively, follow the instructions here for the sequential version: https://github.com/google-research-datasets/Objectron
Hi @davors72,
The schema that this dataset implements is simply not supported by DALI. As I mentioned, it would be best to use the external source operator. We would also be more than happy to accept a PR that extends the TFRecord reader to support this schema.
Hi,
I'm running into an issue with DALI. I'm working with a dataset stored in the tf.train.SequenceExample format (https://www.tensorflow.org/api_docs/python/tf/train/SequenceExample). Other TFRecord readers, such as https://github.com/vahidk/tfrecord, handle it fine.
The error occurs when reading the index file:
Assert on "p != nullptr" failed: Error reading from a file {FILE}
The file is valid; however, the index produced is somewhat odd in that it is just a single line,
0 152207822
despite this being many records. The indexer in the above tool produces the same result, but that tool can still load the file fine.
The failing part:
The DALI version comes from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch, using tag 22.03-py3.
Any advice? Are tf.SequenceExample records just not supported in DALI? Are they on a roadmap?
FYI: The PyTorch loader above succeeds, but it takes a separate argument for sequential features:
tfrecord.tfrecord_loader(path, None, sequence_description={"image/encoded": "byte"})
Thanks!
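For anyone reproducing this, one way to check whether the single-line index matches the file's actual framing is to walk the records directly. The sketch below assumes an uncompressed file with standard TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) and does not verify the CRCs. The function name tfrecord_index is hypothetical.

```python
import os
import struct


def tfrecord_index(path):
    """Return (offset, size) for every record in an uncompressed TFRecord file."""
    file_size = os.path.getsize(path)
    entries = []
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header at offset {offset}")
            (length,) = struct.unpack("<Q", header)
            # Full record: length field + length CRC + payload + payload CRC.
            size = 8 + 4 + length + 4
            if offset + size > file_size:
                raise ValueError(f"truncated record at offset {offset}")
            f.seek(offset + size)
            entries.append((offset, size))
    return entries
```

Printing each pair as "offset size" gives lines in the same shape as the index quoted above. If this also reports a single entry, the shard probably holds one large record (for example, one SequenceExample containing many frames) rather than a corrupted index.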