fix parsing dataset input lines

emedvedev / attention-ocr

A TensorFlow model for text recognition (CNN + seq2seq with visual attention), available as a Python package and compatible with Google Cloud ML Engine.

MIT License

1.08k stars, 256 forks

fix parsing dataset input lines #101

Closed: gammasts closed 6 years ago

gammasts commented 6 years ago

Fixes #100. This restores the original behavior from before https://github.com/emedvedev/attention-ocr/commit/0cfeacb945f78e810ebac587ab06a67e7da9d752 while still allowing any single whitespace character as the separator between the image path and the label.
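For illustration, a minimal sketch of this kind of parsing, assuming each annotation line has the form `<image path><whitespace><label>`. The helper name `parse_annotation_line` and the regular expression are hypothetical and are not taken from the patch itself:

```python
import re

# Hypothetical pattern (an assumption, not the project's code):
# a run of non-whitespace characters (the image path), exactly one
# whitespace character (the separator), then the rest of the line (the label).
_LINE_RE = re.compile(r'(\S+)\s(.*)')


def parse_annotation_line(line):
    """Return (image_path, label), or None if the line cannot be split."""
    line = line.rstrip('\n')
    match = _LINE_RE.match(line)
    if match is None:
        return None
    return match.groups()
```

Under these assumptions, a tab and a space both work as the separator, and any further whitespace stays in the label because the split happens only at the first whitespace character. A path that itself contains whitespace would be cut at that point.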

The only downside to this method is that it doesn't allow whitespace in image paths. If that is needed, I recommend either switching the code back to allowing only tabs or changing over to a JSON format.
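As a rough sketch of the JSON alternative, assuming one JSON object per annotation line: the field names `path` and `label` and the helper `parse_json_annotation` are hypothetical, since the thread does not specify a format.

```python
import json


def parse_json_annotation(line):
    """Parse one annotation line written as a JSON object.

    The "path" and "label" keys are assumed for illustration only.
    """
    record = json.loads(line)
    return record["path"], record["label"]
```

With a format like this, whitespace in either the image path or the label needs no special handling, at the cost of a more verbose annotations file.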

emedvedev commented 6 years ago

Thanks a lot! No whitespace in image paths isn't ideal, but it's an acceptable tradeoff in this case, I think.