google-research / pix2struct

Apache License 2.0
604 stars 54 forks

Scripts for preprocessing the C4 dataset #24

Open ShengdingHu opened 1 year ago

ShengdingHu commented 1 year ago

Thanks for your amazing work! Could you please share the preprocessing code for converting the textual C4 dataset into image-code pairs for pre-training the model?