EZBUTD closed this issue 1 year ago
Ah, it looks like something is indeed going wrong with sorting. I think the default sort is putting files in the order 1, 10, 100, ... for some reason.
Removing the volume number and then changing the first sort on ~line 108 to: folder.append(pages.sort_by{|s| s.scan(/\d+/).first.to_i})
seems to have fixed it for me. Not sure if it's a strange edge case that I've run into or if I structured something wrong on my end.
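For anyone hitting the same thing, here is a minimal sketch of why the default sort gives 1, 10, 100, ... and what the numeric key changes. The filenames and the `natural_sort` helper name are made up for illustration and are not taken from the repo. Note that the key only looks at the first run of digits, which is presumably why a leading volume number has to go first.

```ruby
# Hypothetical page filenames (not from the repo), with no volume number.
pages = ["page1.json", "page10.json", "page100.json", "page2.json", "page11.json"]

# Default String sort compares character by character,
# so "page10" and "page100" land before "page2".
lexicographic = pages.sort

# Hypothetical helper: key each name on its first run of digits, read as an integer.
# A name with no digits gets key 0, because nil.to_i is 0.
def natural_sort(names)
  names.sort_by { |s| s.scan(/\d+/).first.to_i }
end

numeric = natural_sort(pages)

puts lexicographic.inspect
puts numeric.inspect
```

If the volume number needs to stay in the filename, keying on every digit run instead, e.g. `s.scan(/\d+/).map(&:to_i)`, might work as well, though I have not tried that against the actual script.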
Dealing with filenames is really complicated. There's nothing wrong on your end; it's just something I didn't account for. I've updated the repo, and it should hopefully sort properly now. I should've fixed this back when Mokuro had a similar issue, but I ended up forgetting about it.
Sorry for the late response, been pretty busy with work lately.
Thanks so much for taking a look and building this! I'll comment again if the issue comes back up.
I followed the steps in the Colab notebook and was able to get OCR JSON outputs for the images. However, the pages appear to be out of order. I'm not sure if something is wrong with the setup on my side; I took a look but could not find anything.
Any help is appreciated! The copy of my notebook with relevant files is: https://colab.research.google.com/drive/1_GoQtWC0JJWzWSpecaWmLcHwC9NeIWOH#scrollTo=pvm9Fggi2lUz