Translation of local files

rowds commented 2 months ago

Translation of local files should absolutely be possible by slightly modifying the existing script, I think. But it would be great to have the option to translate a local PDF as well if it's loaded into the browser, skipping the whole HTTPS request part in that scenario. Many websites don't support accessing manga images via POST request and actively block it. In those cases we can simply download the images with a different script and feed them to the models individually, in a batch, or as a PDF.

Crivella commented 2 months ago

Translation of local files should absolutely be possible by slightly modifying the existing script

It is technically already possible to translate files. One patchworky solution is what I've described in this reply on Reddit, where you run a small webserver showing all the images in a folder.
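As a rough illustration of the webserver idea, here is a minimal sketch using only the Python standard library. It is not the script from the Reddit reply; the function names, the suffix list, and the default port are assumptions made for this example. It writes an `index.html` containing one `<img>` tag per image in a folder and serves that folder on localhost, so the page can be opened in the browser like any other site.

```python
import html
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import quote

# Assumption: which file extensions count as images for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def build_index_html(folder):
    """Return an HTML page with one <img> tag per image file in `folder`."""
    names = sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    tags = "\n".join(
        f'<img src="{quote(name)}" alt="{html.escape(name)}">' for name in names
    )
    return f"<!DOCTYPE html>\n<html><body>\n{tags}\n</body></html>\n"


def serve(folder, port=8000):
    """Write index.html into `folder` and serve the folder on localhost."""
    folder = Path(folder)
    (folder / "index.html").write_text(build_index_html(folder), encoding="utf-8")
    handler = partial(SimpleHTTPRequestHandler, directory=str(folder))
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as server:
        server.serve_forever()  # blocks until interrupted (Ctrl+C)
```

Calling `serve("path/to/folder")` and opening `http://127.0.0.1:8000/` should then show all the images on one page as plain `img` elements.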

Another possibility would be to have a separate Python script that reads images from a folder, sends the requests to the server, and then uses the responses to draw the textboxes on the images and save them. This is beyond the scope of this tool/repo and, if ever done, will probably be done in a separate repository. I don't plan to do this anytime soon (if you are interested you could make a PR to this repo or create a repo of your own and I can add a link to it in the docs/readme).
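The overall shape of such a script could look roughly like the sketch below. Everything specific in it is an assumption: the payload fields (`filename`, `contents`) are hypothetical and are not the ocr_translate server's actual request format, and the server URL has to be supplied by the caller. Drawing the textboxes onto the images is left out; the sketch only saves each raw response as JSON next to the image name.

```python
import base64
import json
from pathlib import Path
from urllib import request

# Assumption: which file extensions count as images for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def iter_images(folder):
    """Yield image paths in `folder` in sorted (page) order."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def build_payload(path):
    """Build a HYPOTHETICAL request body; check the server docs for the real one."""
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"filename": path.name, "contents": data}


def post_json(url, payload):
    """POST `payload` as JSON to `url` and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def translate_folder(folder, out_dir, send):
    """Send every image through `send` and store each response as JSON."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in iter_images(folder):
        result = send(build_payload(path))
        target = out_dir / f"{path.stem}.json"
        target.write_text(
            json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(target)
    return written
```

Passing the sender in as a callable, for example `translate_folder("pages", "out", lambda p: post_json(server_url, p))` with `server_url` pointing at whatever endpoint the server really exposes, keeps the folder walking separate from the request format, which is the part most likely to need changing.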

But it would be great to have the option to translate a local PDF as well if it's loaded into the browser

As far as PDFs go, this would have to be done on the side of the extension, which right now is not really capable of detecting images in a PDF. In general I think that PDF translation is a beast of its own, where most people would be more interested in the translation of the text. Dedicated translation of images inside a PDF might be a secondary problem (it would probably be easier to extract the images and translate them separately).

It might be worth adding a translate_pdf endpoint in the future if this ever goes beyond a tool mostly dedicated to images.

Many websites don't support accessing manga images via POST request and actively block it

The POST requests are not done to the image hosting site, but from the browser extension to the ocr_translate server. As long as the images are loaded in your browser as either an img or canvas element, they will be picked up and sent to the server (https://github.com/Crivella/ocr_extension/blob/ac59816e1230b236dacf06fcb4833dd366dda647/src/content.js#L246).

The thing that could be improved right now is that the extension does not take into account all the possible JS that could inject images into the page, and it can in some cases stop images from loading. E.g. for https://ac.qq.com I've seen that only the images that have been loaded by scrolling get processed, while the others are stopped from loading. Again, this is an extension problem (an issue should probably be opened about it there) and contributions are welcome.

rowds commented 2 months ago

Sorry, I thought that the app was getting images from the website via post requests.

The thing that could be improved right now is that the extension does not take into account all possible JS that could inject images into the browser and can in some cases stop images from loading.

Exactly so, the site I am trying to access does a similar thing. Either the extension is already blocked, or, even if reloading the website somehow gets the extension to try to fetch the images, the website just becomes completely blank. I have even tried disabling right-click protection, but nothing worked. Will open an issue on the extension repo.

Apart from this, the project is really good! I am thinking of building a manga translator myself, on a much smaller scale though. I've come across an interesting paper proposing a multimodal context-aware translation framework that translates based not only on the text but also takes the image context into account!

Crivella commented 2 months ago

Exactly so, the site I am trying to access does a similar thing. ... Will open an issue on the extension repo.

If you can point me to the site, I will try to have a look into it.

I've come across an interesting paper proposing a multimodal context-aware translation framework that translates based not only on the text but also takes the image context into account!

That seems interesting. On my todo list I want to introduce context-aware translation by extracting context using something like a CLIP model and then using a translation tool like an LLM that can take context into account. If there are models that can do it all at once, that would be even better.

The difficult part, which I do not think is trivial given the way this server works, is keeping context between images (I do not think there is a surefire way to know which images are tied together without introducing batch translation).

Will move this to a discussion, as it seems more appropriate.

Crivella / ocr_translate

Translation of local files #36