coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

Slowness on mobile devices #30

Open lalith-b opened 12 years ago

lalith-b commented 12 years ago

I viewed the original document in Safari on my iPad and then tried the HTML version. It's very difficult/glitchy to scroll the document (not as smooth as the PDF rendering engine in iOS). Can it be made smoother, with pinch-in/pinch-out zoom controls?

iapain commented 12 years ago

Try loading pages into the DOM one by one via Ajax. It works better.

coolwanglu commented 12 years ago

It's a known issue. Optimization for mobile devices will be done later and, as @iapain said, might be the job of the frontend.

lalith-b commented 12 years ago

This alone could make the project stand out. Most PDF-to-HTML converters don't support mobile devices, and most mobile devices have a built-in JavaScript-based rendering mechanism which screws up the UI. It would be a really nice feature to have mobile support for the generated HTML, e.g. via a command-line argument:

pdf2htmlEX somefile.pdf --ipad
pdf2htmlEX somefile.pdf --iphone3.5inch
pdf2htmlEX somefile.pdf --iphone5

What do you guys think about this?

coolwanglu commented 12 years ago

Can you elaborate on what should be done, for example, especially for the iPhone 5?

iapain commented 12 years ago

Pardon me, what are these options going to do? The problem is that a PDF is usually a fixed-layout document and text won't reflow on mobile devices; that is the core problem on small-screen devices.

However, I am in favour of adding support for preset files, e.g. an ipad.preset file which would contain something like --zoom 1.33 --space-as-offset 1 --tounicode -1.

coolwanglu commented 12 years ago

Yes, somebody asked me about this before.

I haven't found a trivial way to make GetOpt accept a file as supplementary parameters.

So maybe I need to parse some format myself.

I'd use JSON if I were writing in Python, but what about C++?

Anyway, it'll be a 'user-experience' feature rather than a 'technical' one, which is far less interesting to me...

Speaking of mobile devices, will it be faster if images are not included, say with --process-nontext 0? Or will that only improve loading time?

coolwanglu commented 12 years ago

Another obstacle is that I don't have a smartphone, so I have no way to optimize/test/debug.

coolwanglu commented 12 years ago

@iapain btw, --fit-width might be better for you

lalith-b commented 12 years ago

For presets, basically: on iPad/iPhone the HTML content would be displayed with no grey space and the page zoomed to the particular size (iPad: 1024x768 for landscape and 768x1024 for portrait, set via the viewport meta tag; iPhones before the iPhone 5 have 320x480, whereas the iPhone 5 has 640x1136).

For scrolling, iScroll can be used to make it smoother: http://cubiq.org/iscroll

Simple tweaks like these would make it better for use on all platforms. I hope I'm making the objective clear and not deviating.

@coolwanglu you can use online simulators like http://quirktools.com/screenfly/ to preview on devices; it's very close to a real device, but never the same as one.

lalith-b commented 12 years ago

Yeah, I tried --fit-width, but the issue arises when rotating the device between landscape and portrait mode: grey spaces show up after rotating.

coolwanglu commented 12 years ago

Still not clear on the viewports.

What do you expect when you rotate your device?

lalith-b commented 12 years ago

When the device is rotated, the pages must adjust to fit the screen's new width.
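Fitting pages to the screen width after rotation comes down to recomputing a single scale factor whenever the viewport width changes. A minimal sketch, assuming a hypothetical helper fit_width_scale (not part of pdf2htmlEX) and an assumed page width of 816 CSS px (US Letter at 96 px/in); the viewport widths are the iPad figures quoted earlier in the thread:

```python
def fit_width_scale(viewport_width: float, page_width: float) -> float:
    """Return the zoom factor that makes a page exactly fill the viewport width."""
    if viewport_width <= 0 or page_width <= 0:
        raise ValueError("widths must be positive")
    return viewport_width / page_width


# Assumed page width: US Letter is 8.5 in, i.e. 816 CSS px at 96 px/in.
PAGE_WIDTH = 816

# iPad viewport widths from the discussion: 768 in portrait, 1024 in landscape.
for orientation, width in (("portrait", 768), ("landscape", 1024)):
    print(orientation, round(fit_width_scale(width, PAGE_WIDTH), 3))
```

In a browser the same arithmetic would run in an orientation-change or resize handler; only the calculation is shown here.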

coolwanglu commented 12 years ago

I see, that sounds like a UI feature to me

iapain commented 12 years ago

@coolwanglu IMO a single-line key-value format would be ideal for preset files, e.g.

f=1
l=1
output-dir=~/Documents
space-as-offset=1

I can patch this if you want. I still don’t see it as top priority work.
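A minimal sketch of how such a preset file could be turned into ordinary command-line arguments, assuming the hypothetical rules that blank lines and # comments are skipped, one-letter keys (like f and l above) map to short options, and longer keys map to long options. parse_preset is an illustrative name, not an existing pdf2htmlEX function:

```python
def parse_preset(text: str) -> list[str]:
    """Turn 'key=value' lines into command-line arguments (hypothetical format).

    Blank lines and lines starting with '#' are skipped. One-letter keys become
    short options (f=1 -> -f 1); longer keys become long options.
    """
    args: list[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected key=value, got {raw!r}")
        flag = f"-{key}" if len(key) == 1 else f"--{key}"
        args.extend([flag, value])
    return args


preset = """
f=1
l=1
output-dir=~/Documents
space-as-offset=1
"""
print(parse_preset(preset))
```

The ~ in output-dir is passed through unchanged; whether to expand it would be a separate design decision.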

Regarding mobile devices, @coolwanglu, the problem is not exactly width and viewports. It's more related to memory optimization in iOS, which degrades some features when a memory warning is received. There are quite a few tricks to make it more responsive, as mentioned here:

http://stackoverflow.com/questions/9941972/slow-list-view-scrolling-on-ipad-when-scrolling-in-an-overflowauto-div

@deathlord87 Based on my tests on iPad, I have found that scrolling is very laggy; everything else works with the proper zoom option.

lalith-b commented 12 years ago

Yes, the scrolling is slow only because of the JavaScript. Try removing jQuery from the manifest and running it on iPad; it's better than the version with jQuery. I really don't understand what the JavaScript does.

About memory warnings: from what I've read, they are caused by the scrollable div area. That's why I suggested iScroll, though I don't know if that will work either; it's all about performance.

@iapain also try compiling the sources and loading the HTML with the following change in the manifest:

<body> <div id="pdf-main">

to

<body id="pdf-main">

It loads a bit better. I will run performance tests with Instruments on the Mac and share some figures on memory peaks.

coolwanglu commented 12 years ago

The format is OK with me. Maybe we need a new argument, something like --cfg-file, which reads a file and processes it as if --cfg-file were replaced by the content of the file. There's another concern: right now there's an argument --data-dir with which you can provide customized templates, JS files, etc. Maybe --cfg-file overlaps a little with it? Not sure.
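The --cfg-file idea (replace the option in place with the file's contents) can be sketched as a pre-pass over the argument list before it reaches the real option parser. This is a hypothetical illustration: expand_cfg_files and its loader callback are assumptions, argv here excludes the program name, and the in-memory dictionary stands in for real preset files:

```python
from typing import Callable


def expand_cfg_files(argv: list[str], load: Callable[[str], list[str]],
                     option: str = "--cfg-file", max_depth: int = 8) -> list[str]:
    """Replace each '--cfg-file PATH' pair with the arguments loaded from PATH.

    Nested --cfg-file entries are expanded recursively, up to max_depth levels,
    to guard against a file that (directly or indirectly) includes itself.
    """
    if max_depth < 0:
        raise RecursionError(f"{option} nested too deeply")
    out: list[str] = []
    i = 0
    while i < len(argv):
        if argv[i] == option:
            if i + 1 >= len(argv):
                raise ValueError(f"{option} requires a path")
            # Splice the file's arguments in at the position of the option.
            out.extend(expand_cfg_files(load(argv[i + 1]), load, option, max_depth - 1))
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out


# Hypothetical in-memory "files" standing in for real preset files.
files = {"ipad.preset": ["--zoom", "1.33", "--space-as-offset", "1"]}
argv = ["--cfg-file", "ipad.preset", "--zoom", "1.5", "in.pdf"]
print(expand_cfg_files(argv, files.__getitem__))
# ['--zoom', '1.33', '--space-as-offset', '1', '--zoom', '1.5', 'in.pdf']
```

Expanding in place preserves the relative order of options, so a flag given after --cfg-file on the command line can still override the preset, provided the underlying parser lets the last occurrence win (an assumption, not something checked against pdf2htmlEX's parser).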
coolwanglu commented 12 years ago

jQuery is used for selective rendering (pages outside the screen are hidden) and for handling links (without JS you can only arrive at the destination page, not the precise box).

I think server support is necessary for the optimization you mentioned.
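The selective-rendering idea (hide pages outside the screen) boils down to finding which pages overlap the current viewport. A minimal sketch in Python, not the actual pdf2htmlEX.js code, assuming pages are stacked vertically without overlap so their top and bottom offsets are both sorted; the page geometry (1000 px pages with 20 px gaps) is made up for the example:

```python
from bisect import bisect_left, bisect_right


def visible_pages(tops: list[float], bottoms: list[float],
                  scroll_top: float, viewport_height: float) -> range:
    """Indices of pages overlapping [scroll_top, scroll_top + viewport_height).

    Because pages do not overlap, tops and bottoms are both sorted, so two
    binary searches replace a scan over every page on each scroll event.
    """
    view_bottom = scroll_top + viewport_height
    first = bisect_right(bottoms, scroll_top)  # first page ending below the top edge
    last = bisect_left(tops, view_bottom)      # number of pages starting above the bottom edge
    return range(first, max(first, last))


def display_styles(n_pages: int, visible: range) -> list[str]:
    """CSS display value per page: pages outside the viewport get display:none."""
    return ["block" if i in visible else "none" for i in range(n_pages)]


# Hypothetical layout: four 1000 px pages separated by 20 px gaps.
tops = [i * 1020 for i in range(4)]
bottoms = [t + 1000 for t in tops]
visible = visible_pages(tops, bottoms, scroll_top=900, viewport_height=600)
print(list(visible), display_styles(len(tops), visible))
```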

iapain commented 12 years ago

@deathlord87 It would be nice if you could benchmark iOS Safari performance with and without hardware acceleration. You can trigger hardware acceleration simply by applying some graphics-related CSS.

yocontra commented 12 years ago

Erics-MacBook-Pro:Desktop contra$ pdf2htmlEX BG_JulyAugust2012-final.pdf --fit-width
pdf2htmlEX: unrecognized option `--fit-width'

coolwanglu commented 12 years ago

@Contra which version are you using? If you still see this error after updating to the latest master branch, please file a new issue.

yocontra commented 12 years ago

@coolwanglu - I used the brew formula in the README

coolwanglu commented 12 years ago

That's not the latest version.

yocontra commented 12 years ago

This should work right? http://pastebin.com/raw.php?i=iTnu2m40

coolwanglu commented 12 years ago

Yes.

lalith-b commented 12 years ago

Hey everyone, as @iapain asked, I've benchmarked iOS Safari and profiled it. Memory usage peaks as we scroll through the pages. Safari crashes/freezes when it reaches the 14-18 MB mark.

When used in a UIWebView (the Safari view for app development), it crashes at 10+ MB loaded. Memory needs to be handled efficiently so that scrolling is not laggy. Also, only the visible page plus one page back and one page forward should be rendered; all the other pages need not be. I think too many elements are the reason for the crashes.

Currently, when scrolling through the document, the full PDF is loaded into memory, and the pages scrolled past at the top stay live in memory even though they are not visible to the user.

I tried with issue65_en.pdf (http://dl.fullcirclemagazine.org/issue65_en.pdf), which is used as an example. It crashes when I reach pages 6-10.
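The "visible page plus one page back and one forward" policy can be sketched as a sliding window that renders pages entering the window and releases pages leaving it, so at most three pages are live at once. A minimal sketch: PageWindow and its render/release callbacks are hypothetical stand-ins for whatever actually inserts or removes a page's HTML (for example, fragments from split output):

```python
from typing import Callable


class PageWindow:
    """Keep only the current page and `radius` neighbours on each side rendered."""

    def __init__(self, n_pages: int, render: Callable[[int], None],
                 release: Callable[[int], None], radius: int = 1):
        self.n_pages = n_pages
        self.render = render      # hypothetical: build/insert the page's HTML
        self.release = release    # hypothetical: remove it so memory can be reclaimed
        self.radius = radius
        self.live: set[int] = set()

    def wanted(self, current: int) -> set[int]:
        """Pages that should be live while `current` is on screen."""
        lo = max(0, current - self.radius)
        hi = min(self.n_pages - 1, current + self.radius)
        return set(range(lo, hi + 1))

    def go_to(self, current: int) -> None:
        target = self.wanted(current)
        for page in sorted(self.live - target):   # drop pages that left the window
            self.release(page)
        for page in sorted(target - self.live):   # render pages that entered it
            self.render(page)
        self.live = target


# Usage with logging callbacks in place of real DOM work.
log = []
window = PageWindow(10, render=lambda p: log.append(("render", p)),
                    release=lambda p: log.append(("release", p)))
window.go_to(0)
window.go_to(1)
window.go_to(5)
print(sorted(window.live), log)
```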

iapain commented 12 years ago

@deathlord87 Thanks for your investigation. I think the only way to make it work inside Safari is to write your own viewer. pdf2htmlEX does support split output.

Regarding memory leaks, they might be Safari bugs. I also tried to optimize, and what I found out is that loading one page at a time helps. I am not sure whether we should report it to WebKit. I need to investigate further what the limits are and whether hardware acceleration helps.

BTW, 10+ MB on scrolling is huge. @coolwanglu, it may be worth exploring flowable text.

coolwanglu commented 12 years ago

@deathlord87 Thanks a lot for your effort! You are right about the memory issue; actually, something similar is already done with JavaScript, where invisible pages are hidden for smoother scrolling on PC. I'm trying to find a way to reduce the number of elements, but 'not rendering unnecessary pages' is the job of viewers; pdf2htmlEX only provides content conversion. Btw, did you mean the PDF or the converted HTML?

@iapain issue65_en.pdf itself is 18 MB, and the converted HTML is of similar size, so 10+ MB does not surprise me. Flowable text is on the TODO list; I will probably start working on it after finishing css/svg. How much speed gain do you think flowable text could bring?

lalith-b commented 12 years ago

@coolwanglu 18 MB is the size of the full HTML; devices, be they mobile or desktop, shouldn't need to eat up 18+ MB of RAM just to view one page at a time. And yes, you are right that the purpose of the project is content conversion, and 8 times out of 10 the conversion works properly, thanks to all your effort. This project is capable of many more great things. I am writing server-side code (Node.js/Shell.js) that lets users upload PDFs and download the converted HTML; I will open a repo soon.

iapain commented 12 years ago

@coolwanglu Flowable text could dramatically ease Safari's rendering problems. On top of that, the output could fit any screen, which is equally important.

coolwanglu commented 12 years ago

After a quick search, it does not seem to be as easy as I thought; there are many rules related to grammar, spacing and positions. So I'll take some time for this after finishing the background stuff.

lalith-b commented 12 years ago

Hey guys, I've written a server in Node.js/Shell.js for pdf2htmlEX that lets you upload documents and convert them on the fly. Check out my repo at https://github.com/deathlord87/pdf2htmlEX_server

Soon I'll upload the mobile app I'm working on.

coolwanglu commented 12 years ago

@deathlord87 Thanks for your effort.

iapain commented 12 years ago

@deathlord87 Looks decent.

lalith-b commented 12 years ago

I will fix some UI this weekend, guys. Maybe Sencha Touch? What do you all suggest?

coolwanglu commented 12 years ago

@deathlord87 What do you mean by UI?

pdf2htmlEX should extract and provide enough information from the PDF file that a UI can be implemented on top of it, for example zoom/links/outline/thumbnails. pdf2htmlEX itself should provide no UI, because I assume publishers are likely to design their own UI themes to match their websites.

However, there should be a demo UI so that others can better understand the structure of the generated HTML; you can actually find the JavaScript code in the repo. Currently it does the following:

  • hide unnecessary pages (set display:none)
  • handle internal link jumps
  • handle zoom in/out (disabled currently)

What's missing for a demo UI includes:

  • UI elements
    • buttons
    • navbar
  • corresponding functions for UI elements above for the Viewer object
  • effects

In principle, the Viewer object should provide the necessary functions out of the box, so that UI designers can use them directly.

I suggest jQuery, because it's already used in the repo. There's also a mobile version.

However, as pdf2htmlEX is for general platforms, I don't suggest using a mobile version directly.
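To illustrate the split between a Viewer object and the UI built on top of it, here is a minimal sketch of UI-independent viewer state: buttons or a navbar would only call these methods. Viewer, its method names and the zoom limits are hypothetical, not the existing pdf2htmlEX.js API:

```python
class Viewer:
    """UI-independent viewer state; UI elements only call these methods (hypothetical API)."""

    ZOOM_STEP = 1.25
    MIN_ZOOM, MAX_ZOOM = 0.25, 4.0

    def __init__(self, n_pages: int):
        if n_pages < 1:
            raise ValueError("need at least one page")
        self.n_pages = n_pages
        self.page = 0      # zero-based index of the current page
        self.zoom = 1.0

    def go_to(self, page: int) -> int:
        """Jump to a page, clamped to the valid range."""
        self.page = min(max(page, 0), self.n_pages - 1)
        return self.page

    def next_page(self) -> int:
        return self.go_to(self.page + 1)

    def prev_page(self) -> int:
        return self.go_to(self.page - 1)

    def set_zoom(self, zoom: float) -> float:
        """Set the zoom factor, clamped to [MIN_ZOOM, MAX_ZOOM]."""
        self.zoom = min(max(zoom, self.MIN_ZOOM), self.MAX_ZOOM)
        return self.zoom

    def zoom_in(self) -> float:
        return self.set_zoom(self.zoom * self.ZOOM_STEP)

    def zoom_out(self) -> float:
        return self.set_zoom(self.zoom / self.ZOOM_STEP)


# A "next" button handler would simply call viewer.next_page().
viewer = Viewer(3)
viewer.next_page()
viewer.zoom_in()
print(viewer.page, viewer.zoom)
```

Keeping the clamping inside the object means every UI theme gets the same boundary behaviour without reimplementing it.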

coolwanglu commented 12 years ago

@deathlord87 I'm not familiar with Sencha Mobile, but I've used ExtJS, which is not lightweight enough for me.

lalith-b commented 12 years ago

Hmm... I just wanted to create a better interface for people to upload their PDF files and immediately get back page-by-page HTML with prev and next buttons. I'm currently stuck, as I don't have any experience with client-side JavaScript/jQuery. I am an OO guy :(

coolwanglu commented 12 years ago

@deathlord87 Sorry for the misunderstanding. I still recommend jQuery in this case.

coolwanglu commented 11 years ago

I've tested a few HTML pages produced by pdf2htmlEX on a few low-end Android devices. The results vary depending on the browser. Also, opening the produced HTML files directly is not recommended, since web browsers usually don't optimize in the same way PDF readers do.

But there will probably be Ajax loading in pdf2htmlEX.js.

iapain commented 11 years ago

We were able to increase performance by triggering hardware acceleration. I have tested it on an iOS device, but in theory it should work the same way in Android browsers as well. I've heard that some Android browsers, like Dolphin, enable hardware acceleration by default, which might be why you observed different results.

You can try to trigger hardware acceleration with a 3D transform in CSS. You may also want to look at Zapto instead of jQuery; Zapto is a minimal jQuery clone.

coolwanglu commented 11 years ago

Can it decrease performance in any cases?

jahewson commented 11 years ago

Hardware acceleration is not a panacea. Mobile devices have limited memory devoted to the GPU, and once you fill it, performance degrades or the browser crashes. Mixing non-accelerated and accelerated elements can cause rendering problems. Hardware acceleration also tends to come at the expense of font anti-aliasing, but this varies a lot between browsers and platforms.

So yes, it's worth looking at, but don't assume you can hardware-accelerate a 10MB+ DOM.

iapain commented 11 years ago

@coolwanglu If you mean Zapto, then I am pretty sure it won't. In general, @jahewson is correct about the limited memory assigned to the GPU, but I think the split-pages option makes it worth trying: on mobile devices we can render pages from the split pages instead of one huge DOM.

It's out of scope for pdf2htmlEX to implement a specialized mobile reader, but at least we can have some optimized CSS that triggers hardware acceleration.

coolwanglu commented 11 years ago

@iapain What's Zapto, please? @jahewson Of course it's better to be slow than to crash.

So if there is no panacea, at least I can write some tips in the wiki.

Speaking of huge DOM sizes, I've been worried about the default Magazine demo... Maybe I'll implement some Ajax loading there.

@jahewson Can I have your current email address please? The one on your github profile seems down.

iapain commented 11 years ago

@coolwanglu Sorry, I meant Zepto. It's a minimal jQuery implementation, quite a bit faster than jQuery itself. We use it heavily for mobile HTML.

coolwanglu commented 11 years ago

I see, thanks for the tip!

Toneti777 commented 11 years ago

Hello, how do you render the pages after converting the PDF with --split-pages?

Do you have an example?

I tried it: in a new HTML page I included the link to the CSS and made the images accessible, then included the code of one .page in a div of that page.

In the view I see the image first and then the text, without formatting.

I would appreciate some help... thanks.

coolwanglu commented 11 years ago

@Toneti777 This is off-topic (for this issue); please use the mailing list or file a new issue, so that others are not bothered.

coolwanglu commented 11 years ago

While the demo pages are still slow on my iPad 1G, they are not slow on my friends' Android phones.

I wonder if anyone could confirm whether they are slow on iPad >= 3 or other recent mobile devices; if not, maybe the devices are already powerful enough.

Lazy page loading has been implemented (thanks to @micred), which should also improve the performance.
