ElPicador closed this issue 8 years ago
Note to self: add search on gns3: https://secure.helpscout.net/conversation/154616922/4890/?folderId=696715
This looks like a pretty big enhancement, considering that the underlying engine we're using for scraping (Scrapy) only does static HTML parsing. For SPA applications, I would rather try to hit the API level if possible, or state that DocSearch is not compatible with their documentation.
For Prezly, they are using readme.io; maybe we could build something directly at the readme.io level.
I'd say it's more a feature than an enhancement.
I would personally go with an optional HTTP proxy that can process JavaScript-heavy documentation (PhantomJS / Selenium) and feed the resulting static page into Scrapy / Python. What do you think about this approach?
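A minimal sketch of that idea, assuming Selenium with a headless browser as the JS engine (headless Chrome here as a stand-in for PhantomJS); the URL and CSS selector are purely illustrative:

```python
from selenium import webdriver
from parsel import Selector  # the selector library Scrapy uses under the hood

# Render the page with a real browser so the SPA's JS actually runs
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("http://docs.prezly.com/")  # JS-rendered docs page from the issue
html = driver.page_source              # serialized DOM *after* JS execution
driver.quit()

# Feed the rendered HTML into the same parsing layer Scrapy relies on
sel = Selector(text=html)
for title in sel.css("h1::text").getall():  # hypothetical selector
    print(title)
```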
There is a Scrapy integration for JS rendering: https://github.com/scrapinghub/scrapy-splash
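For reference, a rough spider sketch following the scrapy-splash README, assuming a local Splash instance is running (e.g. `docker run -p 8050:8050 scrapinghub/splash`); the target URL, wait time, and selector are illustrative:

```python
import scrapy
from scrapy_splash import SplashRequest

class DocsSpider(scrapy.Spider):
    name = "docs"
    # Settings taken from the scrapy-splash README
    custom_settings = {
        "SPLASH_URL": "http://localhost:8050",
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_splash.SplashCookiesMiddleware": 723,
            "scrapy_splash.SplashMiddleware": 725,
            "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
        },
        "SPIDER_MIDDLEWARES": {
            "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
        },
        "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
    }

    def start_requests(self):
        # Splash renders the JS before the response reaches the spider
        yield SplashRequest(
            "http://docs.prezly.com/",
            callback=self.parse,
            args={"wait": 2},  # give the SPA a moment to render
        )

    def parse(self, response):
        # response now contains the post-JS DOM; selector is hypothetical
        for heading in response.css("h1::text").getall():
            yield {"title": heading}
```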
@ElPicador very cool! From what I can see, it's basically what I suggested, just handier and already Dockerized :smile: Did you already give it a try?
Never; @redox was the one who told me about it.
@ElPicador @proudlygeek @pixelastic We've been thinking of making it the onboarding project of @aseure :)
Awesomeness!!! :100: :+1:
I've opened a PR to address those problematic documentation sites. Please see https://github.com/algolia/documentation-scrapper/pull/46.
I think this can be closed
Some documentation sites are generated client-side with JS (e.g. http://docs.prezly.com/, https://gns3.com/support/docs/quick-start-guide-for-windows-us).
It would be nice to be able to parse them.