cmkimerer closed this issue 9 years ago
If you look at the HTML source of https://developers.facebook.com/docs/facebook-login/access-tokens, it seems like all the actual page text is commented out (inside HTML comments) instead of being normal text in the page. I'm guessing some client-side JavaScript runs on their page after the initial load and renders what you see on the screen.
So you would need to do your own custom processing to capture what the browser actually rendered after page load, since the initial HTML that comes back from their servers doesn't include the page text as regular markup. That's a special case specific to this website and is beyond anything that unfluff could support directly.
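As a rough illustration of that kind of custom processing, here is a minimal sketch that pulls the text out of HTML comments before any extraction step. It assumes the content really does sit inside comments in the raw response, which I only guessed from viewing the source. It is written in Python with the standard library only, and the names `CommentCollector`, `TextCollector`, and `text_from_comments` are made up for this example. It is not part of unfluff.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the raw contents of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data)


class TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_from_comments(html):
    """Return the text found inside the HTML comments of `html`.

    Hypothetical helper: treats each comment body as an HTML fragment
    and joins the text nodes it contains with single spaces.
    """
    collector = CommentCollector()
    collector.feed(html)
    collector.close()

    text = TextCollector()
    for comment in collector.comments:
        text.feed(comment)
    text.close()
    return " ".join(text.chunks)
```

You would call `text_from_comments(raw_html)` on the response body you fetched yourself. If the page builds its content some other way (for example from JSON fetched after load), this won't help and you would need a headless browser to capture the rendered DOM instead.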
I was testing out unfluff on the URL https://developers.facebook.com/docs/facebook-login/access-tokens and realized that no article text is actually being extracted. It successfully pulled an image, description, and title, but the text comes back blank.