Makepad-fr / fbjs

Tooling that automates your Facebook interactions.
https://www.npmjs.com/package/@makepad/fbjs
GNU General Public License v3.0

Switching to the desktop layout (done) #41

Closed iMrDJAi closed 3 years ago

iMrDJAi commented 3 years ago

The mobile layout of Facebook provides limited data and low-quality media. Because of that, the project should switch entirely to the desktop layout.

Todo list:

kaanyagci commented 3 years ago

Hi,

Thank you for your contribution. I changed the base branch from master to development, as some things are still incomplete. I'll try to take a closer look, probably this weekend.

Converting post contents to markdown with another npm module could be overkill, and it may also become a maintainability problem. But we can implement something that removes all HTML tags around the string content, without necessarily converting it to markdown.

iMrDJAi commented 3 years ago

@kaanyagci The point of the markdown format is to give users a minimal output that can be used to re-render post content exactly like the original. I only suggested it to avoid including the HTML, which may be large and not human-readable.
But I think you're right: we should reduce the number of third-party modules. It's also a good idea to let users handle the output themselves, and we can provide examples of how to do that.
In that case, we have to include the innerHTML along with the innerText in the output.
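
A minimal sketch of the two-field idea discussed above. The field names `contentHtml` and `contentText` and both helper functions are hypothetical (nothing in this thread confirms the final shape of the GroupPost interface), and the tag stripping is a naive regex pass rather than a real HTML parser:

```javascript
// Hypothetical helper: naive, regex-based tag stripping for illustration only.
// It does not decode HTML entities and is not a substitute for a real parser.
function stripHtmlTags(html) {
  return html
    .replace(/<br\s*\/?>/gi, '\n') // turn <br> into a line break
    .replace(/<[^>]*>/g, '')       // drop every remaining tag
    .trim();
}

// Hypothetical output shape: keep the raw markup next to the plain text,
// so users can post-process the HTML themselves if they need formatting.
function buildPostContent(innerHTML) {
  return {
    contentHtml: innerHTML,
    contentText: stripHtmlTags(innerHTML),
  };
}

const post = buildPostContent('<div><b>Hello</b><br/>world</div>');
console.log(JSON.stringify(post));
```

Keeping both fields means the library itself needs no markdown dependency, while users who want rich output can still convert `contentHtml` with a tool of their choice.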

iMrDJAi commented 3 years ago

After some testing, I've noticed that when you start scraping without authentication, some posts don't provide the author's profile URL. In that case, the selector group_post_author won't work.

Also, elements load quite differently in the desktop layout: they aren't rendered until they enter the viewport, so we should start scrolling before scraping.
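
The scroll-before-scrape idea could be sketched roughly as follows. This is not the code from the pull request: `scrollUntilStable` and its options are hypothetical names, and the only Puppeteer call assumed is `page.evaluate`. The loop scrolls to the bottom, waits, and stops once the page height no longer grows:

```javascript
// Hypothetical helper, not part of fbjs. `page` is expected to be a Puppeteer
// Page (only page.evaluate is used). Returns how many scroll passes made the
// page grow before its height stabilised, capped at maxScrolls.
async function scrollUntilStable(page, { maxScrolls = 20, pauseMs = 500 } = {}) {
  let previousHeight = 0;
  for (let i = 0; i < maxScrolls; i++) {
    // Runs in the browser context: jump to the bottom and report the height.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
      return document.body.scrollHeight;
    });
    if (height === previousHeight) {
      return i; // nothing new was loaded since the last pass
    }
    previousHeight = height;
    // Give lazily loaded elements time to render before the next pass.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return maxScrolls;
}
```

A caller would run something like `await scrollUntilStable(page)` before querying post selectors. The fixed pause is a simplification; waiting on a specific selector or network idle may be more reliable in practice.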

iMrDJAi commented 3 years ago

@kaanyagci So yeah, I did it! The scraper now works perfectly with the new desktop layout of Facebook, and it has the same functionality as the one on the master branch. I think it's time to merge this into the development branch (after reviewing and testing it, of course). Other features and new fields for the GroupPost interface should be added in a separate pull request to keep things organized!

kaanyagci commented 3 years ago

@All-Contributors please add @iMrDJAi for code

allcontributors[bot] commented 3 years ago

@kaanyagci

I've put up a pull request to add @iMrDJAi! :tada:

kaanyagci commented 3 years ago

@iMrDJAi This is excellent news! I was really busy with other stuff today. I'll test this first thing tomorrow! Great job! 💯

iMrDJAi commented 3 years ago

@kaanyagci Any updates? Have you tested it? Any issues?

kaanyagci commented 3 years ago

Sorry for the delay. I was still a little busy :( I'll look ASAP.

kaanyagci commented 3 years ago

Just checked. Sadly, I cannot get it to work.

async function main() {
    const f = await FB.init({
        debug: true,
        output: 'test.json',
        headless: false,
        groupIds: ['774278349295443'],
        useCookies: true,
        disableAssets: true,
    });
    f.login('', '');
    await f.getGroupPosts(774278349295443, 'groupOutput');
}

main().then(() => {
    console.log('Done');
});

Gives the following output:
```sh
/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                      ^

Error: net::ERR_ABORTED at https://facebook.com
    at navigate (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async FrameManager.navigateFrame (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/Users/kaanyagci/Documents/makepad/fbjs/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:819:16)
    at async Facebook.login (/Users/kaanyagci/Documents/makepad/fbjs/dist/lib/models/fb.js:113:9)
```

Note: The output is the same in both headless and non-headless modes.

I'll try to investigate these issues as soon as possible this week.

iMrDJAi commented 3 years ago

@kaanyagci Interesting. In fact, I haven't tried logging in; I've always been testing in userless mode. I'll try that later and check what's going on.
For now you can try this:

;(async () => {

    const { FB } = require("@makepad/fbjs")

    const fb = await FB.init({
        headless: true,
        useCookies: false,
        output: ''
    })

    //await fb.getGroupPosts("319144912641926", "./output.json")

    await fb.getGroupPosts("319144912641926")

})()

iMrDJAi commented 3 years ago

The second error I've faced without logging in: the web page used for group details is still the mobile page, m.facebook.com.

@kaanyagci That doesn't make sense, I'm 100% sure that I totally removed the mobile website. Fork my master branch again.

kaanyagci commented 3 years ago

My bad, I was trying on another branch 🤦

kaanyagci commented 3 years ago

This looks great, actually. For the first issue, I've added the userAgent, as Facebook rejects connections from headless browsers. I'll add this line once it's merged into the development branch! Anyway, great work @iMrDJAi!
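
The exact line added is not shown in this thread. As a rough sketch of one common way to do it, assuming Puppeteer's `browser.userAgent()` and `page.setUserAgent()` and a hypothetical helper name, the default user agent can be rewritten so it no longer advertises a headless browser:

```javascript
// Hypothetical helper, not the actual fbjs change. `browser` and `page` are
// expected to be Puppeteer Browser and Page objects.
async function applyNonHeadlessUserAgent(browser, page) {
  // In headless mode the default user agent contains "HeadlessChrome".
  const defaultUserAgent = await browser.userAgent();
  const patchedUserAgent = defaultUserAgent.replace('HeadlessChrome', 'Chrome');
  // Must be applied before page.goto() for the first request to carry it.
  await page.setUserAgent(patchedUserAgent);
  return patchedUserAgent;
}
```

Hard-coding a fixed user agent string would also work, at the cost of going stale as browser versions change.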