CheshireCaat / puppeteer-with-fingerprints

Anonymous automation via puppeteer with fingerprint replacement technology.
MIT License
273 stars, 34 forks

Cannot run puppeteer with fingerprint using node thread worker #6

Closed: imnobody221 closed this issue 1 year ago

imnobody221 commented 1 year ago

I tried to run 2 worker threads using the same puppeteer script, but one of the browsers fails with one of these errors: 1) "Error: Lock is not acquired/owned by you" 2) "Unable to start engine process (code: 3221225477)"

Below is example code that reproduces this error. example.js:

import { plugin } from 'puppeteer-with-fingerprints'
;(async () => {
    // Launch the browser instance:
    const browser = await plugin.launch({ headless: false })

    // The rest of the code is the same as for a standard `puppeteer` library:
    const page = await browser.newPage()
    await page.goto('https://example.com')

    // Print the browser viewport size:
    console.log(
        'Viewport:',
        await page.evaluate(() => ({
            deviceScaleFactor: window.devicePixelRatio,
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
        }))
    )

    await browser.close()
})()

Code that starts the worker threads, thread.js:

import { Worker, isMainThread } from 'worker_threads'

// Path to the script that every worker thread runs.
const scriptpath = './src/scripts/example.js'

if (isMainThread) {
    for (let i = 0; i < 2; i++) {
        // Each worker runs its own copy of example.js
        const worker = new Worker(scriptpath)

        worker.on('message', msg => {
            console.log('worker response: ' + msg);
        })
        worker.on('exit', (code) => {
            console.log(`Script run stopped with exit code ${code}`)
        })
    }
}

Error log produced after the first thread runs. I believe this error comes from the second worker thread:

Viewport: { deviceScaleFactor: 1, width: 929, height: 879 } >>>> log for first thread

node:internal/event_target:1010
   process.nextTick(() => { throw err; });
                           ^
Error: Unable to start engine process (code: 3221225477)

This could be due to the fact that the engine was not downloaded or unpacked correctly.
Try completely deleting the engine folder and restarting the code until it completes.
If this does not help, open an issue with a detailed description of the problem.

    at C:\Users\OneDrive\Desktop\xxxxr\node_modules\bas-remote-node\src\services\engine.js:104:17
    at ChildProcess.exithandler (node:child_process:427:5)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1091:16)
    at Socket.<anonymous> (node:internal/child_process:449:11)
    at Socket.emit (node:events:513:28)
    at Pipe.<anonymous> (node:net:320:12)
Emitted 'error' event on Worker instance at:
    at [kOnErrorMessage] (node:internal/worker:290:10)
    at [kOnMessage] (node:internal/worker:301:37)
    at MessagePort.<anonymous> (node:internal/worker:202:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:735:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28)

Sometimes this error occurs instead:


node:internal/event_target:1010
  process.nextTick(() => { throw err; });
                           ^
Error: Lock is not acquired/owned by you
    at C:\Users\OneDrive\Desktop\xxxx\node_modules\proper-lockfile\lib\lockfile.js:285:43
    at LOOP (node:fs:2673:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
Emitted 'error' event on Worker instance at:
    at [kOnErrorMessage] (node:internal/worker:290:10)
    at [kOnMessage] (node:internal/worker:301:37)
    at MessagePort.<anonymous> (node:internal/worker:202:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:735:20)
    at exports.emitMessage (node:internal/per_context/messageport:23:28) {
  code: 'ENOTACQUIRED'
}

I already tried deleting and reinstalling the BAS engine data, but the error still occurs for the second thread. The first thread succeeds without error.

imnobody221 commented 1 year ago

My current solution is to wait until a new thread has successfully launched the BAS browser and only then start the next worker thread. This works, but it takes a long time to open, for example, 10 browsers, because each thread has to finish opening its BAS browser before the next worker thread can start. Is there any way to open 10 threads/browsers simultaneously?
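For readers who want to reproduce this one-by-one workaround, here is a minimal sketch. It assumes each worker script calls `parentPort.postMessage('launched')` right after `plugin.launch()` resolves. That message name, and the helpers `waitForLaunch` and `startOneByOne`, are made up for illustration and are not part of the library.

```javascript
const { Worker } = require('node:worker_threads');

// Resolve once the worker posts the (hypothetical) 'launched' message,
// reject if the worker errors or exits before doing so.
function waitForLaunch(worker) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      worker.off('message', onMessage);
      worker.off('error', onError);
      worker.off('exit', onExit);
    };
    const onMessage = (msg) => {
      if (msg === 'launched') {
        cleanup();
        resolve(worker);
      }
    };
    const onError = (err) => {
      cleanup();
      reject(err);
    };
    const onExit = (code) => {
      cleanup();
      reject(new Error(`Worker exited with code ${code} before launching`));
    };
    worker.on('message', onMessage);
    worker.on('error', onError);
    worker.on('exit', onExit);
  });
}

// Start `count` workers strictly one after another.
// `createWorker(i)` must return a new Worker instance.
async function startOneByOne(createWorker, count) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    workers.push(await waitForLaunch(createWorker(i)));
  }
  return workers;
}
```

Usage would look like `startOneByOne(() => new Worker('./src/scripts/example.js'), 10)`. Only the launch phase is serialized; once a worker has reported in, it keeps running while the next one starts.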

sergerdn commented 1 year ago

Hey @imnobody221,

We currently lack parallel tests or documentation demonstrating how to avoid such errors with the recommended logic from maintainers.

On a related note, I've raised an issue in one of the upstream libraries. Here's the link: https://github.com/CheshireCaat/browser-with-fingerprints/issues/2.

I also wanted to share a recommendation with you that might be helpful. It's recommended that each script/thread/process runs with a different FINGERPRINT_CWD folder to prevent functionality issues that may arise when multiple copies of the script update or use Browser dependencies simultaneously.

Here are some steps you can follow to implement this recommendation:

  1. Set FINGERPRINT_CWD to a custom path
  2. Run a simple script to download all dependencies/artefacts to that folder and then stop it
  3. Copy the FINGERPRINT_CWD folder to a new path
  4. Before running any new script/thread/process, set FINGERPRINT_CWD to the new path.
  5. Repeat steps 3-4 if necessary.
  6. Run all threads/scripts/workers at once to open multiple browsers simultaneously, each with its own FINGERPRINT_CWD variable.
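The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: FINGERPRINT_CWD is treated as an environment variable that the plugin reads inside each worker, the base folder has already been filled by a first run (steps 1-2), and the helper names `prepareWorkerDirs` and `startWorkers` are invented for this example. `fs.cpSync` requires Node.js 16.7 or newer.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { Worker } = require('node:worker_threads');

// Steps 3-5: copy the already populated engine folder once per worker.
// Existing copies are reused so the engine is not copied on every run.
function prepareWorkerDirs(baseDir, targetRoot, count) {
  fs.mkdirSync(targetRoot, { recursive: true });
  const dirs = [];
  for (let i = 0; i < count; i++) {
    const dir = path.join(targetRoot, `worker-${i}`);
    if (!fs.existsSync(dir)) {
      fs.cpSync(baseDir, dir, { recursive: true });
    }
    dirs.push(dir);
  }
  return dirs;
}

// Step 6: start all workers at once, each with its own FINGERPRINT_CWD.
// The `env` option sets process.env inside the worker thread.
function startWorkers(scriptPath, dirs) {
  return dirs.map(
    (dir) => new Worker(scriptPath, { env: { ...process.env, FINGERPRINT_CWD: dir } })
  );
}
```

A call such as `startWorkers('./src/scripts/example.js', prepareWorkerDirs('./engine-base', './engine-workers', 10))` would then start ten workers; both folder names are placeholders.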

Additionally, I believe that setting up a new clean environment for each thread/process can be a good solution to prevent Browser dependency issues. I have been using such logic for many years in different situations because I prefer to run all of my tests in parallel, and I need to prepare a separate environment for each test. This is especially helpful if you are using any CI/CD scripts.

However, in any case, an official recommendation from the code owner would be beneficial.

Let me know if you have any further questions or if there's anything else I can help you with.

Thanks!

CheshireCaat commented 1 year ago

Support for worker threads is missing and was not originally intended. Currently, synchronization is safely implemented only within a single process.

For now, the best solution is to specify a separate FINGERPRINT_CWD for each worker, as @sergerdn pointed out above.
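Read literally, the statement about single-process synchronization suggests another option: launch several browsers from one process instead of one per worker thread. The sketch below assumes that concurrent launches inside one process behave well, which this thread does not confirm. `launchMany` is a hypothetical helper, and the launcher is passed in so the sketch does not depend on the library.

```javascript
// Launch `count` browsers concurrently from a single process.
// `launch(i)` is any function returning a promise for a browser,
// for example: () => plugin.launch({ headless: false })
async function launchMany(launch, count) {
  const results = await Promise.allSettled(
    Array.from({ length: count }, (_, i) => launch(i))
  );
  // Keep successful launches and report failures separately,
  // so one failed launch does not discard the others.
  const browsers = results
    .filter((r) => r.status === 'fulfilled')
    .map((r) => r.value);
  const failures = results
    .filter((r) => r.status === 'rejected')
    .map((r) => r.reason);
  return { browsers, failures };
}
```

`Promise.allSettled` is used rather than `Promise.all` so that browsers which did start can still be used or closed cleanly when another launch fails.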

CheshireCaat commented 1 year ago

@sergerdn, the problem with such detailed documentation on the internal structure of the library is that only a very small number of people need it, and it takes a lot of time to write. Nobody says that it is useless, but it's not a priority right now.

The rest of the users are interested in using, not customizing. What is there to talk about when even the many references to the supported OS in the documentation are constantly ignored?

This is just my view, and I'm not ready to debate it right now.

sergerdn commented 1 year ago

Support for worker threads is missing and was not originally intended. Currently, synchronization is safely implemented only within a single process.

For now, the best solution is to specify a separate FINGERPRINT_CWD for each worker, as @sergerdn pointed out above.

Would you be interested in adding some information about this to the documentation? I believe that such situations may arise in the future and having this information available in the documentation would be beneficial.

CheshireCaat commented 1 year ago

Yes, I'll add it later.

sergerdn commented 1 year ago

The rest of the users are interested in using, not customizing. What is there to talk about when even the many references to the supported OS in the documentation are constantly ignored?

I think this situation arose because it was unexpected for developers to encounter a script that exclusively supports the Windows platform, especially for those who are not very familiar with BAS. In light of this, I suggest modifying the README.md file to make it crystal clear that puppeteer-with-fingerprints only supports Windows at the moment.

We can achieve this by changing the note on the first line of the README.md file to something like this: puppeteer-with-fingerprints (ONLY WINDOWS CURRENTLY SUPPORTED!). :smile:

imnobody221 commented 1 year ago

@sergerdn thank you for your suggestion. I tried it and it works wonderfully. But honestly, it is not really efficient to create 10 or even 100 separate engines. Over time the engine alone takes a lot of space, not to mention the profile folder size. Still, we could use 10 engines to run 100 browsers in parallel, synchronizing between calls to reduce CPU spikes, so it is better than nothing.

imnobody221 commented 1 year ago

Support for worker threads is missing and was not originally intended. Currently, synchronization is safely implemented only within a single process.

For now, the best solution is to specify a separate FINGERPRINT_CWD for each worker, as @sergerdn pointed out above.

@CheshireCaat Thank you for your reply regarding this multithreading issue. I really appreciate what you and your team are doing here. It took a lot of effort to create this plugin that is able to mask fingerprints. Not many companies/developers create such a framework for free. Detailed documentation would really help us a lot, but I do understand you need more time to improve BAS. I hope many improvements can be made in the future. Thank you again!

bablosoft commented 1 year ago

@imnobody221

Detailed documentation would really help us a lot, but I do understand you need more time to improve or fix the anonymity of BAS.

What exactly needs to be fixed?

sergerdn commented 1 year ago

@sergerdn thank you for your suggestion. I tried it and it works wonderfully. But honestly, it is not really efficient to create 10 or even 100 separate engines. Over time the engine alone takes a lot of space, not to mention the profile folder size. Still, we could use 10 engines to run 100 browsers in parallel, synchronizing between calls to reduce CPU spikes, so it is better than nothing.

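I also suggest this logic:

  • Gather the necessary files or artifacts.
  • Transfer them to a storage location on a RAM disk that you have set up.
  • For each script, create a separate environment by copying the required files from that storage location to a new folder on the RAM disk.
  • Launch the new process using the copied files in the separate environment.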

Make sure that your machine has enough free RAM to support the RAM disk. Additionally, if you do not need your profile folder in the future, I recommend running your browser with a profile created on a RAM disk. This will significantly speed up the starting time and improve performance.

To improve your system's performance with a RAM disk, you can refer to this list of RAM drive software on Wikipedia: https://en.wikipedia.org/wiki/List_of_RAM_drive_software.

Note: If you are running your browser with a profile created on a RAM disk, you may need to write your own script to clear old profiles from memory periodically to prevent the RAM disk from filling up.
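A cleanup script of the kind mentioned in the note could look like the sketch below. It assumes every profile is a direct subfolder of one root folder on the RAM disk and uses the folder's modification time as a rough age. `removeOldProfiles` is a name invented for this example, and the caller is responsible for making sure no running browser still uses a profile.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Delete profile folders under `profilesRoot` whose modification time is
// older than `maxAgeMs`. Returns the list of removed folder paths.
function removeOldProfiles(profilesRoot, maxAgeMs, now = Date.now()) {
  const removed = [];
  for (const entry of fs.readdirSync(profilesRoot, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // ignore stray files
    const dir = path.join(profilesRoot, entry.name);
    const ageMs = now - fs.statSync(dir).mtimeMs;
    if (ageMs > maxAgeMs) {
      fs.rmSync(dir, { recursive: true, force: true });
      removed.push(dir);
    }
  }
  return removed;
}
```

It could be run periodically, for example with `setInterval(() => removeOldProfiles(profilesRoot, 60 * 60 * 1000), 10 * 60 * 1000)`, where `profilesRoot` points at the profile folder on the RAM disk. A folder's modification time does not always change when files deeper inside it change, so tracking which profiles are still in use is safer than relying on age alone.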

imnobody221 commented 1 year ago

@imnobody221

Detailed documentation would really help us a lot, but I do understand you need more time to improve or fix the anonymity of BAS.

What exactly needs to be fixed?

Read this page: https://github.com/CheshireCaat/browser-with-fingerprints/issues/2. Let me clarify again: you and your team need more time to do more important things. @bablosoft thank you and your team for the hard work. I really do like BAS.

bablosoft commented 1 year ago

Read this page: https://github.com/CheshireCaat/browser-with-fingerprints/issues/2

I've read it and already answered in that topic, but I can't find where it contains a report about anonymity issues.

Previously you wrote:

I do understand you need more time to improve or fix the anonymity of BAS.

Can you specify what exactly needs to be fixed?

sergerdn commented 1 year ago

@bablosoft

Would you like to organise a GitHub discussion for the project to address this issue and move it forward for further consideration? Additionally, having discussions can also be beneficial for us in the future. I suggest that we share any useful tips and tricks during the discussion, and that we use tags to categorise our comments.

What do you think about it?

https://docs.github.com/en/discussions

imnobody221 commented 1 year ago

@bablosoft. English is not my first language, so please excuse any mistakes. Nothing needs to be fixed; that was just my assumption when you replied to @sergerdn "There are much more important things to do." I have edited my comment above.

imnobody221 commented 1 year ago

@sergerdn thank you for your suggestion. I tried it and it works wonderfully. But honestly, it is not really efficient to create 10 or even 100 separate engines. Over time the engine alone takes a lot of space, not to mention the profile folder size. Still, we could use 10 engines to run 100 browsers in parallel, synchronizing between calls to reduce CPU spikes, so it is better than nothing.

I also suggest this logic:

  • Gather the necessary files or artifacts.
  • Transfer them to a storage location on a RAM disk that you have set up.
  • For each script, create a separate environment by copying the required files from that storage location to a new folder on the RAM disk.
  • Launch the new process using the copied files in the separate environment.

Make sure that your machine has enough free RAM to support the RAM disk. Additionally, if you do not need your profile folder in the future, I recommend running your browser with a profile created on a RAM disk. This will significantly speed up the starting time and improve performance.

To improve your system's performance with a RAM disk, you can refer to this list of RAM drive software on Wikipedia: https://en.wikipedia.org/wiki/List_of_RAM_drive_software.

Note: If you are running your browser with a profile created on a RAM disk, you may need to write your own script to clear old profiles from memory periodically to prevent the RAM disk from filling up.

I really like this idea. I will try to implement it ASAP! Thanks @sergerdn

sergerdn commented 1 year ago

@bablosoft. English is not my first language, so please excuse any mistakes. Nothing needs to be fixed; that was just my assumption when you replied to @sergerdn "There are much more important things to do." I have edited my comment above.

I think that none of the people involved in this chat have English as their primary language. So, can we consider this matter resolved?

If you have any questions, please don't hesitate to open a new issue, and we will assist you as best we can.

imnobody221 commented 1 year ago

Okay, thank you @sergerdn @CheshireCaat @bablosoft for the help. I really appreciate it.