apify / fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
Apache License 2.0

Invalid Filename Error on Instantiating `FingerprintGenerator` in AWS Lambda #216

Closed lugfug closed 1 year ago

lugfug commented 1 year ago

Description:

Hello maintainers,

I'm encountering an "Invalid filename" error when attempting to instantiate the FingerprintGenerator within an AWS Lambda environment. The error seems to be originating from the adm-zip module, which is a dependency of the generative-bayesian-network, further used by header-generator.

Code Sample Demonstrating the Invocation:

const { generateFingerprintHeaders } = require('./helpers/browserConfig');

exports.handler = async (event, context) => {
    try {
        const fingerprintHeaders = generateFingerprintHeaders();
        // Logged inside try: fingerprintHeaders is block-scoped and not visible in catch.
        console.log("Generated Fingerprint Headers:", fingerprintHeaders);
    } catch (error) {
        console.error("Error during FingerprintGenerator instantiation:", error.message);
        console.error("Stack trace:", error.stack);
        throw error; // re-throw to ensure the error is not silently handled
    }
};

Error Logs:

2023-08-29T05:07:05.979Z    cf10a87d-0e24-43a2-85e1-0b58654ecf71    INFO    About to instantiate FingerprintGenerator...
2023-08-29T05:07:05.983Z    cf10a87d-0e24-43a2-85e1-0b58654ecf71    ERROR   Error during FingerprintGenerator instantiation: Invalid filename
2023-08-29T05:07:05.984Z    cf10a87d-0e24-43a2-85e1-0b58654ecf71    ERROR   Stack trace: Error: Invalid filename
    at new module.exports (/opt/nodejs/node_modules/adm-zip/adm-zip.js:57:19)
    at new BayesianNetwork (/opt/nodejs/node_modules/generative-bayesian-network/bayesian-network.js:24:21)
    at new HeaderGenerator (/opt/nodejs/node_modules/header-generator/header-generator.js:89:38)
    at new FingerprintGenerator (/opt/nodejs/node_modules/fingerprint-generator/fingerprint-generator.js:15:9)
    at generateFingerprintHeaders (/var/task/helpers/browserConfig.js:37:34)
    at exports.handler (/var/task/index.js:39:30)
    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1147:29)

Additional Context:

Would appreciate any insights or guidance on how to resolve this error. Thank you for your time!

barjin commented 1 year ago

Hi @lugfug and thank you for submitting this issue!

We encountered a similar error with Vercel some time ago. Back then, it was caused by Vercel's bundler (@vercel/nft), which was dropping the ML model files that fingerprint-generator uses. I've tried writing this fingerprint-generating Lambda myself (sans your helper methods) and it worked just fine (see example below).

const { FingerprintGenerator } = require('fingerprint-generator');

exports.handler = async (event, context) => {
    const fp = new FingerprintGenerator();
    const fingerprint = fp.getFingerprint();

    return {
        statusCode: 200,
        body: fingerprint,
        headers: {
            'Content-Type': 'application/json',
        },
    };
};

Can you try to replace your code with this snippet and confirm whether it works in your environment?

If this doesn't help, can you please share more details about your setup? Are you using some bundlers, how do you manage dependencies in your project etc.? Thanks! :)

barjin commented 1 year ago

Hello again @lugfug , I'm going to close this issue due to inactivity now - but in case you have any questions, feel free to reopen it any time.

Thanks!

lugfug commented 1 year ago

Hello @barjin ,

Thank you for your previous guidance. I've taken the steps you recommended and wanted to provide an update on my findings.

Please excuse the delay in my reply.

I have been working on this project and building it out over nearly 500 revisions.

I can confirm that the entire script functions exactly as expected... until I integrate the fingerprint generation code.

The short version of what the script does: a search URL is provided to the script. It then sets up a Playwright browser with generic headers (which I am currently constructing from various public lists), connects via proxies to the search URL, and processes the page.

I would rather use your Fingerprint-Suite to generate the headers all in one spot, vs. my current version of building the headers from multiple processes.

1. Code Update

Based on your suggestion, I integrated the FingerprintGenerator instantiation and fingerprint generation into my AWS Lambda function. Here's the code I used at the very top of my script, before any other code had a chance to run:

const { FingerprintGenerator } = require('fingerprint-generator');
const version = require('./package.json').version;

exports.handler = async (event, context) => {
    // Log the version number of the script
    console.log(`Script Version: ${version}`);

    // Log all the parameters passed to the script
    console.log(`Received event: ${JSON.stringify(event)}`);

    try {
        if (event.generate_fingerprint) {
            const fp = new FingerprintGenerator();
            const fingerprint = fp.getFingerprint();

            // Log the generated fingerprint (serialized, since interpolating an object prints "[object Object]")
            console.log(`Generated Fingerprint: ${JSON.stringify(fingerprint)}`);

            return {
                statusCode: 200,
                body: fingerprint,
                headers: {
                    'Content-Type': 'application/json',
                },
            };
        }
    } catch (error) {
        console.error("Error occurred:", error.message);
        console.error("Error trace:", error.stack);
        throw error; // re-throw the error to ensure it's not silently handled
    }
};

2. Logs from AWS Lambda Execution

Upon executing the above code, here's the log output I received:

INIT_START Runtime Version: nodejs:18.v12   Runtime Version ARN: arn:aws:lambda:us-west-1::runtime:0bdff101a7b4e0589af824f244deb932XXXXXXXXXXXXXXXXXXXXXXX
START RequestId: e32a251a-7ffc-462XXXXXXXXXXXXXXXXXXXXXX Version: $LATEST
2023-09-09T21:27:50.085Z    e32a251a-7ffc-4622-ae20-ba9582a766dd    INFO    Script Version: 0.0.478
2023-09-09T23:24:24.289Z    33829e2f-3e28-4e91-b805-7891484500dc    ERROR   Invoke Error    
{
    "errorType": "Error",
    "errorMessage": "Invalid filename",
    "stack": [
        "Error: Invalid filename",
        "    at new module.exports (/opt/nodejs/node_modules/adm-zip/adm-zip.js:57:19)",
        "    at new BayesianNetwork (/opt/nodejs/node_modules/generative-bayesian-network/bayesian-network.js:24:21)",
        "    at new HeaderGenerator (/opt/nodejs/node_modules/header-generator/header-generator.js:89:38)",
        "    at new FingerprintGenerator (/opt/nodejs/node_modules/fingerprint-generator/fingerprint-generator.js:15:9)",
        "    at exports.handler (/var/task/index.js:58:24)",
        "    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1147:29)"
    ]
}

END RequestId: e32a251a-7ffc-4622-ae20-ba9582a766dd
REPORT RequestId: e32a251a-7ffc-4622-ae20-ba9582a766dd  Duration: 9.78 ms   Billed Duration: 10 ms  Memory Size: 3000 MB    Max Memory Used: 124 MB Init Duration: 1149.48 ms   

It seems the "Invalid filename" error is still persisting and is originating from the adm-zip module used by generative-bayesian-network, and subsequently, the fingerprint-generator.

3. Additional Details

I've ensured that:

If you need more information about my environment I'll be happy to provide it.

Given the continuation of this issue, even after trying your proposed solution, I would truly appreciate any further insights or recommendations you might have. Thank you for your continued assistance.

Warm regards,

[lugfug]

barjin commented 1 year ago

Hi, I've closed the new issue and reopened this one, so all the context is in one thread.

Unfortunately(?), I just ran your code example without any issues on AWS. I've attached a .zip archive I used to deploy it on AWS. Try creating a whole new Lambda (with all the basic / default permission settings) and upload code as zip (and use the zip archive below).

It's definitely an interesting issue - all the more so because I cannot reproduce it. Definitely keep me updated on whether you've managed to run this zip at least. Thanks!

📁 aws-deploy-package.zip

lugfug commented 1 year ago

@barjin Thank you for reopening this issue. I will create a new test AWS Lambda Function and try your package.

Theoretically, there must be a permissions issue that is causing the issue on my end.

FYI, I have all my node modules in an AWS Lambda Layer for redundancy, and reusability across Lambda Functions.

Likewise, using a Lambda Layer allows the Lambda Function to have a very small footprint when uploading the deployment package.

This strategy avoids the deployment package size limitation.

lugfug commented 1 year ago

Hello @barjin,

I've pinpointed the underlying issue.

Firstly, I deployed your code and modules into a new AWS Lambda Function. The outcome was positive: your code executed successfully, both with and without implementing a Lambda Layer.

Subsequently, I swapped out your index.js for my existing Lambda Function's index.js, which led to a failed function invocation.

Upon comparing your modules folder with mine, I noticed a significant size difference. This was observed only for the modules that both of us had in common.

Afterward, I replaced all my corresponding modules with yours in my Lambda Layer deployment. On invoking the Lambda Function again, everything worked perfectly! The Fingerprint Suite code executed as intended, generated a fingerprint, and logged the results as JSON to the console.

I've recreated this scenario multiple times using various permutations, and I can consistently reproduce either the failure or success based on the modules used.

This investigation strongly suggests a dependency conflict as the root cause.

For further clarity, I'll attach two zip files: one containing your modules that work seamlessly, and the other with my modules that consistently fail upon deployment.

📁 node_modules_barjin.zip
📁 node_modules_lugfug.zip

What perplexes and concerns me is that I had installed the npm packages individually, specifically to sidestep such dependency conflicts.

@barjin, could you clarify the recommended method for both installing and updating the fingerprint-suite? Additionally, it might benefit everyone if these instructions were further explicitly detailed in the README.md.

barjin commented 1 year ago

Well, looking into your node_modules_lugfug.zip file, I see that your version is missing the contents of the (header|fingerprint)-generator/data_files folder. The error message you see on AWS is absolutely right, i.e. adm-zip cannot find the files (because they do not exist).

I doubt this is because of dependency conflicts. I'd wager that it's caused by your tooling (perhaps your zip implementation omits .zip files from the archive it is creating?)

Either way, the only correct way of installing and updating the fingerprint-suite packages is npm install and npm update, which should install the packages correctly, i.e. with the data .zip files.
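
A minimal sketch of a check for the missing data files described above, which could be run against a node_modules folder before packaging it. It assumes only what this thread states: that header-generator and fingerprint-generator each ship a data_files folder containing .zip files. The function name findPackagesMissingData and its nodeModulesRoot parameter are hypothetical, not part of fingerprint-suite.

```javascript
const fs = require('fs');
const path = require('path');

// Packages that, per this thread, keep their model data as .zip files in data_files.
const PACKAGES_WITH_DATA = ['header-generator', 'fingerprint-generator'];

// Returns the names of packages whose data_files folder is missing, unreadable,
// or contains no .zip file. nodeModulesRoot is a hypothetical parameter:
// point it at the node_modules folder you are about to deploy.
function findPackagesMissingData(nodeModulesRoot) {
    return PACKAGES_WITH_DATA.filter((pkg) => {
        const dataDir = path.join(nodeModulesRoot, pkg, 'data_files');
        let entries;
        try {
            entries = fs.readdirSync(dataDir);
        } catch {
            return true; // folder does not exist or is not a directory
        }
        return !entries.some((name) => name.toLowerCase().endsWith('.zip'));
    });
}
```

An empty array means both folders contain at least one .zip file. In the Lambda Layer setup from this thread, the stack traces suggest the deployed modules live under /opt/nodejs/node_modules, so the same check could also be run there at cold start.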

lugfug commented 1 year ago

Hi @barjin,

Thank you for your assistance throughout this process. After a thorough investigation, we've identified the root cause of the issue. It was indeed related to the zip command used for deployment. The command was set to exclude .zip files, which led to the data .zip files being omitted from the node_modules directory in the AWS Lambda Layer deployment zip file. This caused the AWS Lambda function to fail upon invocation because adm-zip could not find those files.

I've resolved the issue by modifying the deployment command to allow .zip files in the node_modules directory while still excluding .zip files in other root directories.
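
The actual deployment command is not shown in this thread, so the following is only a sketch of the inclusion rule it describes, written as a filter that a Node-based packaging script might apply to each file. The function name shouldIncludeInDeployment and the example paths are hypothetical.

```javascript
const path = require('path');

// Decides whether a file belongs in the deployment archive.
// Rule from the fix described above: keep .zip files that sit anywhere
// under a node_modules directory, and drop .zip files everywhere else.
// relativePath is the file's path relative to the packaging root.
function shouldIncludeInDeployment(relativePath) {
    const isZip = path.extname(relativePath).toLowerCase() === '.zip';
    if (!isZip) {
        return true; // non-zip files are not affected by this rule
    }
    // Split on both separators so Windows-style paths are handled too.
    const segments = relativePath.split(/[\\/]+/);
    return segments.includes('node_modules');
}
```

With this rule, a path such as nodejs/node_modules/header-generator/data_files/example.zip would be kept, while a stray archive such as backups/old-build.zip would still be excluded.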

This case highlights the importance of carefully configuring deployment commands to ensure all necessary files are included in the final package. I'll make sure to thoroughly test the function after deployment in the future to confirm it's working as expected.

Thank you again for your help. I'm closing this support request now, but I'll reach out if we encounter any further issues.

Best regards, @lugfug