eslint / eslint

Find and fix problems in your JavaScript code.
https://eslint.org
MIT License
24.44k stars 4.41k forks source link

Fix: clone config before validating (fixes #12592) #13034

Closed anikethsaha closed 3 years ago

anikethsaha commented 4 years ago

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update [X] Bug fix (template) [ ] New rule (template) [ ] Changes an existing rule (template) [ ] Add autofixing to a rule [ ] Add a CLI option [ ] Add something to the core [ ] Other, please explain:

What changes did you make? (Give an overview)

Cloning the rule config and other configs before validating them as they get mutated with default values when used useDefault : true in agv. I couldn't find any other entry point for this issue other than this, do let me know if any!

Is there anything you'd like reviewers to focus on?

fixes https://github.com/eslint/eslint/issues/12592

kaicataldo commented 4 years ago

Thanks for working on this!

anikethsaha commented 4 years ago

Can we clone the config at a higher level and then pass that to the validator function?

By higher level, you meant where we are reading/getting the config? then I think it needs to pass two configs to the validator one cloned and one original. cause only the cloned one will be validated. the original one will be passed to the rules

kaicataldo commented 4 years ago

Is there a reason we can't clone the config as soon as we have the full resolved config and use that for validation and then pass that to the rules?

anikethsaha commented 4 years ago

i will try that ๐Ÿ‘

anikethsaha commented 4 years ago

@kaicataldo I updated the cloning to a higher level. I think I might have missed some other entry points for configs.

Also, do you think it would be helpful to add a clone here instead

https://github.com/eslint/eslint/blob/76324ace67893c3d7e38a369114d6128df9ffb65/lib/cli-engine/config-array-factory.js#L803-L809

Like this

if (configData) {
  const clonedConfigData = lodash.cloneDeep(configData);
  return this._normalizeConfigData(
    clonedConfigData,
    plugin.filePath,
    `${importerName} ยป plugin:${plugin.id}/${configName}`
  );
}
anikethsaha commented 4 years ago

@aladdin-add can you check if the current one is fine ?

let me know if this has to be changed. after that, I will add some tests

aladdin-add commented 4 years ago

I'd suggest it in something like linter.js, than in getRuleOptions -- it could be cloned multi-times.

anikethsaha commented 4 years ago

I'd suggest it in something like linter.js, than in getRuleOptions

I think the current one in this PR is something like this.

earlier I planned to have it clone and validate in config-validator.js alone. In this way, it will clone every config irrespective of where they are coming from.

But I think the current one is the entry point for the config (not for the directives,)

anikethsaha commented 4 years ago

@kaicataldo can you please take a look if this is the fine place to clone?

nzakas commented 4 years ago

@kaicataldo can you take a look at this again?

kaicataldo commented 4 years ago

Apologies for the delay! I've been sick the last few weeks and am just getting caught up.

nzakas commented 4 years ago

@kaicataldo can you check if your concerns have been addressed?

mdjermanovic commented 4 years ago

Do we have documented restrictions on what can be an option's value?

In the current version, a custom rule might expect something like a function, Date instance, RegExp instance, etc. to be defined in .eslintrc.js as a rule's option. JSON schema doesn't support those types, but the schema property and thus the validation is optional for rules.

After this change, values will be limited to JSON data types only.

kaicataldo commented 4 years ago

This is a good point and is probably worth having a separate discussion about before we merge this in (or alternately find a different deep merge algorithm). Up until now, weโ€™ve wanted config to be JSON-serializable (or at least JSON-compatible) because we support JSON configs. That being said, we donโ€™t know what custom plugins are currently doing, and itโ€™s looking likely we will be implementing a new config file format that will not have this limitation (see RFC).

anikethsaha commented 4 years ago

In the current version, a custom rule might expect something like a function, Date instance, RegExp instance, etc. to be defined in .eslintrc.js as a rule's option.

agreed, this change might break the option of this type.

I think we can discuss in a separate issue and hold this PR. or wait till the RFC is finalized/implemented?

kaicataldo commented 4 years ago

Is there a reason we can't use use lodash.clonedeep? I know the original implementation of this PR used it, but I'm not sure I understand why that wouldn't work in this case.

anikethsaha commented 4 years ago

there was an issue with it ref this

I am not sure why is that behavior.

kaicataldo commented 4 years ago

Huh, strange. Might be worth filing a bug report with lodash, as that seems like an easy pitfall.

anikethsaha commented 4 years ago

orth filing a bug report with lodash, as that seems like an easy pitfall.

not sure whether it's intentional or not. though I couldn't find anything on the docs about this.

mdjermanovic commented 4 years ago

By lodash/lodash#2725 it seems to be intentional, as probably the only way to support circular references.

As suggested in the issue, perhaps we could explore cloneDeepWith? Maybe to start a new (independent) deep cloning whenever it encounters an object.

anikethsaha commented 4 years ago

As suggested in the issue, perhaps we could explore cloneDeepWith?

I will check this

Maybe to start a new (independent) deep cloning whenever it encounters an object.

I didn't get this one. did you mean custom cloning implementation?

mdjermanovic commented 4 years ago

Maybe to start a new (independent) deep cloning whenever it encounters an object.

I didn't get this one. did you mean custom cloning implementation?

Partially custom: maybe we could call cloneDeepWith again from its customizer function when it gets an object, or something like that if it makes sense.

anikethsaha commented 4 years ago

@mdjermanovic @kaicataldo I have changed the cloning logic with this. I seem working out. let me know if any more specific case I am missing or any suggestion would be great ๐Ÿ‘

mdjermanovic commented 4 years ago

I was thinking about something like this, instead of maintaining our own deep cloning:

const lodash = require("lodash");

function getCustomizer() {
    let root = true;

    return function customizer(value) {
        if (!root && typeof value === "object" && value !== null) {

            // start new lodash.cloneDeepWith
            return lodash.cloneDeepWith(value, getCustomizer());
        }

        root = false;

        // return `undefined` for the default lodash.cloneDeepWith behavior
        return void 0;
    }
}

// test

const nested = {};

const obj = { x: nested, y: nested };

const clonedObj = lodash.cloneDeepWith(obj, getCustomizer());

console.log(clonedObj.x === clonedObj.y); // false

This is just an idea, I'm not sure about the performance. Also, this would cause "Maximum call stack size exceeded" if there are circular references.

anikethsaha commented 4 years ago

@mdjermanovic The solution you mentioned is working as well ๐Ÿ‘

I think both logic is nearly the same in terms of speed.

The one you mention might be ~0.10s faster (avg) - from tests

image

image

I did run the perf three times, and the result seems relative to this one.

Also, this would cause "Maximum call stack size exceeded" if there are circular references.

any way to reproduce this?

Let me know which one to keep ๐Ÿ‘ .

IMO, as performance-wise, it seems lodash.cloneDeepWith has a bit of better result.

kaicataldo commented 4 years ago

I agree with @mdjermanovic that it would be better to use lodash instead of maintaining our own full deep cloning algorithm. If it's also more performant, it seems like the clear winner!

anikethsaha commented 4 years ago

I was thinking if there was some way to test/fix this Also, this would cause "Maximum call stack size exceeded" if there are circular references. issue.

kaicataldo commented 4 years ago

Are there any practical cases where someone would want to include a circular reference? I'm having trouble coming up with any...

kaicataldo commented 4 years ago

@anikethsaha To test it, you could add a circular reference inside settings, since this can contain arbitrary data.

mdjermanovic commented 4 years ago

I was thinking if there was some way to test/fix this Also, this would cause "Maximum call stack size exceeded" if there are circular references. issue.

It isn't fixable. It might be nice to report a meaningful error, but the amount of code and complexity to do that is most likely too high for something that normally shouldn't happen in the configuration.

Are there any practical cases where someone would want to include a circular reference? I'm having trouble coming up with any...

Probably not, just had to mention a possible downside (which also exists with the custom cloning algorithm in the actual iteration of this PR).

anikethsaha commented 4 years ago

ok cool, I will change the current logic with cloneDeepWith ๐Ÿ‘

anikethsaha commented 4 years ago

I changed clone logic to the one @mdjermanovic suggested.

Working as expected ๐Ÿ‘

I did run a test for circular ref with both logic and both are failing. Should add test for the circular ref ? I didn't add them because I thought it doesn't belong to the scope of this change!

let me know if that needs to be added

kaicataldo commented 4 years ago

I did run a test for circular ref with both logic and both are failing. Should add test for the circular ref ? I didn't add them because I thought it doesn't belong to the scope of this change!

Can you clarify what you mean by this? Do you mean that you wrote a test that tried to clone a config that had circular references and it led to an infinite loop?

My preference would be to add some kind of safeguard for this. I don't feel comfortable knowingly adding a memory leak that could wreak havoc on automated CI boxes that might accidentally be implementing a config with a circular reference. One other thought I had - could we add a safeguard by somehow implementing detection of circular references that could allow us to break out of the cloning logic and use the original config instead (maybe with a warning message)? This would allow us to not break anyone's builds who might be accidentally relying on this behavior.

@nzakas @btmills @mysticatea Thoughts on this?

In summary, we currently run ajv on the imported configuration, which can have the unintended consequence of mutating the configuration by reference when it sets the default values.

I suggested we try to clone the entire config object when it is imported so that there was no chance of ESLint's internals mutating the config object. We have a working fix for this in this PR, but this solution has the caveat of trying to clone circular refs leading to an infinite call stack.

The alternative to this would be to just deep clone the rules configuration, but I suppose this problem could technically occur there too.

anikethsaha commented 4 years ago

Can you clarify what you mean by this? Do you mean that you wrote a test that tried to clone a config that had circular references and it led to an infinite loop?

I added the following eslint config and it was failing.

let errorConfig = ["error", {}];

class CircularRef {
  constructor() {
    this.self = this;
  }
}

module.exports = {
  settings: {
    sharedData: new CircularRef()
  },

  rules: { camelcase: errorConfig, "default-case": errorConfig }
};

I couldn't see the error message as the stack was big and my ide (I am not using my local setup, though I can try ) didn't able to display the whole.

I don't feel comfortable knowingly adding a memory leak that could wreak havoc on automated CI boxes that might accidentally be implementing a config with a circular reference.

agreed ๐Ÿ’ฏ

could we add a safeguard by somehow implementing detection of circular references that could allow us to break out of the cloning logic and use the original config instead (maybe with a warning message)?

try-catch ?

Also, it looks like https://github.com/eslint/eslint/issues/13324 can be affected with this as well. ?

mdjermanovic commented 4 years ago

My preference would be to add some kind of safeguard for this. I don't feel comfortable knowingly adding a memory leak that could wreak havoc on automated CI boxes that might accidentally be implementing a config with a circular reference. One other thought I had - could we add a safeguard by somehow implementing detection of circular references that could allow us to break out of the cloning logic and use the original config instead (maybe with a warning message)? This would allow us to not break anyone's builds who might be accidentally relying on this behavior.

Maybe something like this:

const lodash = require("lodash");

function cloneDeep(val) {
    const seenObjects = new Set();
    let forceDefault = false;

    function customizer(value) {
        if (typeof value === "object" && value !== null) {
            if (forceDefault) {
                forceDefault = false;
            } else {
                if (seenObjects.has(value)) {
                    throw new Error("Cannot clone circular structure");
                }

                seenObjects.add(value);
                forceDefault = true;
                const clonedValue = lodash.cloneDeepWith(value, customizer);
                seenObjects.delete(value);

                return clonedValue;
            }
        }

        // return `undefined` for the default lodash.cloneDeepWith behavior
        return void 0;
    };

    return lodash.cloneDeepWith(val, customizer);
}

We call this with try-catch and use the original config if it throws, maybe with a warning message.

anikethsaha commented 4 years ago

@mdjermanovic @kaicataldo I used debug for logging the warning though it won't be visible by default.

an alternative can be console.log (not ideal) or process.stdout.write .

let me know if the current one is fine

kaicataldo commented 4 years ago

Maybe I'm missing something, but won't this not actually fix the specific specific problem we're trying to solve here, since it would silently reuse the original config object if there are any objects that are referenced more than once in the config (even if they're not necessarily nested within each other)? Seems like we need to actually do a full traversal of the finalized config object to check specifically for nested circular references before running the cloning algorithm. I'm not sure of the performance hit here, though, particularly for complex configurations.

Alternatives I can think of:

  1. Go back to just cloning the rules before running ajv. While this issue could potentially still exist if a rule configuration had a nested circular reference, it's a lot less likely since we're dealing with a smaller subset of the full config. Apologies if we do end up doing this, because I know this was the original fix implemented and I suggested we go down the path of trying to clone the entire config.
  2. Stick with this current iteration but actually throw an error and disallow rereferencing objects in configs entirely. This would be a breaking change, and feels like a bit of a kludge to me. I would like to avoid this.

@anikethsaha Sorry for all the back and forth. Feel free to wait until we finalize the design before continuing to make changes.

anikethsaha commented 4 years ago

2. Stick with this current iteration but actually throw an error and disallow rereferencing objects in configs entirely. This would be a breaking change, and feels like a bit of a kludge to me. I would like to avoid this.

agreed.

Seems like we need to actually do a full traversal of the finalized config object to check specifically for nested circular references before running the cloning algorithm. I'm not sure of the performance hit here,

yea, it would be a bit of heavy as what I understood it would mean traversing it twice.

maybe the solution of #13324 can be helpful here.

@anikethsaha Sorry for all the back and forth. Feel free to wait until we finalize the design before continuing to make changes.

Cool ๐Ÿ˜„

mdjermanovic commented 4 years ago

Maybe there's something missing in the algorithm, but this cloneDeep was supposed to throw only if there are circular references. Otherwise, it should create multiple copies of the same object if it was referenced more than once.

anikethsaha commented 4 years ago

I added the try-catch to not throw an error and just log (debug log) the error and continue with an un-cloned config.

Otherwise, it should create multiple copies of the same object if it was referenced more than once.

Should I change it to use the object returning from cloneDeep instead of the original one ? I would replace the throw when there is circular ref with a debug log !

const lodash = require("lodash");

function cloneDeep(val) {
    const seenObjects = new Set();
    let forceDefault = false;

    function customizer(value) {
        if (typeof value === "object" && value !== null) {
            if (forceDefault) {
                forceDefault = false;
            } else {
                if (seenObjects.has(value)) {
-                   throw new Error("Cannot clone circular structure");
+                    debug("Cannot clone circular structure")
                }

                seenObjects.add(value);
                forceDefault = true;
                const clonedValue = lodash.cloneDeepWith(value, customizer);
                seenObjects.delete(value);

                return clonedValue;
            }
        }

        // return `undefined` for the default lodash.cloneDeepWith behavior
        return void 0;
    };

    return lodash.cloneDeepWith(val, customizer);
}

thoughts ?

mdjermanovic commented 4 years ago

Just to clarify what's the problem with the current iteration - would it fix the original issue?

anikethsaha commented 4 years ago

Just to clarify what's the problem with the current iteration - would it fix the original issue?

yes , but if there is circular ref then it will use the original config instead of the cloned one so in that case it wont be fixed if there is a circular ref

mdjermanovic commented 4 years ago

Just to clarify what's the problem with the current iteration - would it fix the original issue?

yes , but if there is circular ref then it will use the original config instead of the cloned one so in that case it wont be fixed if there is a circular ref

I thought that was the goal - fallback to previous behavior only if there are circular refs, in order to avoid breaking builds with such configurations.

Additionally, we could improve the algorithm to return the already cloned object when it detects a circular reference (instead of throwing), while still creating multiple copies of objects that were referenced multiple times but are not part of any cycles.

anikethsaha commented 4 years ago

I thought that was the goal - fallback to previous behavior only if there are circular refs, in order to avoid breaking builds with such configurations.

yeah, that was the idea.

@kaicataldo from https://github.com/eslint/eslint/pull/13034#issuecomment-632283007, it seems like the current is working as it was suggested right?

Even if the config is not cloned and the original conifg is passed on, user can notice the behavior and with this change using --debug they can find out the reason for not working as intended as their config has a circular refs (which is ideally not recommendable unless there is some kind of tool like bundlers that handles that. ).

aladdin-add commented 4 years ago

TBH, I have never seen some devs writing such a config, so I'm wondering if it was worthy to fix it, by adding a lot of complexity(and probably loss perf)!

what if we just:

nzakas commented 4 years ago

I think @aladdin-add has a point here: it seems like Ajv modifying the object itโ€™s validating is the root cause, so asking them to fix it might be the best path forward. We are a sponsor after all. :)

nzakas commented 4 years ago

Pinged the Ajv creator on the issue: https://github.com/eslint/eslint/issues/12592#issuecomment-633759796

anikethsaha commented 4 years ago

Yeah that would be an easy win ๐Ÿ‘

mdjermanovic commented 4 years ago

@kaicataldo just to clarify, in the case that we eventually decide to clone the entire config:

Maybe I'm missing something, but won't this not actually fix the specific specific problem we're trying to solve here, since it would silently reuse the original config object if there are any objects that are referenced more than once in the config (even if they're not necessarily nested within each other)?

It should fix the specific problem (the original issue) since it does create multiple clones of objects that were referenced more than once. It will silently reuse the original config only if there are circular references.

Anyway, this code certainly adds some complexity and probably needs more tests.

There is also clone package as an alternative to lodash's clone, but I'm not sure if we want to add another dependency.

nzakas commented 4 years ago

Can we simplify this a bit? Since rule configs are JSON-serializable, can we do JSON.parse(JSON.stringify(ruleConfig)). I believe doing so will automatically throw for circular references.