Open djhaynes opened 11 years ago
Since there are no restrictions on how a path can be represented, both c:/program files and c:/program files/ are valid. This ambiguity may lead to equally valid content that is not interoperable across interpreters. For example, interpreter A could support only paths that do not end in a file separator, interpreter B could support only paths that end in a file separator, and interpreter C could support both representations. As a result, if you were to write content for interpreter A and then run it on interpreter B, it would not evaluate to the same results, because interpreter A's representation of a path is not supported on interpreter B. Therefore we need to clearly define how paths should be represented in the language so as to avoid this ambiguity. This change will also apply to registry keys and metabase keys.
Would the addition of a path datatype resolve this issue?
PROBLEM: Paths are currently treated as strings, which leads to incorrect evaluations. This is made worse by the fact that content authors will naturally encode path strings inconsistently. Here are a few examples that demonstrate that paths cannot be thought of as simple strings:
1- /a/b/c == /a/b/c/
2- /a/c == /a/b/../c
3- /a == /b because /a is a symbolic link to /b
...
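To make the examples above concrete, here is a minimal Python sketch. It assumes POSIX-style paths and uses the standard library's `posixpath.normpath`, which only normalizes lexically; the helper name `lexically_equal` is hypothetical and not part of OVAL. It shows that the first two examples compare unequal as plain strings but equal after normalization, while the symbolic link case cannot be decided without access to the file system.

```python
import posixpath


def lexically_equal(p, q):
    """Compare two POSIX-style paths after purely lexical normalization.

    Hypothetical helper for illustration only. normpath strips trailing
    separators and collapses ".." segments, but never touches the file
    system, so it cannot see symbolic links.
    """
    return posixpath.normpath(p) == posixpath.normpath(q)


# Plain string comparison treats these as different paths.
print("/a/b/c" == "/a/b/c/")                   # False

# Lexical normalization makes examples 1 and 2 compare equal.
print(lexically_equal("/a/b/c", "/a/b/c/"))    # True
print(lexically_equal("/a/c", "/a/b/../c"))    # True

# Example 3 stays unequal: only the file system knows /a links to /b.
print(lexically_equal("/a", "/b"))             # False
```

Note that collapsing `..` lexically can itself be wrong when `b` is a symbolic link, which is one reason a single string-level definition does not fully capture OS path semantics.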
ISSUES: 1- Each OS has slightly different semantics for a path. Defining our own path concept will result in a concept that does not align with every operating system's notion of a path and will therefore not completely solve our path-related issues.
2- Cannot use the native OS concept of a path. Using the native OS concept of a path is not acceptable because paths need to be compared independent of the native OS. We currently allow a product to collect information on one system and then evaluate the information on another system. If the native OS concept of a path is used we will not be able to support this use case.
3- Regular expressions are widely used to search paths. For a regular expression to work reliably there must be some standardization of the subject strings. If two different tools collect the same path and represent it differently as a string, a regular expression written against one representation is unlikely to match the other reliably.
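The regular expression concern in issue 3 can be sketched in Python as follows. The path `/usr/local/bin`, the two tool outputs, and the helper `matches` are hypothetical examples chosen for illustration; they are not taken from any real tool.

```python
import re


def matches(pattern, path):
    """Return True if the compiled pattern matches somewhere in the path."""
    return pattern.search(path) is not None


# Hypothetical: two tools report the same directory with different strings.
tool_a_path = "/usr/local/bin"
tool_b_path = "/usr/local/bin/"

# A pattern written against tool A's representation.
strict = re.compile(r"^/usr/local/bin$")

# A pattern that explicitly tolerates an optional trailing separator.
tolerant = re.compile(r"^/usr/local/bin/?$")

print(matches(strict, tool_a_path))    # True
print(matches(strict, tool_b_path))    # False
print(matches(tolerant, tool_a_path))  # True
print(matches(tolerant, tool_b_path))  # True
```

The tolerant pattern works only because its author anticipated the variation. Without a standardized representation, every pattern would need this kind of defensive handling.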
Due to the above issues, I think that correcting the handling of paths in OVAL simply does not fit within the bounds of a minor release. It is important to note that this issue was not reported by the community. This issue was discovered by the OVAL team while developing test content. We believe that under normal usage path strings in OVAL are used as inputs to native OS APIs. These APIs silently accept the varying formats of path strings and will successfully collect the expected paths in most cases. This issue with paths is most troublesome when state assertions are made about paths, because an observed path is compared to an expected path. This type of assertion is uncommon. Generally paths are used to collect properties about a file (permissions, size, hash, etc.). State assertions are normally made about these properties, not the paths themselves.
All file-related objects that have a path entity need to clearly state that a trailing path separator is required when an equality operation is specified. In addition to the documentation fix, a Schematron rule can be added to ensure that when the operation is 'equals' the path entity value ends in a path separator.
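The logic of the proposed check can be sketched in Python. This is not the Schematron rule itself, only an illustration of the condition it would enforce. The function name, the parameter names, and the assumption that both `/` and `\` count as path separators are hypothetical.

```python
def violates_trailing_separator_rule(operation, path_value,
                                     separators=("/", "\\")):
    """Return True if a path entity breaks the proposed rule.

    Hypothetical sketch: the rule only applies when the operation is
    'equals'; in that case the value must end in a path separator.
    """
    if operation != "equals":
        return False  # other operations (e.g. pattern match) are unconstrained
    return not path_value.endswith(tuple(separators))


print(violates_trailing_separator_rule("equals", "c:/program files"))         # True
print(violates_trailing_separator_rule("equals", "c:/program files/"))        # False
print(violates_trailing_separator_rule("pattern match", "c:/program files"))  # False
```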