OutpostUniverse / OP2Utility

C++ library for working with Outpost 2 related files and tasks.
MIT License
4 stars 0 forks source link

XFile::GetDirectory strips leading slash from absolute paths #215

Closed DanRStevens closed 5 years ago

DanRStevens commented 5 years ago

XFile::GetDirectory incorrectly strips leading "/" from absolute paths.

GetDirectory("/somePath");  // "somePath" returned, but "/somePath" expected

Current implementation:

std::string XFile::GetDirectory(const std::string& pathStr)
{
    fs::path p(pathStr);

    if (p.has_relative_path())
    {
        if (p.relative_path() == p.filename()) {
            return "./";
        }

        return p.remove_filename().relative_path().string();
    }

    if (p.has_root_path()) {
        return p.remove_filename().root_path().string();
    }

    return "./";
}

I believe the bug comes from using has_relative_path and has_root_path rather than using is_relative and is_absolute: has_path is_path

Alternatively, I believe there is a shorter solution using: parent_path I believe both cases should be (assuming a filename is given in the path):

return p.parent_path().string();
DanRStevens commented 5 years ago

A few things to add to this.

According to C++ Filesystem, "/" is an absolute path on Posix systems, but still a relative path on Windows systems. This make sense, since without a drive specifier, it doesn't uniquely identify a path.


When I think of a relative empty path, I usually think of "" (empty string), rather than "./". Is there a reason for using "./" instead of ""? I noticed the C++ Filesystem code does this in a few places. I know in some cases, command line tools have special names for parameters, and if you want to specify a file with that special name, you would instead pass a parameter of "./specialName" to distinguish it as being a file in the current directory. For example read stdin (read from a special device) versus read ./stdin (read from a file literally named "stdin").

My other thought is maybe it could be security related, though this seems like a bit of a stretch. For example, a relative path + a relative path should give a relative path. However, "" + "/" = "/", which could be interpreted as an absolute path (on Posix). By using "./" you'd end up with "./" + "/" = ".//" which is often simplified or interpreted as "./", which is still a relative path. Granted, the problem in this case comes from adding an absolute path to a relative path, which doesn't make sense. That is, the second path "/" wasn't really a relative path (on Posix) to begin with. (relativePath + absolutePath = ?)

Should GetDirectory("") return "" or should it return "./"? It seems like you've chosen the later, though I'm uncertain why.


Should directories include a trailing slash? If you have the string "a/b", the name "b" could refer to either a file or another directory. However, the string "a/b/" clearly implies that "b" is a directory. When we return directory names, should we attempt to retain trailing slashes, or should they be stripped (as is common in shell programming).

For reference, the current method is inconsistent here. The slash is stripped for directory names, but retained when a filename is passed.

Shell code tends to strip slashes. One benefit to stripping is when you build paths from variables you can write things like ${PREFIX}/${FILENAME}, which has pretty clear meaning, and a clear delineation between values. On the other hand ${PREFIX}${FILENAME} might not have such a clear delineation if ${PREFIX} does not end with a trailing slash.

In terms of C++ programming, the shell programming style would translate as prefix + "/" + filename, or using operator / as prefix / filename.

With trailing slashes, the code is either prefix + filename or prefix / filename.

In my experience, retaining trailing slashes results in cleaner code, though it is a bit of a departure from typical shell code.

Brett208 commented 5 years ago

My experience with shell programming is limited. I think our best bet is to try to use the C++ filesystem library as intended (whatever that means?). I'm happy to standardize more towards a shell standard. Actually, I'm happy to use whatever you think is best here as a standard.

I use ./ to indicate that I definitely mean a relative directory or current directory.

You are thinking about this a lot closer than I did when originally programming it. My lack of clear direction is probably part of why there are bugs now in addition to seeing the different behaviour on the different operating systems.

I'm happy to update the OP2Archive and OP2MapImager documentation if needed based on decisions on how to proceed here.

Too bad Linux and Windows couldn't have decided to use the same symbol as a directory separator. So many hours wasted on this subject and differing text file line endings.

-Brett

DanRStevens commented 5 years ago

Alright, I'll try to come up with something sensible.


Definitely would have been nice if "standard" text files were actually standard. The really frustrating thing about the path separator difference, is that Windows has long supported both directory separators in the kernel. It was only the terminal command shell that was limited to backslash.

DanRStevens commented 5 years ago

I found the documentation on path objects to be quite useful. The header description gives a very good overview. In particular, there are generic_ versions of some methods that return strings with the portable (forward slash) path separator. The default is the preferred (platform dependent) path separator.

Curiously, for path normalization, the empty string is treated as "./", which normalizes to ".". I'm still not sure why it doesn't just use the empty string. I would love to find some documentation with some rationale on this.

For some reason, the normal form has trailing slashes on directory names stripped. That's typical in shell programming, though this seems like a contrary design point to the part:

On those OS where native format differs between pathnames of directories and pathnames of files, a generic pathname is treated as a directory path if it ends on a directory separator and a regular file otherwise.

My thought is to retain trailing slashes. It avoids ambiguity.

Brett208 commented 5 years ago

It looks like on Windows, when passing a "/" as the pathStr, the following function will not find any files:

for (const auto& entry : fs::directory_iterator(pathStr)) {
    . . .
}