PixarAnimationStudios / OpenUSD

Universal Scene Description
http://www.openusd.org
Other
6.17k stars 1.23k forks source link

Improve `ArchGetFileName` for windows to return full path #3361

Open roggiezhang-nv opened 1 month ago

roggiezhang-nv commented 1 month ago

Description of Change(s)

In a fix to improve Alembic plugin to read assets with ArResolver to support other sources except local files (https://github.com/PixarAnimationStudios/OpenUSD/pull/3302), Alembic library provides 4 APIs: one of them supports to read from local paths and the other 3 are using istream interfaces which has flaws to support layered ABC files. Therefore, we can only use the API that supports to read from local paths. While it involves to get the mirroed local path from ArAsset, and that's ArchGetFileName supposed to work for.

However, ArchGetFileName only returns paths without drive letter, which is relative path related to the current drive the application is running from. This PR is to improve ArchGetFileName to return full path from FILE*. Here is an example before and after this PR:

Assuming the file path behind the FILE* handle is C:/path/includes/drive/letter.ext, and the application is running from D drive. // Before -- ArchGetFileName(file_handle) => "/path/includes/drive/letter.ext" // After -- ArchGetFileName(file_handle) => "C:/path/includes/drive/letter.ext"

The path returned from old API is /path/includes/drive/letter.ext, which is corresponding to D:/path/includes/drive/letter.ext in full path, which is wrong.

Fixes Issue(s)

-

roggiezhang-nv commented 1 month ago

@nvmkuruc @nvidia-jomiller for vis.

jesschimein commented 1 month ago

Filed as internal issue #USD-10302

jesschimein commented 1 month ago

/AzurePipelines run

azure-pipelines[bot] commented 1 month ago
Azure Pipelines successfully started running 1 pipeline(s).
jesschimein commented 1 month ago

/AzurePipelines run

azure-pipelines[bot] commented 1 month ago
Azure Pipelines successfully started running 1 pipeline(s).
anwang2009 commented 2 weeks ago

Thanks for the PR @roggiezhang-nv !

testFileSystem with these changes is failing on both macos and windows - could you address this? Here's the offending failed test and context on windows.:

72: first name C:\Windows\TEMP/archFS.12176
72: filePath C:\Windows\Temp\archFS.12176
72: ArchNormPath(first name) C:/Windows/TEMP/archFS.12176
72: ArchNormPath(filePath) C:/Windows/Temp/archFS.12176
72:  ArchError: [ArchNormPath(filePath) == ArchNormPath(firstName)] axiom failed
nvmkuruc commented 2 weeks ago

@roggiezhang-nv Maybe the test should compare ArchAbsPath to get rid of any symlink?

https://openusd.org/dev/api/group__group__arch___system_functions.html#gab146e5fb7a6c2f805936960bc58e3911 suggests it should be returning the canonical path.

roggiezhang-nv commented 2 weeks ago

Seems so. I'll fix it.

roggiezhang-nv commented 2 weeks ago

Replaced. Let's see how it goes.

jesschimein commented 2 weeks ago

/AzurePipelines run

azure-pipelines[bot] commented 2 weeks ago
Azure Pipelines successfully started running 1 pipeline(s).
anwang2009 commented 6 days ago

By the way @roggiezhang-nv - the test is still failing on Windows with similar output as I've commented above. Is it working on your Windows build?

roggiezhang-nv commented 6 days ago

By the way @roggiezhang-nv - the test is still failing on Windows with similar output as I've commented above. Is it working on your Windows build?

@anwang2009 Yes, it's working for my machine. The issue is that we don't have a way to return a uniform path on windows to check the equality of paths. Is it ok to use the uppercase or lowercase of paths on windows to compare?

roggiezhang-nv commented 6 days ago

Btw, @anwang2009, @nvmkuruc wants me to compare the perf between GetFinalPathNameByHandle (new) and GetFileInformationByHandleEx (old) to give more confidence.

I called each of them 100000 times, and GetFinalPathNameByHandle is indeed twice slower than GetFileInformationByHandleEx. However, it costs only 6us for GetFinalPathNameByHandle compared to 3us for GetFileInformationByHandleEx in each call, so the perf influence can be ignored.

Let me if I can simply the uppercase or lowercase to compare the paths for windows. Or you have any new suggestions.

roggiezhang-nv commented 6 days ago

Btw, I fixed the path compare issue with std::filesystem::equivalent, which can handle cases.

roggiezhang-nv commented 6 days ago

@anwang2009 I also need to correct my number as I passed a wrong param which is not similar to the real implementation to GetFinalPathNameByHandle. In reality, it's 5-6 times slower than old implementation (windows only, but not Linux and Mac) because of GetFinalPathNameByHandle. So it's 15us per call for ArchGetFileName this time on my machine. Not sure this is an acceptable cost.

meshula commented 5 days ago

I haven't spotted it being discussed elsewhere, so I'll note here, forgive the dry tone, and ignore as you wish :)

GetFinalPathNameByHandle resolves the final path of a file, including symbolic links, mount points, or junctions in addition to the work GetFileInformationByHandleEx might perform.

Given the reported benchmark result of resolving in microseconds, I have a suspicion the benchmark is of cached performance not unwarmed performance. Also, I think there might be a more "interesting" difference (ie, tens of milliseconds or more) if a the path needs resolving. but, it's an unavoidable penalty if we want correct operation.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

If it's resolving across a network mount, right now it'll fail, and after a fix, the overhead will add up. But as I said, that's unavoidable.

Where the overhead would likely be significant is if someone wrote a loop to get info on all the files in a massive directory but that doesn't happen in the USD codebase itself.

jesschimein commented 5 days ago

/AzurePipelines run

azure-pipelines[bot] commented 5 days ago
Azure Pipelines successfully started running 1 pipeline(s).
roggiezhang-nv commented 5 days ago

I haven't spotted it being discussed elsewhere, so I'll note here, forgive the dry tone, and ignore as you wish :)

GetFinalPathNameByHandle resolves the final path of a file, including symbolic links, mount points, or junctions in addition to the work GetFileInformationByHandleEx might perform.

Given the reported benchmark result of resolving in microseconds, I have a suspicion the benchmark is of cached performance not unwarmed performance. Also, I think there might be a more "interesting" difference (ie, tens of milliseconds or more) if a the path needs resolving. but, it's an unavoidable penalty if we want correct operation.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

If it's resolving across a network mount, right now it'll fail, and after a fix, the overhead will add up. But as I said, that's unavoidable.

Where the overhead would likely be significant is if someone wrote a loop to get info on all the files in a massive directory but that doesn't happen in the USD codebase itself.

@meshula Thanks for your insights. Yes, it's cached result. For non-cached result, that's 75us versus 20us for both functions on my machine. And I agree it should also depend on whether it's symbolic links, etc.

A significant call point is in Crate, where it's going to be called once per asset so there'll be a general linear overhead. If it's really 16us, and you have a thousand files, it'll be 16ms, time lost in the noise.

That's one of the issue I saw from Crate implementation as the filename returned by the call is not used anywhere, which means we can reduce that cost from USD codebase. @nvmkuruc for this issue.