Closed (by prj)
Does .NET cache the file accesses?
Yes, it does. The most likely explanation for the behavior you are seeing is that this cache is corrupted. The cache is implemented in the C/C++ runtime code.
Do you also see other crashes, such as segmentation faults? They may have the same root cause and may be easier to debug.
Do your projects use any third-party packages with native code, or any telemetry solutions? They may be the cause of the problem.
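The reply above says the runtime caches file accesses, which would explain why a single failure keeps repeating. The following is a deliberately simplified Python model of that kind of negative cache, not CoreCLR's actual binder code. FlakyStorage and CachingLoader are hypothetical names. The model assumes a failed load is remembered for the lifetime of the process, as the thread describes.

```python
class FlakyStorage:
    """Hypothetical network-attached store: the first `failures` reads fail."""

    def __init__(self, files, failures=1):
        self.files = dict(files)
        self.failures_left = failures

    def read(self, name):
        if self.failures_left > 0:
            self.failures_left -= 1
            # Simulated transient I/O hiccup on the remote volume.
            raise FileNotFoundError(name)
        return self.files[name]


class CachingLoader:
    """Remembers the outcome of the first load of each name, success or failure."""

    def __init__(self, storage):
        self.storage = storage
        self._outcomes = {}  # name -> (ok, payload)

    def load(self, name):
        if name not in self._outcomes:
            try:
                self._outcomes[name] = (True, self.storage.read(name))
            except FileNotFoundError:
                # The failure is recorded, so storage is never consulted again.
                self._outcomes[name] = (False, None)
        ok, payload = self._outcomes[name]
        if not ok:
            raise FileNotFoundError(name)
        return payload


if __name__ == "__main__":
    storage = FlakyStorage({"System.Net.Http.dll": b"IL"}, failures=1)
    loader = CachingLoader(storage)
    for attempt in range(3):
        try:
            loader.load("System.Net.Http.dll")
            print(attempt, "loaded")
        except FileNotFoundError:
            print(attempt, "FileNotFoundError")
    # Simulate a process restart: a new loader starts with an empty cache.
    print(CachingLoader(storage).load("System.Net.Http.dll"))
```

All three attempts report FileNotFoundError, even though the storage recovered after the first read. Only the fresh loader, standing in for a restarted process, succeeds. In this model, one transient storage error is enough to poison every later access until restart, which matches the symptom in this issue.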
Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov. See info in area-owners.md if you want to be subscribed.
Author: prj
Assignees: -
Labels: area-AssemblyLoader-coreclr, untriaged, needs-area-label
Milestone: -
No, I do not see any other issues: no segmentation faults or anything of that nature. The only native library I use is the gRPC runtime, and this issue only happens on the OCI nodes. It did not happen on a dedicated Linux server running the same applications.
So the problem is triggered by the environment. All of my apps exhibit it, and they are all very different from one another.
Do the assemblies get garbage collected and then reloaded at a later time? The main issue is that once loading an assembly fails, it stays in the failed state until the application is restarted, so every subsequent request (or any other attempt to access that assembly) throws FileNotFoundException. Is there any configuration I can try to prevent this behaviour?
My problem is that it is a production cluster, so I am in a bit of a pickle.
Do the assemblies get garbage collected and then re-loaded at a later time?
By default, the assemblies are loaded once and stay loaded until the process exits.
It is possible to load assemblies as collectible by creating an AssemblyLoadContext with isCollectible set to true (see https://learn.microsoft.com/en-us/dotnet/api/system.runtime.loader.assemblyloadcontext.-ctor). Do your apps use custom AssemblyLoadContexts?
No, they do not. I suspect I could reproduce this issue even with a blank ASP.NET Core sample app.
The errors are thrown even from within Kestrel, which has nothing to do with any of my application code.
P.S. It may be worth mentioning that all these apps are ASP.NET Core apps with Kestrel serving the requests, in case that changes anything. I am really not doing anything fancy. Some of the apps have no frontend at all; they are purely gRPC services that query a MySQL database. What the apps do does not seem to influence this at all, since every one of them gets these errors.
This happened again today, on three separate nodes on the same day.
It might be some kind of storage issue on OCI. The big problem is that once the issue happens a single time, the FileNotFoundException errors keep repeating over and over.
Because OCI does not provide instance-local storage, I am thinking of creating a ramdisk and placing the runtime on it. Hopefully this will alleviate the issue.
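The ramdisk idea could look roughly like the following Python sketch. It copies a directory (the app or the runtime) onto a tmpfs mount and checks that every file arrived intact. The default /dev/shm target (a tmpfs mount on most Linux distributions), the function names, and the example runtime path are assumptions, not something the thread specifies. After staging, the application would be launched from the copy. For a framework-dependent app, the DOTNET_ROOT environment variable can point the apphost at a staged runtime.

```python
import shutil
import sys
from pathlib import Path


def verify_copy(src: Path, dest: Path) -> None:
    """Fail loudly if any regular file is missing or truncated in the copy."""
    for path in src.rglob("*"):
        if path.is_file():
            copied = dest / path.relative_to(src)
            if not copied.is_file() or copied.stat().st_size != path.stat().st_size:
                raise RuntimeError(f"incomplete copy: {copied}")


def stage_on_ramdisk(source_dir: str, ramdisk_root: str = "/dev/shm") -> Path:
    """Copy a directory onto a RAM-backed filesystem and return the copy's path.

    Assumes ramdisk_root is a tmpfs mount (hypothetical default: /dev/shm).
    """
    src = Path(source_dir).resolve()
    dest = Path(ramdisk_root) / src.name
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean copy on every boot or deploy
    # copy2 preserves permission bits, so executables such as `dotnet` stay executable.
    shutil.copytree(src, dest, copy_function=shutil.copy2)
    verify_copy(src, dest)
    return dest


if __name__ == "__main__":
    # Example (hypothetical path): python stage.py /usr/share/dotnet
    print(stage_on_ramdisk(sys.argv[1]))
```

Because tmpfs lives in memory, the copy disappears on reboot, so the staging step would have to run at every boot before the application starts. This approach only avoids the network storage on the load path. It does not change how the runtime handles a failed load once one occurs.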
@prj Any update? Based on the symptoms you described, a storage issue sounds likely.
This issue has been marked needs-author-action and may be missing some important information.
This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.
This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.
Description
All .NET projects that run for a certain amount of time (anywhere from as little as 8 hours to a week) start having trouble locating runtime components.
Once locating a runtime component fails, the issue persists until the application is restarted: after a single failure, every subsequent access to that assembly throws FileNotFoundException. After a restart, everything works again.
It happens in many different applications, both those published as a single executable and exploded (multi-file) ones. It also happens on all the nodes of my cluster.
Reproduction Steps
I am unable to give reproduction steps, as the issue is sporadic. I would appreciate guidance on how to debug it.
Expected behavior
The application runs normally.
Actual behavior
Examples of various errors:
There are many more, thrown from HttpClient, Kestrel, and application code.
Regression?
The issue is present in both .NET 5 and .NET 6. I have not tried newer .NET versions yet.
Known Workarounds
No response
Configuration
ARM64 architecture
The deployment is in OCI Cloud.
Other information
Because the applications run on OCI virtual machines that use network-attached storage, some requests to the storage may fail sporadically.
Does .NET cache the file accesses? That is, if an access fails once, does it fail forever? Could this be the cause?