Opiumtm closed this issue 8 years ago.
Additional information: the application crashes when a ListView control is hosted in a Shell sub-page. It crashes randomly when scrolling the ListView or sliding in a FlipView. There is no problem with GridView, no problem on x86 or x64 PCs, and no problem when the solution is not compiled with the .NET Native toolchain. Earlier versions of Template 10 did not have the problem; only the latest few versions do. It seemed that the application was crashing due to an out-of-memory error (note the rapid growth in memory usage). (Update: the problem is not caused by running out of memory.) The problem occurred not only in my project but for other people too, and it forced another developer to roll Template 10 back to an earlier version.
Is anyone else seeing this? @Windows-XAML/community
I've had it crash on some pages with ListViews, but the app wasn't attached to the debugger, so I don't know if it's the same issue. It was a Release build with .NET Native.
@diogorolo I experimented with the crashes and it is clear: it crashes only when compiled with .NET Native. The same code deployed on the same device but compiled without .NET Native did not crash. I also read on a developer forum that these crashes are linked to Template 10 in some way. A developer reported on that forum (sorry, it's in Russian) that the issue appeared after a Template 10 upgrade, and when he rolled back to an earlier version, the .NET Native crashes with ListView went away. He was very angry and wrote that he is tired of Template 10's lack of stability.
@Opiumtm can you try this: switch to Release mode and enable .NET Native compilation (if you've turned it off) like this:
Now, turn on Native debugging like this:
Next, set Exception Settings as shown:
Now, run the app in the debugger. You may have to ignore any try/catch exceptions in your own code and in Template 10's scaffolding, but you should eventually reach the point where the problematic ListView is showing. Once that happens, reproduce the crash and paste as much of the stack trace from the native debugger here as possible. It will at least give us something to look at in terms of what could be causing the problem.
Yes! I reproduced this random crash on an ARM device and have a stack trace clearly indicating that this is a Template 10 issue!
System.UnauthorizedAccessException: Access is denied. (Excep_FromHResult 0x80070005)
at SharedLibrary!
I think this is a critical error in Template 10. It prevents production deployment of applications on real ARM devices and must be fixed immediately. Worst of all, the issue reproduces only on actual ARM devices when the application is compiled with .NET Native. It cannot be reproduced on an x86 PC, in the Windows 10 Mobile emulator, or in Debug builds on any device family. But real handheld usage is ARM compiled with .NET Native (Microsoft does not accept applications compiled for classic .NET into the Store, only applications compiled with .NET Native). For now I cannot publish the beta version of my application because of this Template 10 issue.
Yes, this is a very serious bug. I see this error in my app too. I hope it gets fixed very soon and that the NuGet package is updated!
I also got this from the Application Insights:
at SharedLibrary!
The first occurrence was on 4 Feb.
Thanks for everybody's feedback.
Note that T10 is open source for a reason: it's done by volunteers. It's easy to file issues and complain, but it's much more productive to fork the code and fix the problem you're complaining about.
So... those not willing to do this will, yes, be waiting on a volunteer to find and fix the root cause. We'll do our best and these stack traces and date info will go a good way to assisting us. If anybody can help to pinpoint even further, that will speed up the process even more.
Thanks!
My initial hypothesis, just from looking at the code, is that we're encountering the UnauthorizedAccessException due to UIViewSettings.GetForCurrentView().UserInteractionMode == UserInteractionMode.Touch; in CalculateBackVisibility, as this method is triggered by dependency properties. The call to GetForCurrentView() should likely be marshalled to the Dispatcher via RunAsync().
I've not tested this hypothesis, though.
I'm wondering if the reason we've not seen it until now is that this commit (for #495) fixed a problem where the PropertyChangedCallback wasn't being registered as it should have been. This change was made 14 days ago. I settled on this commit because the vast majority of the file has been in place since last fall.
The only other recent-ish commits are for #478, 29 days ago.
I'm going to leave the fix for this issue to @JerryNixon since he was the one who committed the changes I've cited here as he likely understands better what he was going for and what should be happening.
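The dispatcher-marshalling idea in the hypothesis above could look roughly like the following untested sketch. TouchModeHelper and IsTouchModeAsync are hypothetical names, not Template 10 code, and the sketch assumes the caller can supply the view's CoreDispatcher. If the current thread already has access to the dispatcher, it reads UIViewSettings directly; otherwise it queues the read with CoreDispatcher.RunAsync and hands the result (or exception) back through a TaskCompletionSource.

```csharp
using System;
using System.Threading.Tasks;
using Windows.UI.Core;
using Windows.UI.ViewManagement;

// Hypothetical helper; not Template 10 code and not tested on a device.
public static class TouchModeHelper
{
    // Returns true when the view is in touch interaction mode.
    // UIViewSettings.GetForCurrentView() is meant to run on the UI thread
    // that owns the view, so calls from other threads are marshalled there.
    public static async Task<bool> IsTouchModeAsync(CoreDispatcher dispatcher)
    {
        if (dispatcher.HasThreadAccess)
        {
            return ReadTouchMode();
        }

        var result = new TaskCompletionSource<bool>();
        await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
        {
            // Forward the value or any exception to the awaiting caller
            // through the TaskCompletionSource.
            try
            {
                result.SetResult(ReadTouchMode());
            }
            catch (Exception ex)
            {
                result.SetException(ex);
            }
        });
        return await result.Task;
    }

    private static bool ReadTouchMode()
    {
        return UIViewSettings.GetForCurrentView().UserInteractionMode == UserInteractionMode.Touch;
    }
}
```

A caller such as CalculateBackVisibility would then have to become asynchronous or cache a value computed on the UI thread; this sketch does not settle that design choice. One common way to obtain the dispatcher from a background thread is CoreApplication.MainView.CoreWindow.Dispatcher.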
@bc3tech @JerryNixon a developer on a forum reported that the last version without this issue was 1.1.1. Upgrading to 1.1.2 or later results in random crashes on a mobile device. I hope this information about the exact broken build is helpful.
@JerryNixon can you go back and drop tags for when we did NuGet releases? I'm going to have a look now at when this might've been and will report back.
A few weeks ago I had a similar problem. My application crashed when running the WACK (Windows App Certification Kit), which made it very hard to troubleshoot.
My solution was simply to create a log file. Maybe my code below will be useful. It creates a file in the Downloads folder (all applications can write to that folder). Template 10 has a LoggingService that seems to be used a lot, so make sure this service writes its output to a file. Then it might be easier to solve this. I don't have access to an ARM device right now, so I can't do it myself.
// Requires: using System; using Windows.Storage; using Windows.Storage.Streams;
static readonly object LogLock = new object();
static IRandomAccessStream LogAccessStream;
static IOutputStream LogOutputStream;
static DataWriter LogDataWriter;

// Appends a timestamped line to a log file in the Downloads folder and
// blocks until it is flushed, so the log survives a sudden crash.
public static void LogToFile(string text = "")
{
    // Callers may log from several threads at once; serialize access
    // so the file is created only once and writes do not overlap.
    lock (LogLock)
    {
        if (LogAccessStream == null)
        {
            string filename = "persistent_log_" + DateTime.Now.ToString("yyyyMMdd_HHmmss");
            var newFile = DownloadsFolder.CreateFileAsync(filename + ".txt").AsTask();
            newFile.Wait();
            var randomFileTask = newFile.Result.OpenAsync(FileAccessMode.ReadWrite).AsTask();
            randomFileTask.Wait();
            LogAccessStream = randomFileTask.Result;
            LogOutputStream = LogAccessStream.GetOutputStreamAt(0);
            LogDataWriter = new DataWriter(LogOutputStream);
        }
        LogDataWriter.WriteString(DateTime.Now.ToString("HHmmss") + " " + text + Environment.NewLine);
        LogDataWriter.StoreAsync().AsTask().Wait();
        LogOutputStream.FlushAsync().AsTask().Wait();
        LogAccessStream.FlushAsync().AsTask().Wait();
    }
}
@bc3tech @JerryNixon Since the error reproduces only under certain circumstances (compiled with .NET Native, on ARM, with a ListView on a page hosted inside the Shell), I think there is probably a race condition and the error is caused by timing issues, maybe the order of events. When the code runs in a different environment or is compiled with a different compiler, the timeline of events is different and the issue does not reproduce. As I understand it, code compiled with .NET Native uses the CPU more aggressively and often causes UI thread freezes and lags, even when the async/await pattern is used. The code runs considerably faster, so more background and foreground work gets done. Active scrolling of a ListView certainly requires work (measuring controls, updating layouts), especially with complex markup containing many UserControls, many attached events that start more work after a UserControl loads or unloads, and many binding updates, which are now represented directly in code when using the recommended {x:Bind ...}. Natively compiled code runs as fast as system framework code, and system code may compete with user code for the CPU when both run at the same thread priority. On an ARM device the CPU has nowhere near the power of a typical x86 desktop (or a Hyper-V-hosted emulator running on the same powerful x86 machine), which makes the competition for CPU even fiercer. So this heavy CPU usage can cause timing issues, and any timing-dependent code with potential race conditions, or code sensitive to the order or timing of events, could crash. Check whether your event handlers depend on the order of events or whether the code makes assumptions about timing.
@Opiumtm no doubt everything you've said is possible. However you and the others filing & chiming in on this issue keep pointing at Template 10 as the culprit. Hence my suggestion to fork the repo and go for a fix and my comment aiding in targeting any commit(s) that may have served to "expose" this race condition.
@bc3tech
keep pointing at Template 10
It is self-evident. See the stack trace (absolutely identical for different applications and different developers). The crash is reported in Template 10's internal code.
@bc3tech
Hence my suggestion to fork the repo and go for a fix and my comment aiding in targeting any commit(s) that may have served to "expose" this race condition.
We have our own applications, and they are our primary work. As you can see, we don't hit this issue in testing or proof-of-concept code. We suffer from it in production code, where it makes our applications totally unstable, and we get tons of complaints from angry users. (Well, I haven't published this version, because I tested it on an actual device before publishing; the only complaints I get are from angry users because I'm not rolling out a new version with bug fixes. But other developers have reported this issue in production code.) Remember that the issue was not there from the start; it appeared out of nowhere with a new build and broke our already nicely working code. Personally, I have already wasted many hours figuring out what the problem is. I caught this issue (at first I thought it was a .NET Native toolchain issue and started threads on a Microsoft forum), exhaustively described its symptoms, and provided a stack trace (and even an exact Template 10 version number) from an actual device to help you fix this outright critical issue. Don't you realize this is a critical, top-priority issue that prevents the use of Template 10 in production code once and for all? I have a bunch of my own bugs to fix in my own code right now. Those bugs I can fix. But this issue in Template 10 raises the question of throwing Template 10 out of my code (which would be very hard and long work, because Template 10 is used heavily in my code) and not using it at all, given its practical uselessness and harmfulness, if this issue is not fixed in the near future with top priority. Do you realize this?
I think you should demand a refund, @Opiumtm
@makeithome You can be ironic, but developers who write applications for users usually have some work to do. Client-oriented developers mostly write applications, not libraries. And client-oriented developers are often forced to read harsh complaints and even outright hate speech when angry users run into even minor issues. These are the not-so-pleasant realities of our profession. What I cannot understand is why a critical issue does not receive the effort it deserves. When I have a critical bug in my application code, I sit down and fix it right away, because the application must at least work. I drop any other work: code refactoring, minor bug fixes, and new features. When an application simply does not work, the top priority is to make it work at any reasonable cost.
And I have already provided a lot of precise information that should help fix this issue. I complained here because I had investigated. Other developers did not. When I complained here, other developers complained too. If I hadn't, they would have stayed clueless for some time about why their applications stopped working, maybe until after tons of complaints from angry users. On a developer forum, another developer had already received tons of complaints and simply rolled back to an older version of Template 10. He didn't file an issue here. He just rolled back. He decided not to waste his time investigating the issue or providing stack traces, and decided not to help the developers of Template 10. He wrote on the forum that he is tired of Template 10's lack of stability and probably won't use Template 10 in other projects at all. He also wrote some unprintable phrases and gave in to his emotions, specifically hatred toward the developers of the Template 10 library. I was much more patient and helpful in this dire situation. But when I read "do it yourself", I also began to lose control and give in to emotions. I clearly understand all the risks of using open-source code in production. These risks are well known. It is clearly my own fault for using open-source code, and I can blame only myself because I didn't pay for it. But when you face these risks personally, it is hard to stay calm.
@diogorolo It's a weird situation. Open-source projects always have well-known risks. It's OK for an open-source project to have issues, to receive feedback, and so on. What is not OK is to tell a developer who has already investigated and provided a stack trace, an environment description, and ways to reproduce a bug to "do it yourself". To be honest, I have already done most of the work for the Template 10 developers. When you have a random bug that strongly depends on the environment (it reproduces only on an actual ARM device and not in the emulator, so to investigate it you need at least an actual test device), most of the work is not fixing it in code. Most of the work is investigating the exact conditions that reproduce a random bug. I did that well and posted all the needed information here. But after all that, I was told to fix it myself. In that case I might as well write all the libraries myself, if even after I investigate an issue, the developers point me to fork the code and understand someone else's code. I don't have detailed knowledge of Template 10's internal logic and architecture! I don't develop Template 10, I just use it!
Open-source developers should at least understand that not everyone in the world is developing their project; most people are just using it. We can do our best to deliver as much information about a bug as possible to help the developers fix it. But it is nonsense to expect us to understand all the library code, its internal architecture and conventions. We can't, because we are application developers and don't dig much into the code of the libraries we use. That's basic, honestly.
I want to point out that I only suggested "fix it yourself" if you're in need of immediate turnaround:
Those not willing to do this will, yes, be waiting on a volunteer to find and fix the root cause. We'll do our best and these stack traces and date info will go a good way to assisting us. If anybody can help to pinpoint even further, that will speed up the process even more.
I never said we wouldn't (hopefully!) eventually find & fix it.
@bc3tech Well, thank you for making the matter clearer. In this thread it is certainly unclear when or how this bug will be fixed. Days have passed, the issue still hasn't been assigned proper tags, and there isn't much information from the library developers. This is a critical issue that prevents productive use of Template 10 at all. And when I was told to fix it myself (and not told that this critical issue would be fixed in the near future), it was too much for me. Excuse me if I was too expressive.
@Opiumtm, could you replace the LoggingService.cs code with the code below and run your application again? It will create a log file in the Downloads folder on your device, which in this case is a lot more useful than a stack trace, I guess.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Streams;

namespace Template10.Services.LoggingService
{
    public delegate void DebugWriteDelegate(string text = null, Severities severity = Severities.Info, Targets target = Targets.Debug, [CallerMemberName]string caller = null);

    public enum Severities { Trace, Info, Warning, Error, Critical }

    public enum Targets { Debug, Log }

    public static class LoggingService
    {
        public static bool Enabled { get; set; } = false;

        public static DebugWriteDelegate WriteLine { get; set; } = new DebugWriteDelegate(WriteLineInternal);

        private static void WriteLineInternal(string text = null, Severities severity = Severities.Info, Targets target = Targets.Debug, [CallerMemberName]string caller = null)
        {
            // Diagnostic addition: every message is written to the log file,
            // regardless of Enabled or target.
            LogToFile($"{severity} {caller} {text}");
            switch (target)
            {
                case Targets.Debug:
                    System.Diagnostics.Debug.WriteLineIf(Enabled, $"{DateTime.Now.TimeOfDay.ToString()} {severity} {caller} {text}");
                    break;
                case Targets.Log:
                    throw new NotImplementedException();
            }
        }

        static readonly object LogLock = new object();
        static IRandomAccessStream LogAccessStream;
        static IOutputStream LogOutputStream;
        static DataWriter LogDataWriter;

        // Appends a timestamped line to a file in the Downloads folder and
        // blocks until it is flushed, so the log survives a sudden crash.
        public static void LogToFile(string text = "")
        {
            // Log calls can arrive from several threads; serialize them so the
            // file is created only once and writes do not overlap.
            lock (LogLock)
            {
                if (LogAccessStream == null)
                {
                    string filename = "persistent_log_" + DateTime.Now.ToString("yyyyMMdd_HHmmss");
                    var newFile = DownloadsFolder.CreateFileAsync(filename + ".txt").AsTask();
                    newFile.Wait();
                    var randomFileTask = newFile.Result.OpenAsync(FileAccessMode.ReadWrite).AsTask();
                    randomFileTask.Wait();
                    LogAccessStream = randomFileTask.Result;
                    LogOutputStream = LogAccessStream.GetOutputStreamAt(0);
                    LogDataWriter = new DataWriter(LogOutputStream);
                }
                LogDataWriter.WriteString(DateTime.Now.ToString("HHmmss") + " " + text + Environment.NewLine);
                LogDataWriter.StoreAsync().AsTask().Wait();
                LogOutputStream.FlushAsync().AsTask().Wait();
                LogAccessStream.FlushAsync().AsTask().Wait();
            }
        }
    }
}
I just caught the exception in the debugger.
Exception thrown: 'System.UnauthorizedAccessException' in System.Private.SharedLibrary.Interop.Generated.dll
Additional information: Access is denied. (Excep_FromHResult 0x80070005)
It was in the line:
var mobilefam = ResourceContext.GetForCurrentView().QualifierValues["DeviceFamily"].Equals("Mobile");
Edit: As a workaround in the latest release of my app, I just put the whole function inside a try/catch block, and that "solved" the issue.
@diogorolo I use different code for the same purpose. Try it; it works and does not raise any exceptions:
Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily == "Windows.Mobile"
This is also the Microsoft-recommended way to check whether your device is running Windows Mobile. I read about this approach on Stack Overflow and MSDN.
@Opiumtm Yes, that is the recommended approach. I had not heard of the other method before (var mobilefam = ResourceContext.GetForCurrentView().QualifierValues["DeviceFamily"].Equals("Mobile");), and based on current circumstances it seems that it is not the correct way to go.
@oadugmore As I understand it, this "not recommended" method, ResourceContext.GetForCurrentView().QualifierValues["DeviceFamily"].Equals("Mobile");, abuses the resource loading subsystem. Resources can certainly be qualified by device family, but using the resource subsystem to detect the device family is an abuse of the API.
Just created a pull request to fix this issue. Sorry for the mess with rollback and re-commit.
Thank you for solving the issue (apparently). However, MS equally recommends both ways of detecting the device family, just see: https://msdn.microsoft.com/en-us/library/windows/apps/mt188202.aspx#detecting_the_platform
They recommend using TryGetValue, which is not what was being done here.
True. I just wanted to point out that the first approach is not discouraged in general.
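The two detection approaches discussed above can be sketched as plain C# helpers. DeviceFamilyHelper, IsMobileFamily and IsMobileQualifier are hypothetical names, not part of Template 10 or the Windows SDK. The first compares the string from Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily; the second reads the "DeviceFamily" qualifier with TryGetValue instead of the indexer, so a missing key yields false rather than an exception. In a UWP app the map passed in would be ResourceContext.GetForCurrentView().QualifierValues, whose IObservableMap<string, string> type is projected to C# as an IDictionary<string, string>. Note that TryGetValue by itself would not prevent the UnauthorizedAccessException seen in this thread if that exception comes from the GetForCurrentView() call itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of Template 10 or the Windows SDK.
public static class DeviceFamilyHelper
{
    // Approach 1: compare the value of
    // Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily,
    // a string such as "Windows.Mobile" or "Windows.Desktop".
    public static bool IsMobileFamily(string deviceFamily)
    {
        return string.Equals(deviceFamily, "Windows.Mobile", StringComparison.Ordinal);
    }

    // Approach 2: look up the "DeviceFamily" resource qualifier with
    // TryGetValue rather than the indexer, so a missing key returns false
    // instead of throwing KeyNotFoundException.
    public static bool IsMobileQualifier(IDictionary<string, string> qualifierValues)
    {
        if (qualifierValues == null)
        {
            return false;
        }

        string family;
        return qualifierValues.TryGetValue("DeviceFamily", out family)
            && string.Equals(family, "Mobile", StringComparison.Ordinal);
    }
}
```

In an app, the first helper would be called as, for example, DeviceFamilyHelper.IsMobileFamily(Windows.System.Profile.AnalyticsInfo.VersionInfo.DeviceFamily). Keeping the comparison in a plain method separates the string check from the platform call, which makes it easy to exercise without a device.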
@JerryNixon Please publish the fixed version to NuGet as soon as possible. The issue is really critical.
@JerryNixon @bc3tech A week has passed and there is no updated library on NuGet. This issue is critical. Even though the issue was fixed by a pull request, days have passed and no fix has been rolled out as a NuGet package. This hotfix must be delivered to developers immediately, given the critical nature of this bug!
+1
Why is the NuGet package essential? If the issue has been addressed, you have full access to the source here on GitHub and can include it in your project. NuGet packages are certainly a convenience, but not essential. As you have stated many times, developers do have "real" work to do, and everyone who contributes to this project is a volunteer who participates over and above their "real job". While everyone on the team is sensitive to the critical nature of the issue for you, those who can create and publish the NuGet packages have many other demands on their time and critical issues of their own to address. In the interim, you can certainly help yourself by using the source that is available now on GitHub and switch back to the NuGet package when it becomes available.
Okay people. I see we're better developers than diplomats. Ha! I'll be back from vacation tomorrow and will look into this as quickly as I can. Thanks for everyone's input on this thread.
Good news: everything seems to be working. I have tested on a 950 in both Debug and Release, with and without optimization. If this occurs again, please start a new thread. Hopefully it will not. I will release an updated NuGet package before the weekend.
The application crashes randomly on a physical device when compiled with .NET Native for ARM. The running-process info shows memory usage growing rapidly, up to more than 180 MB. When compiled with plain .NET (not .NET Native), there is no such problem.