Closed dandoesgithub closed 11 months ago
Hi @dandoesgithub, thanks for the report.
This only happens with one instance.
To fix this we will need to know what is special about that instance. Have you perhaps added some custom repositories to it?
No, I haven't, it's the default repository
OK, is there anything else about that instance that you can think of that could explain why this doesn't happen for me or anyone else up to this point?
Well, I deleted all the mods in the GameData folder (CKAN doesn't know that), but I don't think that's what's causing the problem. I'm not sure why it could be happening, as it was working in 1.33.2. I'll try forgetting the instance then reloading it in CKAN
I could take a look at the CKAN files in the particular instance
@dandoesgithub It might help to zip up your CKAN folder from that instance and post it here.
slight problem: i have deleted that instance. craaaap. i'll try to recover it with a program but i don't think i'll succeed
Oh, in that case we won't be able to investigate this. Let us know if you do recover the instance and we can re-open it.
wait: the new instance that i made a few days ago (to replace the broken one) has the same problem. i can send that one instead.
no, every other instance works except for this one. either it actually is my PC, or it's the instance
anyway, here's the CKAN file CKAN.zip
Thanks, that ZIP looks fine to me. Maybe @JonnyOThan had something particular in mind.
If it's having the error during "Refreshing game version data", that gets stored in another place. Can you send us %LOCALAPPDATA%\CKAN
as well? You can exclude the downloads
subfolder if it's huge.
Are you using the same ckan.exe
file for all your instances, or are you copying it into each instance? Usually that doesn't matter, but I'm trying to think of other things that might differentiate instances.
CKAN.zip, i am using the same ckan.exe for all my instances.
should i try deleting the ksp-default repo then adding it again?
Thanks, that looks all-in-order as well. I would not expect deleting and re-adding the repo to help, but I wouldn't discourage you from trying it if you think otherwise.
The next thing to try is probably capturing some logs so we can get more information about what CKAN is doing when this problem happens. If you unzip this log4net.xml
file into the directory where your ckan.exe
file lives and then run CKAN, it should load the settings automatically and generate a CKAN.log
file that you can share here.
<?xml version="1.0" encoding="utf-8" ?>
<log4net>
<appender name="CkanFile" type="log4net.Appender.RollingFileAppender">
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Finding latest available" />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Installable: " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Already enlisted" />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Available: " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Adding new available module " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Adding available module " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Adding to available module: " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Copying " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Writing file " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Making directory " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Making parent directory " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Not registering " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Got owners: " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Module parsed: " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Reading CKAN data from " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Skipping archive entry " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<regexToMatch value="Checking compat of" />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Got contents" />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Not removing directory " />
<acceptOnMatch value="false" />
</filter>
<filter type="log4net.Filter.StringMatchFilter">
<stringToMatch value="Removing " />
<acceptOnMatch value="false" />
</filter>
<file value="./CKAN.log" />
<appendToFile value="false" />
<rollingStyle value="Once" />
<maxSizeRollBackups value="10" />
<lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%6timestamp %-5level %logger %message%newline" />
</layout>
</appender>
<!-- Log everything by default -->
<root>
<level value="DEBUG" />
<appender-ref ref="CkanFile" />
</root>
<!-- Turn down some of the noisier loggers -->
<logger name="CKAN.CompatibilitySorter">
<level value="INFO" />
</logger>
<logger name="CKAN.NetFileCache">
<level value="INFO" />
</logger>
<logger name="CKAN.CkanModule">
<level value="INFO" />
</logger>
</log4net>
Just confirming that I'm getting this problem as well. But sometimes, now always. I'll try with the logging @HebaruSan suggested.
Thanks, @Falki-git. That's very strange; that log file shows CKAN completing and exiting normally. If it's the file that you meant to send, then that may mean we don't have logging set up for wherever this problem is happening.
"One or more errors occurred" isn't even our text; it's fromAggregateException
. So some problem is happening and then the framework is obscuring it with that class. The documentation says it's used extensively by PLINQ, so this is probably related to the recent changes to parallelize various things for performance. (Or the root cause could be unrelated, but now happening in parallel and therefore obscured by PLINQ.)
Here's a test build with a first attempt to get the original problem from the AggregateException
based on its documentation; please try it if you have time and are willing to trust that I'm not sending you malware:
(I did build that on Ubuntu, so the risk of malware should be very low anyway.)
@HebaruSan, I tried it and I get the same behavior. I can refresh several times without a problem and then it hangs on the next refresh. Hang lasts for about a minutes and then the "task canceled" is thrown.
Here are the logs from this attempt.
And just a quick video of what's happening.
https://github.com/KSP-CKAN/CKAN/assets/72734856/1efcbdb2-a378-4674-b56d-a3c346d01f59
Yeah, you're supposed to get the same behavior, all I added was handling of the AggregateException
. That's why it now says "A task was canceled" instead of "One or more errors occurred".
The log has basically the same info. I.e., almost none:
107407 ERROR CKAN.GUI.Main A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
So, that's... great.
Deleting and re-adding the repo worked! apparently the repo was the problem!
One more attempt to get useful info out of this exception:
@Falki-git or @dandoesgithub or anyone else who can reproduce this, please try that build and let us know what it says.
oh, the issue is still happening on that build, deleting the default repo and re-adding it worked again
Deleting the repo and re-adding it doesn't actually change anything, though. I think this is a race condition that has turned out differently after you've done that by coincidence.
Some support pages are saying HttpClient
can throw this as a timeout, which on the surface makes sense since we do now use that to get the etags, but that should take over a minute and a half to time out, and that video is less than 15 seconds:
Also @dandoesgithub the point of that build was to gather info. Can you please share a screenshot or the logs?
where are the logs located?
I should say explicitly: The test builds that I have been sharing are not attempting to fix the problem, because we do not know what the problem is. They are solely trying to get enough information to identify what the problem is. Normally I would do this myself on my own system, but since the problem does not happen for me, I have to go through whomever can reproduce it.
@dandoesgithub see above re: logs. https://github.com/KSP-CKAN/CKAN/issues/3950#issuecomment-1859219946
Thanks; that looks just like Falki's log, so either you're still using the previous build, or TaskCanceledException
doesn't provide any InnerException
s.
147790 ERROR CKAN.GUI.Main A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
Side note, the 100-second default timeout seems to be confirmed by inspecting an HttpClient
object:
csharp> LoadAssembly("System.Net.Http");
csharp> var hc = new System.Net.Http.HttpClient();
csharp> hc.Timeout;
00:01:40
@HebaruSan, here is the log for the newest ckan.exe you sent: CKAN_log2.zip
Regarding the video, I should have been explicit also, I cut it down to 15 seconds so you don't need to look at a static screen, but from clicking Refresh to the timeout it takes a minute or so (likely 100 secs as you said).
P.S. I didn't try to delete the repo yet
@dandoesgithub and @Falki-git, here's a new test build for you to try:
This one does new things: first it prints the value of HttpClient.Timeout
, and then it catches, logs, and ignores any exceptions it throws while trying to get the etags. This might help with the problem you're having, but whether it does or not, it's important that we get the log file so we can determine the right long term fix for the real builds.
Regarding the video, I should have been explicit also, I cut it down to 15 seconds so you don't need to look at a static screen, but from clicking Refresh to the timeout it takes a minute or so (likely 100 secs as you said).
Thanks, that makes a lot more sense! That makes me a lot more confident concluding that this is in fact HttpClient
timing out. I built and typed the above before you posted this, but I think it's still worth capturing those logs.
And here's the log. It's more verbose this time.
And here's the log.
Thanks! Yup, HttpClient.SendAsync
is definitely throwing.
31572 INFO CKAN.Net HttpClient.Timeout is 00:01:40
131575 WARN CKAN.Net Failed to get etag for https://github.com/KSP-CKAN/KSP2-CKAN-meta/archive/main.tar.gz
System.AggregateException: One or more errors occurred. ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at CKAN.Net.CurrentETag(Uri url)
---> (Inner Exception #0) System.Threading.Tasks.TaskCanceledException: A task was canceled.<---
It's more verbose this time.
But it looks like it continued and finished the update this time?
Yes, it completed the update
Alright, based on that and the fact that the previous implementation seemingly never had this problem, I have to consider the possibility that HttpClient
just doesn't work very well (despite the fact that it has a compiler warning recommending its use). To provide a point of comparison, here's a fresh build with the old pre-HttpClient
implementation of CurrentETag
:
If that build works for you, @Falki-git and @dandoesgithub, without a 100-second delay, then we'll probably need to roll back the switch to HttpClient
. It would be nicer to keep up with the latest and greatest library code, but if HttpClient
is just a hopelessly buggy mess, then we can't reasonably use it.
@dandoesgithub and @Falki-git, here's a new test build for you to try:
* (file removed)
This one does new things: first it prints the value of
HttpClient.Timeout
, and then it catches, logs, and ignores any exceptions it throws while trying to get the etags. This might help with the problem you're having, but whether it does or not, it's important that we get the log file so we can determine the right long term fix for the real builds.Regarding the video, I should have been explicit also, I cut it down to 15 seconds so you don't need to look at a static screen, but from clicking Refresh to the timeout it takes a minute or so (likely 100 secs as you said).
Thanks, that makes a lot more sense! That makes me a lot more confident concluding that this is in fact
HttpClient
timing out. I built and typed the above before you posted this, but I think it's still worth capturing those logs.
sorry for the late reply, just noticed you sent a new ckan.exe. i'll send the logs of the one you sent before, then the logs of the one rn CKAN.log
well, the build that you sent just now definitely works when refreshing my instance. CKAN.log
Thanks, that's one vote solidly in favor of reverting to WebRequest
, then. If Falki sees the same thing, HttpClient
will be escorted from the premises.
Yes, that's a lot better. I cannot reproduce the timeout anymore. Tried multiple instances, refreshed many times, it works 100% now. CKAN_logs4.zip
Hi @Dominiquini, this issue is not about a back and forth between me and you, it is about HttpClient
throwing a TaskCanceledException
after a bogus timeout that gets wrapped in a fromAggregateException
and reported as "One or more errors occurred". I have locked the conversation and deleted the off-topic comments. Thanks for filing the duplicate bug report.
Is there an existing issue for this?
Operating System
Windows 10
CKAN Version
1.34.0
Games
KSP 1
Game Version
KSP 1.12.5
Did you make any manual changes to your game folder (i.e., not via CKAN)?
Yes, I deleted the mods inside of GameData
Describe the bug
CKAN gets stuck during "Refreshing game version data", then after a while errors. This only happens with one instance. The solution for this bug for now is searching a mod in the search bar, then refreshing
I expected it to load normally
Steps to reproduce
Step 1: Load an instance Step 2: Refresh Step 3: Get lucky (or unlucky) and get the error
Relevant log output
No response