Closed jsmythsci closed 5 years ago
I have done some testing with different versions and it seems that this issue was introduced in version 16.12, presumably related to the TFS SDK upgrade that it included (#234).
I updated from 15.2 directly to 16.11 and confirmed that all pipelines continued to work as intended. I then updated from 16.11 to 16.12 and found that the initial test run of the test pipeline worked as intended but all subsequent runs failed.
I suppose this means that this issue might not affect elastic agents since it doesn't seem to occur the first time a pipeline is run on an agent.
I have not explicitly tested any versions between 15.2 and 16.11 or between 16.12 and 18.10 but I don't believe those tests would be valuable at this point.
I still believe that we could work around the issue by having the agent create a server workspace instead of a local one but I have yet to find any documentation on how to do that.
@jsmythsci Thank you for doing all that investigation!
About having the agent create a server workspace, I don't know for sure. Maybe something changed in the SDK between those versions?
Most of the code related to this seems to be around: this and this.
You might learn something by turning on debug logs. In versions such as 16.12, there was probably a log4j.properties in the agent's config/ directory.
In my view, this looks like an unsupported case. I would assume that the working directory (called a "directory") is not a file. I wouldn't be surprised if GoCD code or the TFS SDK code has a call to mkdir
somewhere.
@arvindsv Thank you for taking the time to respond.
Could you clarify which case you believe is unsupported? I can see 2 different examples of unusual behaviour here and I am not sure which you are referring to.
I can understand the latter not being supported but, in my opinion, it is not an actual use case but rather a workaround for the bug that is documented in #183.
I dug through some of the code you linked and, if I am reading things right, I believe that GoCD's code does not currently provide a mechanism to specify the workspace location (server vs local).
I think the workspace is created here, which seems to be calling this method from the SDK.
Assuming I have correctly understood how all of these pieces fit together, GoCD is hard-coded to pass null
to createWorkspace()
's WorkspaceLocation
parameter, causing the workspace to be created using the server's default location.
If that is true then we may be able to work around this issue by changing our TFS server's default workspace location. I will have to do more investigation to figure out what the implications of that are.
@jsmythsci My expectation would be that the project path points to a directory and the destination is a directory too. However, I can see that you're trying to work around #183.
What you said about the WorkspaceLocation
parameter seems correct. It does seem to be passing null
which makes it use the server's default.
If you have a patch for this (or at least a reasonable idea to try), which allows us to get past this, I'm happy to work with you on it to see if we can fix the issue. What would be important, in that case, is that the rest of the tests and cases work too.
I wonder how we would know, without telling it, that the project path points to a file. Not sure how it was working before.
I think the simplest thing to do would be to replace the hard-coded null
with a hard-coded value for creating a server workspace. The following should do it (not sure why GitHub isn't letting me upload it as an attachment):
--- GoTfsVersionControlClient.java 2018-12-04 09:07:24.432449300 -0500
+++ "GoTfsVersionControlClient - Copy.java" 2018-12-04 09:13:19.856552100 -0500
@@ -20,6 +20,7 @@
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.Changeset;
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.RecursionType;
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.Workspace;
+import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.WorkspaceLocation;
import com.microsoft.tfs.core.clients.versioncontrol.specs.version.ChangesetVersionSpec;
import com.microsoft.tfs.core.clients.versioncontrol.specs.version.LatestVersionSpec;
@@ -57,7 +58,7 @@
}
public GoTfsWorkspace createWorkspace(String workspace) {
- return new GoTfsWorkspace(client.createWorkspace(null, workspace, null, null, null, null));
+ return new GoTfsWorkspace(client.createWorkspace(null, workspace, null, WorkspaceLocation.SERVER, null, null));
}
public Changeset[] queryHistory(String projectPath, ChangesetVersionSpec uptoRevision, int revsToLoad) {
I'm assuming the only use-case for this code is the following workflow on GoCD agents:
According to Microsoft's documentation, the reasons for using a local workspace all relate to making it easier to work offline; if that never comes up in our workflow then I don't see any downside to hard-coding a server-side workspace.
I am not a developer by trade, though, and there may very well be things I have not thought of.
I am interested to know what other people think.
@arvindsv Any feedback on the patch or the thought process around it? Are there any use-cases other than fetching the file(s) from source at pipeline run time that need to be considered?
Unfortunately I don't know how to set up a development environment so I can't apply the patch and run the tests myself at this point. I could take the time to figure it out but it would be nice to know whether or not I am on the right track before making that commitment.
@jsmythsci I'll apply the patch and make an installer available to you tomorrow (or later today). Maybe you can set it up and take a look?
If that works for you, I'll look at what it can affect. My guess is that if multiple agents (maybe in the same machine) are all trying to use the same server workspace, it could be a problem. We will figure out what to do, in that case. For a start, let's go with the patch you have and see if it helps in any way.
@jsmythsci Here are the links:
You will need both. You can't use this agent with your current server. Bring up the server first and then the agent and once they're connected to each other, you can try this out.
I had to change the patch a bit:
diff --git a/tfs-impl/tfs-impl-14/src/main/java/com/thoughtworks/go/tfssdk14/wrapper/GoTfsVersionControlClient.java b/tfs-impl/tfs-impl-14/src/main/java/com/thoughtworks/go/tfssdk14/wrapper/GoTfsVersionControlClient.java
index 81fdadbe9..f04b63249 100644
--- a/tfs-impl/tfs-impl-14/src/main/java/com/thoughtworks/go/tfssdk14/wrapper/GoTfsVersionControlClient.java
+++ b/tfs-impl/tfs-impl-14/src/main/java/com/thoughtworks/go/tfssdk14/wrapper/GoTfsVersionControlClient.java
@@ -20,6 +20,7 @@ import com.microsoft.tfs.core.clients.versioncontrol.VersionControlClient;
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.Changeset;
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.RecursionType;
import com.microsoft.tfs.core.clients.versioncontrol.soapextensions.Workspace;
+import com.microsoft.tfs.core.clients.versioncontrol.WorkspaceLocation;
import com.microsoft.tfs.core.clients.versioncontrol.specs.version.ChangesetVersionSpec;
import com.microsoft.tfs.core.clients.versioncontrol.specs.version.LatestVersionSpec;
@@ -57,7 +58,7 @@ public class GoTfsVersionControlClient {
}
public GoTfsWorkspace createWorkspace(String workspace) {
- return new GoTfsWorkspace(client.createWorkspace(null, workspace, null, null, null, null));
+ return new GoTfsWorkspace(client.createWorkspace(null, workspace, null, WorkspaceLocation.SERVER, null, null));
}
public Changeset[] queryHistory(String projectPath, ChangesetVersionSpec uptoRevision, int revsToLoad) {
@arvindsv Thank you very much for this. I grabbed both files and will test the build out over the next couple of days.
@arvindsv I ran some tests today and the test build you provided seems to be working as expected.
Here is a summary of my test results:
My tests were performed on a new VM running Windows 2012 R2 using a snapshot to restore state between each test. Here is the process I followed:
Regarding the impact of using server vs local workspaces, I believe the only technical difference between the two is whether or not the client downloads repository metadata to enable offline work.
My understanding of the limitations of TFS workspaces is that:
To the best of my knowledge workspace location (server vs local) is not a factor in any limitation so any potential for conflict between multiple agents running concurrently will likely exist regardless of workspace location.
@jsmythsci Thank you for checking that.
You might be right. Your knowledge of server workspace location is probably more up to date anyway. :) I'll read about it some more and think about the code a bit more. If I don't see anything that can go wrong, I might make the change. In the worst case, I'll probably make this an option that can be turned on. Either way, you should get this change in the 19.1 release in January. I won't be able to analyze it in time for the current release (this week).
Umm. Also, in case you don't see me say anything by first week of January or so, please feel free to remind me about this.
Looking at some articles such as this and this, it looks like server workspaces were the default in TFS 2010. TFS 2012 introduced local workspaces and it seems to be, as you said, all about being able to work offline.
My suspicion is that we were using server workspaces before the upgrade to TFS SDK 14 (since there was no such thing as a local workspace before). If that's the case, I'm less worried about making this change to use server workspaces.
Some things you might have to check, @jsmythsci, when you find the time:
Ability to create a directory in the working directory.
Ability to change an existing locally checked out file (might not be allowed, since it's a server workspace and it's supposed to be read-only).
Making sure that a change in a file is seen by GoCD properly and the number of open workspaces as seen on the server (is there even such an ability?) doesn't keep increasing.
The last one: My thought is - a local workspace is like a full checkout. A server workspace is partial and is still "connected" to the server, in a sense. Does it cause extra load (mostly memory or filesystem) on the server?
Thank you for your continued attention to this, @arvindsv.
I can confirm that GoCD uses server workspaces prior to the TFS SDK update. There was a bug (#2840) whereby agents were silently failing to delete their workspaces and when I check our TFS server for workspaces owned by our GoCD user I can see that they are all server workspaces. This visibility helped me when investigating for this current issue because, if I checked quickly enough, I could see that the workspaces created on newer versions of GoCD were local rather than server workspaces.
Here are my thoughts on the points you raise:
Thank you again for your time and effort. Let me know your thoughts and if there are any tests you would like me to run.
@jsmythsci
Yes, please. A confirmation, either way, should be fine. I would expect it works, as well.
Ok.
That's right. This change is supposed to only happen on the agent side. So, it should be fine. What I was thinking about was probably similar to #2840. We could check this scenario: Agent creates a server workspace. New commit happens. Agent uses the same server workspace and doesn't create another. I'm not sure how the connection to the TFS server works. If it's just a matter of having some metadata around to say that this is a server workspace, that's fine. If it's more, then it might be worth it to check that this change doesn't keep anything open on the TFS server side. I understand that the behavior will revert to whatever was happening in earlier versions of GoCD - and so, should be the same. I'm just taking this chance to document some of that, if possible.
You're probably right about that.
I've opened https://github.com/gocd/gocd/pull/5608 which should allow me to make a simple change to put in a temporary toggle for this, controlled from the server side (assuming that change is accepted). Once the toggle is in place and works for a while, we can look for a more permanent solution (maybe by changing the TFS material config), unless we decide to default to server workspaces.
If you can check point (1) above, just for completeness' sake, that would be good. If possible, please check point (3) too. I understand it's more involved and if you cannot, that's ok. I'm asking only because you have seem to have easier access to a TFS server and probably know more about it than I do. :)
I feel that an agent should delete the server workspace at the end of a build. Not deleting would cause several server workspaces to be created (I believe there was a bug that we fixed regarding this.)
Also, this problem will just get worse with elastic agents, where a server workspaces will be created for every job that runs.
On Sat, Dec 29, 2018, 4:28 AM Aravind SV <notifications@github.com wrote:
@jsmythsci https://github.com/jsmythsci
1.
Yes, please. A confirmation, either way, should be fine. I would expect it works, as well. 2.
Ok. 3.
That's right. This change is supposed to only happen on the agent side. So, it should be fine. What I was thinking about was probably similar to #2840 https://github.com/gocd/gocd/issues/2840. We could check this scenario: Agent creates a server workspace. New commit happens. Agent uses the same server workspace and doesn't create another. I'm not sure how the connection to the TFS server works. If it's just a matter of having some metadata around to say that this is a server workspace, that's fine. If it's more, then it might be worth it to check that this change doesn't keep anything open on the TFS server side. I understand that the behavior will revert to whatever was happening in earlier versions of GoCD - and so, should be the same. I'm just taking this chance to document some of that, if possible. 4.
You're probably right about that.
I've opened #5608 https://github.com/gocd/gocd/pull/5608 which should allow me to make a simple change to put in a temporary toggle for this, controlled from the server side (assuming that change is accepted). Once the toggle is in place and works for a while, we can look for a more permanent solution (maybe by changing the TFS material config), unless we decide to default to server workspaces.
If you can check point (1) above, just for completeness' sake, that would be good. If possible, please check point (3) too. I understand it's more involved and if you cannot, that's ok. I'm asking only because you have seem to have easier access to a TFS server and probably know more about it than I do. :)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/gocd/gocd/issues/5427#issuecomment-450440123, or mute the thread https://github.com/notifications/unsubscribe-auth/AAApZhVrDyHCiyeYFS-_EIxPVP5IWD06ks5u9qIugaJpZM4YsCQa .
Yes, we will need to understand what is happening first (and how a server workspace is "connected" to the TFS server - and what it leaves behind if it is not removed). Of course, removing it every time might also mean slower checkouts every time, etc.
@arvindsv
For point (1) I created a simple pipeline to test and confirmed that creating a directory works as expected.
For point (3) I'm not sure the scenario you describe is valid under normal operating conditions. As seen in the log snippet included in alexbestul's excellent post on #2840, GoCD unmaps TFS workspaces as soon as it is done getting the files from the server. In the example alexbestul provided, the expected lifetime of the TFS workspace was only about 0.4 seconds.
I think it is important to note that, in TFS parlance, a "workspace" is not a collection of files and folders on a user's machine but rather a mapping between a collection of files and folders on a client machine and that same collection on the version control server.
I also think the term "location" is misleading, given what a workspace is. I can think of the following questions relating to the location of a TFS workspace:
I agree with @ketan that TFS workspaces (whether local or server) should not persist between job runs; this is how GoCD works today and I see no reason to change that.
@jsmythsci I forgot to mention that the latest experimental installers (19.1.0-8395
) on https://www.gocd.org/download/ should have the changes for this. You'll need to set a system property as mentioned here: https://github.com/gocd/gocd/pull/5693
It'll also be released as a part of 19.1 and if all goes well, we can turn it on by default in 19.2 or 19.3 and remove the need for the system property soon after that.
Keeping this issue open for @jsmythsci to confirm if the fix helped solve the problem
Sorry to take so long to follow up on this. Our work on GoCD stalled while waiting for version 19.01 and other projects took priority.
I just tested upgrading from 16.11 to 19.01 and my test builds are working as intended. This was unexpected, however, as I did not set the following system property, which I believed would be required:
gocd.agent.extra.properties=toggle.agent.tfs.use.server.workspace.location=Y
Was this done intentionally?
Either way, I'm glad that it is working for us now. I am eager to get our system current so we can start leveraging the improvements the GoCD team has made over the past 2 years.
@jsmythsci That's indeed strange. I double-checked the logic just now and it should be defaulting to using null
for the server workspace location.
@arvindsv Thank you for confirming.
I must have set the system property ahead of time on our test system.
I confirmed that removing it resulted in the same failures I initially reported and that when I added it back to the server configuration the builds completed as expected.
Thank you again for the help and support working through this issue with us.
@jsmythsci You're welcome. :) I'll setup a reminder to change the false
to true
by default, in a couple of releases and then remove the toggle altogether a couple of releases after that.
Hey everyone. I just upgraded server and agents to 19.2 today and I'm still having this problem; when pointing to a single file for the material, the pipeline downloads a file named after the destination directory and breaks.
I see comments about needing to toggle a system problem, but I'm not quite sure what that means. I don't see anything like the above settings in the wrapper-server.conf file (I'm on Windows). Do I need to set an environment variable? What am I missing?
Thanks!
Hello @unkinected,
To get this working there are 2 things that you need to do:
gocd.agent.extra.properties=toggle.agent.tfs.use.server.workspace.location=Y
As documented here, the way to do this on Windows is to add a line like this to the wrapper-server.conf file:
wrapper.java.additional.XX=-Dgocd.agent.extra.properties=toggle.agent.tfs.use.server.workspace.location=Y
where XX represents the next wrapper.java.additional index.
Hope this helps.
Thanks. I tried all that... if I do NOT change the destination directory to a file name, I get the same problem before but at least it doesn't error out. When I try to change the dest directory to a filename with no folder path, it works. However, if I specify a folder path, I get this message:
Invalid directory name '.\folder\filename.ext' It should be a valid relative path.
Mostly there, but this will do for me for now. Thanks!
I think it might be a Windows vs Unix thing.
Try setting it to ./folder/filename.ext
(forward slashes instead of backslashes) to see if that helps.
FYI: I've opened #6283 to change the default value of this to use SERVER.
Issue Type
Summary
TFS checkout fails when material points to a single file
Environment
GoCD is running on a Linux server and the affected agents are running Windows Server 2012R2.
Basic environment details
18.10.0 (7703-42d1cbe661161b5400289ead86c0447c84af8c0a)
1.8.0_191
Linux 2.6.32-754.6.3.el6.x86_64
N/A
Additional Environment Details
This issue has been encountered on a copy of our production system that we created to test upgrading to the latest version of GoCD. The affected pipelines all work as intended on the following system:
15.2.0(2248-c1f90a6aa3192a)
1.8.0_181
Linux 2.6.32-754.6.3.el6.x86_64
Agents in the working environment are also running Windows Server 2012R2 but would be using an older version of Java since they are running an older version of the agent.
Steps to Reproduce
Expected Results
Pipeline should check the file out of TFS and launch its first task.
Actual Results
Pipeline fails with an error like the following:
Failed while checking out into Working Folder: pipelines\MyPipeline\tfs\AssemblyInfo.cs, Project Path: $/TeamProjectName/AssemblyInfo.cs, Workspace: bb18a58c6abfa36db84643de09258c86feaa48bc711174334eaea61594b9623f, Username: BuildUser, Domain: mydomain, Root Cause: java.io.FileNotFoundException: E:\GoCD\Go Agent\pipelines\MyPipeline\tfs\AssemblyInfo.cs\$tf\5\377de3f5-a6f8-41ed-970a-9b0bbf2c1a50.gz (The system cannot find the path specified)
Possible Fix
This appears to be an extension of #183 (TFS does not create destination directory if project path is a file). It seems that TFS is trying to create a local workspace and is failing because:
If TFS were to create a server workspace instead of a local one (as it does in 15.2), this issue would likely not exist.
Log snippets
Here is the stack trace from go-agent.log when the exception is thrown:
Code snippets/Screenshots
Here is the full XML of the pipeline used to test this issue:
Any other info