WilDoane / GitDataCollection


We should capture the working directory on-compile #12

Open briandk opened 12 years ago

briandk commented 12 years ago

The Suggestion

I'm thinking we could add a line, something to the effect of

echo `pwd` >> tempfile

The idea being that with that addition, a git commit's extended summary could be something like:

In file included from hash.c0:
hash.h:20 warning: parameter names (without types) in function declaration [enabled by default]
In file included from parse.c0:
hash.h:20 warning: parameter names (without types) in function declaration [enabled by default]

Working Directory:
/current/path/or/whatever // result from echo-ing pwd

The Reason

As I check out commits and try to reconstruct students' code (and its working state), I realize a call like gcc start.c---even though that information is in the git commit---doesn't tell me where they called gcc from. Were they trying to compile project 1, or project 2? It's especially a problem if multiple projects have the same filenames within them, like main.c for example. So, I think this level of disambiguation could be important, and low-overhead.
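
A minimal sketch of the suggestion above, assuming a POSIX shell rather than the tcsh version of the script. The function name record_working_directory is hypothetical, and a temporary file stands in for the script's tempfile:

```shell
#!/bin/sh
# Hypothetical helper (not part of the existing scripts): append the
# compile-time working directory to the file that feeds the commit message.
record_working_directory() {
    log_file="$1"
    {
        printf 'Working Directory:\n'
        pwd
    } >> "$log_file"
}

# Demo: a temporary file stands in for the script's tempfile.
log_file=$(mktemp)
record_working_directory "$log_file"
cat "$log_file"
```

Calling pwd directly avoids the backtick substitution entirely; the output is the same as echoing the substituted value.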

WilDoane commented 12 years ago

We could do this.

What's the use case where the student would compile from some place other than the directory holding the source code? (There's a trivial answer to this for larger projects, where complex or library code might be in subdirectories.) But for these students, when might they do that?

If the student created 2 directories (e.g., project1/ and project2/), then invoking

../OutputCommits.sh project1/project.c

And

../OutputCommits.sh project2/project.c

Would generate two distinct HTML files in the gitdatacollection folder.

-Wil

briandk commented 12 years ago

I can't argue that there are clear-cut use cases that we've already experienced. Rather, I have to argue that:

  1. The case outlined below is plausible, and
  2. The additional working directory information is trivial to capture, trivial to store, and helps us be more efficient in the long run

A hypothetical case

A student has multiple projects, each with its own subfolder. By convention, each project has a main.c file, and possibly other modules.

Looking through a pile of commits that all have the same gcc call doesn't make it immediately obvious which project the student might have been compiling. Moreover, from Isaac's data we know at least one student would go back to work on old projects even after they were submitted, so we can't depend on just timestamps to tell us which project (1 or 2) was being compiled.

In short, if I want to recompile code from a given commit I often have to do extra work to make sure I'm in the same subdirectory the student was in when they compiled.

An argument for efficiency

Let's face it: for long-history files, the output-commits tool takes some time. Even if it were a constant-time script with a small constant k, running output-commits would still be an extra step in the workflow.

Now consider: with shell scripting, it should be trivial for me to splice the already-captured gcc call with the current working directory (adjusting for the directory structure on GLUE), creating a command I can paste as-is into my terminal to execute their exact compile call.

So, imagine what I want to do is check out a commit, compile the student's code, and examine its runtime behavior. With my proposed solution, which includes capturing the working directory, adjusting relative paths for GLUE, splicing to reconstruct the gcc call, and appending that to the extended description, here would be my workflow for reconstructing run-time behavior:

  1. Identify the commit of interest
  2. Check it out from the root level of my local repo for that student
  3. Copy and paste the spliced gcc call from the extended description
  4. Press enter
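
The four steps above could be wrapped up roughly like this. It's only a sketch: replay_compile is a hypothetical name, the commit hash in the example is a placeholder, and it assumes the extended description holds a command that is valid when run from the repo's top level.

```shell
#!/bin/sh
# Hypothetical helper: replay a recorded compile call at a given commit.
replay_compile() {
    commit="$1"            # step 1: the commit of interest
    compile_command="$2"   # step 3: the line copied from the extended description

    repo_root=$(git rev-parse --show-toplevel) || return 1
    (
        cd "$repo_root" || exit 1
        git checkout --quiet "$commit" || exit 1   # step 2: check out from the root level
        eval "$compile_command"                    # step 4: "press enter"
    )
}

# Example (commented out because it changes the checked-out commit;
# 1a2b3c4 is a made-up hash):
# replay_compile 1a2b3c4 'cd project1 && gcc main.c'
```

Since eval runs whatever text it is given, this only makes sense for commit messages our own script generated.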

To my knowledge:

  1. Our current setup doesn't make it as simple as I've outlined above,
  2. For my dissertation analysis I'm finding I need to do this often: identify, check out, compile as the student saw it.

briandk commented 12 years ago

The git rev-parse manpage has some helpful commands: http://www.kernel.org/pub/software/scm/git/docs/git-rev-parse.html

I think I found the one we'd need to do this: git rev-parse --show-prefix

Creating a Fake Project

Here's an example. First, I'll create a FakeProject on my desktop:

mkdir FakeProject
cd FakeProject
git init
mkdir foo
cd foo
mkdir bar
cd bar
touch fakefile.txt
echo "hi" >> fakefile.txt
git add *
git commit -m "Initial Commit"

Getting Full and Relative Paths

Now, we can look at both the full path to our current location (inside the bar folder) and the path relative to the top-level repo. The results of the shell commands are shown as comments below (note that --show-prefix prints a trailing slash):

pwd                         # /Users/briandanielak/Desktop/FakeProject/foo/bar
git rev-parse --show-prefix # foo/bar/

My proposed solution sketch

In order to add a line like this to the extended description:

gcc foo/bar/sourceFile.c

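I just have to figure out a way to paste together the respective pieces:

  • The literal "gcc"
  • The output of git rev-parse --show-prefix ("foo/bar/")
  • The filename argument from the gcc call itself (which I think we're already capturing as a shell argument)
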
And the only real trick there is that--at least in the tcsh version of our script--I'm not sure whether we're always guaranteed that the ith argument in $argv will be the filename. I don't have K&R handy, but it's possible that the answer to my question is "yes," and that the filename is always the first argument following "gcc" itself.
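
One way to sidestep the question of which argument is the filename, sketched in POSIX sh rather than tcsh: prefix every argument that doesn't look like an option or an absolute path. The function name prefix_sources is hypothetical, and the heuristic is naive. It also prefixes the values of options written as separate words (fine for "-o prog", wrong for something like "-x c"), and it doesn't quote paths containing spaces.

```shell
#!/bin/sh
# Hypothetical helper: rebuild a gcc call so it can be run from the repo root.
# $1 is the output of `git rev-parse --show-prefix` (e.g. "foo/bar/");
# the remaining arguments are the ones originally passed to gcc.
prefix_sources() {
    prefix="$1"
    shift
    command_line="gcc"
    for arg in "$@"; do
        case "$arg" in
            -*|/*) command_line="$command_line $arg" ;;         # option or absolute path: keep as-is
            *)     command_line="$command_line $prefix$arg" ;;  # relative path: add the prefix
        esac
    done
    printf '%s\n' "$command_line"
}

prefix_sources "foo/bar/" -Wall sourceFile.c
# prints: gcc -Wall foo/bar/sourceFile.c
```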

WilDoane commented 12 years ago

It's a little worse than that, I think. Suppose the user compiles with something unusual like

cd project
gcc week1/board.c week2/tests.c

You don't want to grab the compile-time path and tack it onto just the first argument:

gcc project/week1/board.c week2/tests.c

Instead, you want to generate a commit message such as

cd project
gcc week1/board.c week2/tests.c

That is, you want to generate a commit message that would allow you to recreate the state of the system at compile-time.
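
A rough sketch of that idea, assuming POSIX sh: leave the gcc arguments untouched and record a cd in front of them. The function name make_replay_command is hypothetical, and joining with "&&" on one line is just one way to write the two commands. Paths containing spaces are not quoted here.

```shell
#!/bin/sh
# Hypothetical helper: emit a line that recreates the compile-time state.
# $1 is the directory relative to the repo root (empty at the top level);
# the remaining arguments are the full compile call, starting with "gcc".
make_replay_command() {
    prefix="$1"
    shift
    if [ -n "$prefix" ]; then
        printf 'cd %s && ' "$prefix"
    fi
    printf '%s\n' "$*"
}

make_replay_command "project/" gcc week1/board.c week2/tests.c
# prints: cd project/ && gcc week1/board.c week2/tests.c

# Inside the compile wrapper, the call might look like:
# make_replay_command "$(git rev-parse --show-prefix)" gcc "$@"
```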

-Wil

William Doane http://DrDoane.com

WilDoane commented 12 years ago

Check out https://github.com/WilDoane/GitDataCollection/pull/14

-Wil