Open davidskalinder opened 3 years ago
Hmm, so I remembered that, in addition to being an ETL file, this file could (at least in theory) one day serve as a way to review coder progress in the manner of #58, which is presumably why those technically unnecessary fields are in there in the first place.
So I was going to go the fascist route and take out everything except `db_id`, but for that other use case it's probably better to go the other (hippie?) route and include everything except `text`.
The ETL can always ignore the rest of the fields and get them from the `article_metadata` export (which, @johnklemke, I recommend you do in case we decide to get rid of them later).
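For the ETL side, here is a minimal sketch of what "ignore the rest of the fields and get them from the `article_metadata` export" could look like. It assumes both exports have already been read into lists of dicts and that `db_id` is the shared key; the function name, the column names other than `db_id`, and the sample rows are all made up for illustration and are not taken from the real exports.

```python
# Hypothetical sample rows; the real exports have different/more columns.
article_rows = [
    {"db_id": "1", "title": "Strike at plant", "pub_date": "1995-03-01"},
    {"db_id": "2", "title": "March downtown", "pub_date": "1995-03-02"},
]
event_rows = [
    {"db_id": "1", "coder": "alice", "title": "stale copy", "form": "strike"},
    {"db_id": "2", "coder": "bob", "title": "March downtown", "form": "march"},
]


def join_on_db_id(event_rows, article_rows, redundant_fields):
    """Drop redundant article fields from each coder-event row, then
    re-attach them from the article-level export, keyed on db_id."""
    articles = {row["db_id"]: row for row in article_rows}
    joined = []
    for event in event_rows:
        # Ignore whatever article fields the coder-event file happens to carry.
        kept = {k: v for k, v in event.items() if k not in redundant_fields}
        # Raises KeyError if an event points at an article that was not exported.
        article = articles[kept["db_id"]]
        joined.append({**kept, **article})
    return joined


joined = join_on_db_id(
    event_rows, article_rows, redundant_fields={"title", "pub_date"}
)
```

Done this way, the ETL keeps working whether or not the redundant fields are later removed from the coder-event file, since it only ever relies on `db_id` from that file.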
All right, d5b9ffecfa5 removes everything except `db_id` (in case we want to go back to this later), and then 0481ebc8541ff adds everything except `text` back in again.
Hmm so I should update the PR for this huh...
Uh whoops:
So I guess for now I just need to deploy this change to our live instance.
- Somehow I made these last two commits as @alexhanna. I'm guessing that updating Ubuntu has done something strange to the way git handles user credentials? So I'll look into that. But for these commits, I don't know what to do other than to remember that they weren't actually @alexhanna...
I think as with #116 the best thing to do to maintain order in the universe will be to simply recreate this branch by checking out the last commit by real-me and then pasting in the subsequent changes from the soon-to-be-abandoned branch. Then (I think) we can delete the abandoned branch and (if GH will let us) rename the new branch over the old one? Or, worst-case, name it something else...
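Before rebuilding a branch like this, it can help to list exactly which commits picked up the wrong author. The sketch below parses the text produced by `git log --format='%h|%an'` (abbreviated hash and author name) and flags anything not written by the expected author. The function name, the author names, and the sample log are hypothetical placeholders, not output from this repository.

```python
def misattributed_commits(log_output, expected_author):
    """Return (short_hash, author) pairs for commits whose author name
    differs from expected_author.

    log_output is the text printed by: git log --format='%h|%an'
    """
    flagged = []
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        short_hash, _, author = line.partition("|")
        if author != expected_author:
            flagged.append((short_hash, author))
    return flagged


# Made-up log text, newest commit first, for illustration only.
sample_log = """\
bbbbbbb|Someone Else
aaaaaaa|Someone Else
9999999|Expected Author
"""

flagged = misattributed_commits(sample_log, "Expected Author")
```

In this sample the two newest commits are flagged, so the branch would be recreated from the third one, which is the last commit by the expected author.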
All right, d5b9ffe removes everything except `db_id` (in case we want to go back to this later), and then 0481ebc adds everything except `text` back in again.
Okay, rebuilt these with my own bare hands as 1bb36aa0bf0 and c27c87bf596 respectively and pushed. `by_coder_and_event_by_annotation` should now look like a perfectly ordinary git branch that nothing weird ever happened to.
All right, so I think this is done except for deployment (the PR can be tracked in #119). So I'm going to move this to the done column.
Meanwhile if anybody has strong opinions about the inclusion or exclusion of these fields, please feel free to pipe up.
Hah, apparently this exporter has been throwing in the entire `article_metadata` record for each event, so the output now includes the entire article text along with every event. So yeah, I probably ought to be a little more selective about which article metadata fields I include.
@johnklemke, would you be able to tell me which of the fields below are expected in this file (i.e., the one with coder-events in rows and annotation variables in columns) by your side of the ETL process? Technically they're probably all redundant except for `db_id`, since they will all appear in the separate export that only contains one line per article (the one discussed in #110). But I don't want to cut something you're relying on.
Of course I could put in everything other than `text`, but it seems like it's probably better to be a little stricter with the output? So if you can let me know which fields you need, I'll leave those in and cut the rest.
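The two options on the table can be sketched as a pair of small helpers: a strict allowlist that keeps only named fields, and a permissive version that keeps everything except `text`. This is an illustration only, assuming each `article_metadata` record is available as a dict; the helper names, the `KEEP_FIELDS` constant, and the sample record are hypothetical and not part of the actual exporter.

```python
# Hypothetical allowlist; extend it with whatever fields the ETL turns out to need.
KEEP_FIELDS = ("db_id",)


def select_fields(article_metadata, keep=KEEP_FIELDS):
    """Strict option: keep only the allowlisted fields."""
    missing = [field for field in keep if field not in article_metadata]
    if missing:
        raise KeyError(f"article metadata is missing fields: {missing}")
    return {field: article_metadata[field] for field in keep}


def everything_except(article_metadata, drop=("text",)):
    """Permissive option: keep every field except the ones listed in drop."""
    return {k: v for k, v in article_metadata.items() if k not in drop}


# Made-up record for illustration only.
record = {"db_id": 7, "title": "Some headline", "text": "full article body"}

strict_row = select_fields(record)
permissive_row = everything_except(record)
```

The strict version fails loudly if an expected field disappears, while the permissive version silently picks up any new metadata field that gets added later, which is the trade-off behind the question above.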