dpp / lawyersongithub

A Telegram site for the people who are lawyers and also have GitHub accounts
114 stars 119 forks

Make data more machine-readable #75

Closed benbalter closed 4 years ago

benbalter commented 9 years ago

Rather than storing data as a free-form text file, this pull request stores the data in a more machine readable format. After all, we're not just lawyers, we're lawyers on GitHub.

Chatting with @adelevie, we saw an opportunity to:

  1. Make maintenance easier, by eliminating the risk of merge conflicts inherent in each new lawyer editing the same (last) line
  2. Improve the presentation, by knowing a bit more about each lawyer (e.g. avatars)

Specifically:

In order to do this, I wrote a small script, script/convert, which parses the existing (live) markdown file, and converts each lawyer to a document within the Jekyll collection. You can see the results here.
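To make the conversion step concrete, here is a minimal sketch of the idea, not the actual script/convert. It assumes, hypothetically, that each entry in the existing file is a Markdown link to a GitHub profile followed by free-form notes; the helper names `parseEntry` and `toDocument` are made up for illustration, and the `name`/`github` fields and `_lawyers` directory simply mirror names used elsewhere in this thread.

```javascript
// Hypothetical entry format (an assumption, not the real file's format):
//   "* [Name](https://github.com/login) free-form notes"
const ENTRY = /^\s*(?:[-*]\s+)?\[([^\]]+)\]\(https?:\/\/github\.com\/([A-Za-z0-9-]+)\/?\)\s*(.*)$/;

// Parse one line of the list; returns null for lines that are not entries.
function parseEntry(line) {
  const m = ENTRY.exec(line);
  if (m === null) return null;
  return { name: m[1].trim(), github: m[2], body: m[3].trim() };
}

// Build the path and contents of one collection document.
// JSON.stringify produces a double-quoted string, which is also
// a valid YAML double-quoted scalar.
function toDocument(entry) {
  const frontMatter = [
    "---",
    "name: " + JSON.stringify(entry.name),
    "github: " + JSON.stringify(entry.github),
    "---",
  ].join("\n");
  return {
    path: "_lawyers/" + entry.github.toLowerCase() + ".md",
    content: frontMatter + "\n\n" + entry.body + "\n",
  };
}
```

With one file per lawyer, two people adding themselves at the same time touch different files, so their pull requests no longer collide on the last line of a shared list.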

I'd still need to document things and clean up the output (and perhaps add a small API), but before I went too far, I wanted to check in, surface my work, and get early feedback before deciding to proceed.

Here's what it looks like now:

[Screenshot of the proposed site, 2015-05-21]

Thoughts?

dpp commented 9 years ago

@benbalter I really appreciate the effort you put forth on this!

One of the reasons for launching this site, other than the value of demonstrating lawyers can be tech geeks, is to use https://telegr.am. Lemme see how much of the work you did can be converted into Telegram-isms. I'll work on it over the weekend.

Thanks!

benbalter commented 9 years ago

> to use https://telegr.am

Gotcha. I wasn't aware of that. How strongly do you feel that the project should remain on https://telegr.am?

Obviously you're the BDFL, and I'm more than glad to go into more detail and make the case for why I was motivated to open the pull request in the first place. As the project continues to grow, I think there's value in (A) the ability to publish on a dynamic platform (to do things like bring in avatars, provide APIs, etc., unless Telegram can do that?), and (B) exposing lawyers, as part of the process of being added, to more of the GitHub ecosystem (GitHub Pages, Jekyll, machine-readable formats like YAML, etc.).

To make it easier to preview the changes, I moved my personal fork over to a dummy org (lawyers-on) so that you can preview it at http://lawyers-on.github.io/ to get a better sense of the proposed changes. Again, it's rough; I stopped as soon as I had something viable to check in and start this very conversation with the broader community.

dpp commented 9 years ago

> Gotcha. I wasn't aware of that. How strongly do you feel that the project should remain on https://telegr.am?

Pretty strongly. I built Telegram for a wide variety of reasons, not the least of which is that I'm super not keen on the whole Jekyll approach to the world. A discussion of why is probably best done in person... but suffice it to say, I left the Ruby/Rails world, and promoted Scala and founded http://liftweb.net because of my reaction to the Ruby/Rails world.

I totally agree with the idea of machine readable data as the basis for HTML presentation. YAML isn't my favorite (JSON is), but data in easily editable formats that are also machine consumable are GoodThings:tm:.

I understand your motivation for making this a more GitHub-centric thing. :grin: Something I have to work on internally is the value of all-GitHub vs. "my baby, my technology".

Also, please read the above in the light of an internal struggle by the person at the other end of the ticket... a person who was an early GitHub user/promoter, a person who appreciates how excellently you've approached this conversation... but nonetheless, a person who has to do a bunch of introspection to get to the right place.

Thanks!

adelevie commented 9 years ago

@dpp FWIW, as both a lawyer on github who is on lawyersongithub.com, and as someone who has done most, if not all of the merging lately, I really prefer a Jekyll approach.

You had the foresight to create the site, buy the domain, and set up the hosting, and none of that is lost on me here. That said, I'd like to think of this as a project that is responsive to community needs and desires.

samglover commented 9 years ago

Total outsider here, but what's the value of telegr.am? This is the first I've heard of it, so I'm asking out of ignorance and not trying to make @dpp feel bad.

By the way, whatever telegr.am's virtues, it looks like lawyersongithub.com is several commits out of date.

kemitchell commented 9 years ago

@adelevie and @samglover, to the extent the project needs another hand merging in new PRs, I'm happy to help. I merge several PRs a day for other projects, so it's not much trouble.

I'd just want to be sure I follow the same criteria that have been used so far. I'm not sure those are made clear in the repo.

dpp commented 9 years ago

I spent a few hours looking at the PR and reviewing the overall goals of both the site and the PR. Here's where I've come down and the reasons.

In terms of splitting the data out from the rendering and putting each lawyer's information in a separate .md file with YAML data and free-form Markdown content, I think that's a great idea. It addresses a lot of different issues including the issue of isolating each lawyer's data. It's likely that we will need to figure out some mechanism for ordering the entries (perhaps by date that each individual adds themselves to the site).
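The ordering idea could look something like the sketch below. It is an illustration under stated assumptions, not Telegram's or Jekyll's parser: `splitFrontMatter` only understands flat `key: value` lines rather than full YAML, and the `added` field (the date a person added themselves) is a hypothetical name that does not exist in the data yet.

```javascript
// Split a document into its front matter (flat "key: value" lines only)
// and its free-form Markdown body. This is NOT a real YAML parser.
function splitFrontMatter(text) {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (m === null) return { meta: {}, body: text };
  const meta = {};
  for (const line of m[1].split(/\r?\n/)) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta: meta, body: m[2].trim() };
}

// Return a copy sorted oldest first by the hypothetical "added" field.
// ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
function orderByAdded(docs) {
  return docs.slice().sort(function (a, b) {
    if (a.meta.added < b.meta.added) return -1;
    if (a.meta.added > b.meta.added) return 1;
    return 0;
  });
}
```

Storing the date in each file keeps the ordering independent of file names and of the order in which pull requests happen to be merged.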

So, to the extent that we are in agreement about the what, awesome.

The next issue is about the how. The how issue is a much thornier one because it involves a combination of corporate interests, history, contribution to the project, and technology quality. I will address each of these issues... but not in order.

History

I founded the "project" (that's probably a strong word for something with a couple of hundred lines of text) half as a joke and half as a response to some of my friends at Redmonk. I got behind in some of the pull requests and @adelevie offered to help out. Alan's been super helpful and super awesome. However, this PR is the first PR that's suggested a change in the structure of a one-file project. Nobody has suggested that we add CSS or do anything else to the project.

Hosting for the project has been solid (5 9's of uptime over at https://telegr.am... which hosts a fair number of static sites... many from the Lift world... including the main Lift site).

But from a historical perspective, there's not really any "we've got to make this better because the tech choices are bad." To my mind this dismisses both Alan and @samglover's arguments about the community and about not knowing what Telegram is as a hosting entity.

As a side note, I've had a very long history of supporting GitHub and I was probably the first lawyer on GitHub... but that's not really relevant to the discussion.

Contributions to the Project

Alan has made a nice contribution to the project. In hours spent, he may even exceed my hours spent... well, up until I spent 5 hours today understanding what Ben did and doing similar things using Telegram's features. And Ben has also made a substantial contribution in the form of this pull request. I am not dismissing either Ben's contribution or Alan's suggestion about "the community speaking." On the other hand, the aggregate efforts by all parties (including me) to date have been trivial. There are no compelling contributions that anyone has made that suggest to me that the project should fundamentally change beyond the splitting-out work that Ben has done.

Corporate Interests

Just as Ben has a corporate interest in raising awareness about GitHub, I have an interest in raising awareness about Telegram. No, Telegram is not as popular and well known as GitHub, but nonetheless, I have an interest in promoting something that I've spent a lot of time working on. Absent a compelling reason to give up lawyersongithub.com to GitHub, I'm going to stick to my self-interest.

Also, I understand the interest on the part of GitHub to demonstrate a complete solution to non-self-identified tech people. But the main thing that lawyers are going to be interacting with related to lawyersongithub.com is the creation of the pull request. The fact that the site is rendered elsewhere was invisible to some of the folks that commented on the PR. This is a long way of saying that I reject Ben's argument that lawyersongithub.com should be a 100% GitHub gig.

Technology Quality

Up to this point, I've come down moderately on the side of continuing to host the site on Telegram. I could have been persuaded that Jekyll/GitHub Pages/etc. offered a better solution than Telegram. But in the tech area, there's nothing that convinces me that a Jekyll solution is better.

I built a similar site off the data Ben split out. The code can be found at https://github.com/dragonmark/lawyersongithub/blame/master/index.md#L3

Here's the code to read and parse all the Markdown files:

<script data-server-js="true">

// Load every lawyer's Markdown file; return its HTML and metadata.
function people() {
  // get the list of MD files
  var f = currentFile.value().fileInfo().file().openOrThrowException("js");
  f = new java.io.File(f.getParentFile(), "_lawyers");

  var folks = f.listFiles().
    filter(function(tf) {
      // keep only regular files whose names end in .md
      return tf.isFile() && tf.getName().endsWith(".md");
    }).
    map(function(tf) { return fileParser.load(tf); }).
    // drop anything that did not load
    filter(function(x) { return null != x; }).
    map(function(v) {
      return {html: v.html(), data: help.toJs(v.meta())};
    });

  return folks;
}

</script>

It's pretty easy-to-read JavaScript. Rendering it into the template is done using Lift's CSS Selector Transforms. The template:

<div data-js="people()">
<hr>
<span data-js="* *+ #> it.data.name">Name: </span>
<a href="https://github.com/" data-js="* [href+] #>it.data.github"><span data-js="* *+ #> it.data.github">GitHub: </span></a>
<span data-js="* #> it.html">Name: </span>

<ul data-js="it.data.links">
  <li data-js="a * #> it[0]"><a href="#" data-js="a [href] #> it[1]"></a></li>
</ul>

</div>

To my eye (coming from Lift/CSS Selector Transform-land) the code is more readable... and the code is already correctly HTML escaped... so folks putting an & or some other character in their YAML will not have to worry about it messing up the site.
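For readers unfamiliar with the escaping concern, the sketch below shows what HTML escaping of a value amounts to. It is a generic illustration, not Lift's or Telegram's implementation; `escapeHtml` is a hypothetical helper name.

```javascript
// Replace the characters that are significant in HTML with entities.
// "&" must be handled first so the entities added later are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

A YAML value such as `Smith & Jones <LLP>` would then be rendered as literal text instead of being interpreted as markup.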

The code and the site can be seen at:

Conclusion

I appreciate the effort that Ben put into the idea of splitting out the lawyers into separate data files. I think it's an awesome idea. I appreciate the ongoing work that Alan has put in.

I hope we can all work together on the Telegram-powered lawyersongithub.com.

kemitchell commented 9 years ago

It's difficult to follow. I'd bet most here have done plenty of GitHub, mailing lists, and even Usenet, and know how easy it is to slide from this point down. Easy, but not inevitable!

I'm writing anyway because, frankly, how or where to publish the webpage isn't where I thought this would go. Until @dpp's comment---and despite watching the repository---I hadn't any idea the list was republished. "Telegram" in the description rang no bell. And so I expected discussion with @benbalter about whether folks are comfortable lending their info to a webpage and seeing info about them so thoroughly structured. Who's a lawyer and where they lurk online are eminently Google-able facts. But offering those facts up on a plate for republication is a far cry from twiddling plaintext deep within Club Nerd.

In practical terms, landing new PRs seems like both the point and the problem. There is a bit of a backlog. Dealing with merge conflicts is always annoying; I know @adelevie's pain. (Check out merge=union!) On the other hand, I fear "mo' technology", like separate files for each new listing, would also make it harder for new GitHubbers to add their information. Judging by the PR titles, many are using the web editor, most popular among beginners. If we insisted folks kinda-sorta figure out Jekyll and YAML, too---or even just find their way through a mess of files---I'd expect many fewer PRs long-term.
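For reference, `merge=union` is Git's built-in union merge driver, which keeps the lines added on both sides instead of writing conflict markers. A minimal sketch, assuming the list stays in index.md, would be a one-line `.gitattributes` file at the repository root:

```gitattributes
# Assumption: the shared list lives in index.md
index.md merge=union
```

This applies to merges performed by Git itself; whether a hosted merge button honors it should be verified. Union merges can also leave added lines in an unexpected order, so the result is worth a quick look.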

So here is a concrete proposal. Add me to the repo. I will:

  1. rebase-merge or respond to all the PRs currently open
  2. add http://lawyersongithub.com as the repository's URL, next to its description
  3. mention the URL in the README
  4. mention that folks should add their info to index.md, and link to GitHub's tutorials on the web editor

Discussion about the website can continue in parallel.