Deciding *what* to authenticate

vdavez commented 10 years ago

Originally, my thinking was that Cranch would authenticate an XML file for each section of the Code, not an HTML rendering of the Code. It seemed to me that the advantages are:

You could make changes to how the Code is rendered (better features, navigational links, etc.) without requiring additional authentication;
It would obviate the possibility that the actual presentation of the Code becomes passé over time (e.g., <blink>);
It allows for richer tagging, if jurisdictions so choose, separate from the presentation;
It avoids a preservation burden: if the HTML were authenticated then, taken literally, we might have to preserve each rendering of the HTML in order to comply with UELMA's preservation requirements.

BUT, an advantage of authenticating the HTML is that a user actually interacts with HTML, not with XML. And ultimately, presentation is sometimes intrinsic to the actual law (esp. forms, maps, images, etc.).

Ultimately, we need a decision. But it would be helpful to understand the pros and cons of each option.

vdavez commented 10 years ago

Am cc'ing @KenHirsh to explicitly solicit your view...

KenHirsh commented 10 years ago

Hi Dave, I think ideally one would authenticate both an XML and an HTML version. Of course the negative side of doing so is the additional cost, at least in labor. The reason to authenticate HTML would be to have an authenticated version available to the casual user, typically a member of the public or an attorney who wants quick access to a current and official version of the code. The reason to provide the XML version is to allow others to manipulate the text and produce alternative products from the authenticated code; e.g., online annotated codes, indexes that are more granular than an official index might be, incorporation into 50-state surveys, among others.

If one has to choose between the two, then I would come down on the side of XML, because of its benefit to a wider audience, and others might well be able to use the XML version to cheaply or freely serve those who rely on an HTML version. I hope this is helpful, and thanks for asking. Ken

Kenneth J. Hirsh, Director of the Law Library and I.T., Professor of Practice, University of Cincinnati College of Law, ken.hirsh@uc.edu, (513) 556-0159


vdavez commented 10 years ago

Very helpful feedback! The more I think about this, the more I'm inclined to think that we should authenticate both the XML and HTML, but do them differently. Here's my thinking:

Authentication, as we know, means the ability to reliably determine the publisher of a record through technological means and to ensure that the record is unaltered from the official published record (see https://github.com/DCCouncil/Cranch/blob/master/documentation/authenticity.md). We also know that there are multiple ways to electronically authenticate documents; the question is one of use case...

So, if we use HTTPS to publish the HTML, then it is "authenticated" within the meaning of UELMA. The limitation, of course, is that a user couldn't pass around a downloaded file and verify it at a later point if it isn't "signed" or hashed or whatever. But downloading HTML is not how people use HTML in real life anyhow. In real life, they pass around links, print to paper, print to PDF, etc.

If a user wanted to download and transfer a file, or as you said, "to allow others to manipulate the text and produce alternative products from the authenticated code", they could use the XML representation.

Conclusion: authenticate both HTML and XML, but handle them differently.

KenHirsh commented 10 years ago

David, your statement about using HTTPS raises one question: how does using HTTPS establish both that the document is official and that it is unaltered? The certificate verifies that the site is operated by the D.C. Council, and the use of SSL/TLS assures me that the document has not been corrupted in transmission, but what feature is certifying that it is unaltered? Thanks. Ken


vdavez commented 10 years ago

I am definitely glad I included you on the thread, because maybe I misunderstand something...

If "the use of SSL/TLS assures me that the document has not been corrupted in transmission," isn't that the same as saying that the data has not been altered since publication by the state?

(Granted, you couldn't subsequently verify the document using this method; you could only do it at the time of viewing...)
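To make the "only at the time of viewing" limitation concrete, here is a minimal Python sketch (standard library only; the function names are hypothetical) of what a client checks when it retrieves a page over HTTPS. The default TLS context verifies the certificate chain and the hostname before any bytes are returned, but once the connection closes, the returned bytes carry no proof of where they came from.

```python
import ssl
import urllib.request


def verifying_context() -> ssl.SSLContext:
    """TLS context that requires a trusted certificate chain and a hostname match."""
    # create_default_context() loads the system's trusted CA certificates and
    # turns on both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_over_https(url: str) -> bytes:
    """Download `url`, raising an error if TLS verification fails.

    The verification covers this connection only: the returned bytes are not
    signed, so a copy saved to disk cannot be re-verified later.
    """
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to fetch a non-HTTPS URL")
    with urllib.request.urlopen(url, context=verifying_context()) as response:
        return response.read()
```

Calling `fetch_over_https("https://dccouncil.us/")` would either return the page or raise (typically a `urllib.error.URLError` wrapping the certificate failure) rather than hand back unverified bytes.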

KenHirsh commented 10 years ago

I believe we would need some positive assertion/declaration that the text as it appears on the page is the official text. Perhaps a statement on the page to that effect would be sufficient, but it would have to appear on every page. Alternatively, we could go back to providing an accompanying hash file, but we’ve discussed earlier the extra work that creates for both the producer and the consumer. I wonder whether it is possible to have the SSL certificate itself carry information that would affirm the official and unaltered status of the text. Ken
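For comparison, a minimal sketch of the accompanying-hash-file approach, in Python with only the standard library, showing the extra work on both sides: the publisher computes a SHA-256 digest for each file and publishes it alongside, and the consumer recomputes and compares it. The `.sha256` sidecar naming and the function names are assumptions for illustration. A digest only proves that the file matches whatever digest the consumer trusts; it establishes provenance only if the digest itself is obtained from the official publisher (for example, over HTTPS from the official domain).

```python
import hashlib
import hmac
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def sidecar_for(path: Path) -> Path:
    """Hypothetical convention: the digest lives in `<name>.sha256` next to the file."""
    return path.with_name(path.name + ".sha256")


def write_sidecar(path: Path) -> Path:
    """Publisher side: compute the file's digest and write it to the sidecar."""
    sidecar = sidecar_for(path)
    sidecar.write_text(sha256_hex(path.read_bytes()) + "\n", encoding="ascii")
    return sidecar


def verify_against_sidecar(path: Path) -> bool:
    """Consumer side: recompute the digest and compare it with the sidecar's."""
    expected = sidecar_for(path).read_text(encoding="ascii").strip()
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(path.read_bytes()), expected)
```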


konklone commented 10 years ago

It feels like SSL would implicitly validate the official and unaltered nature of the text, when retrieved directly from the site (instead of opened as an external file, e.g. email attachment).

Official - This information is carried in the certificate itself. The certificate's existence and validation proves that the user is downloading the file from the government domain name it says it is. The cert at https://dccouncil.us currently says it's the "Government of the District of Columbia", for example (though I'm not sure this metadata is all that important - the domain name is what counts).
Unaltered - TCP checksums catch accidental corruption in transit, and SSL's integrity protection ensures that there is no middle-man that might provide a false version of the file between the user and the domain name they are attempting to reach (as long as the certificate is valid).

It's not necessarily the case that every resource available via https:// on dccouncil.us is something the Council is declaring as legally official and binding in Court. Some pages are allowed to have typos without legal repercussions.

So a declaration that these pages are legally official is still probably needed, but I don't see why it has to appear on the page in question, or in the certificate at download-time. It can be declared on some other page (ideally itself covered under https://) that the following part of the site is legally official, or something like that. This is especially true because what @vzvenyach is proposing, for HTML renditions, is to authenticate URLs, rather than files. So what you need somewhere is a clear description of which URLs are being granted official status -- then, SSL provides the mechanism to enforce it.
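Because the authenticated unit here is a URL, the declaration only has to say which URLs are official. A minimal Python sketch of such a check, assuming a hypothetical declaration that maps an HTTPS host to official path prefixes (the `/official/code/` prefix is an illustration, not anything the Council has published):

```python
from urllib.parse import urlsplit

# Hypothetical declaration: host -> path prefixes declared legally official.
OFFICIAL_SCOPE = {
    "dccouncil.us": ("/official/code/",),
}


def is_declared_official(url: str) -> bool:
    """True if `url` is HTTPS, on a declared host, and under a declared prefix.

    This checks only the declaration. That the bytes really came from the
    host is enforced separately, by TLS certificate validation at fetch time.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.port not in (None, 443):
        return False
    # Reject dot segments so ".." cannot climb out of an official directory.
    if any(segment in (".", "..") for segment in parts.path.split("/")):
        return False
    # .hostname strips any "user@" prefix and lowercases the host.
    host = parts.hostname or ""
    prefixes = OFFICIAL_SCOPE.get(host, ())
    return any(parts.path.startswith(prefix) for prefix in prefixes)
```

Keying on the parsed hostname (rather than a raw string prefix) means look-alike URLs such as `https://dccouncil.us@evil.example/...` are rejected.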

I'm extremely glad this is being explored, because the challenges of authenticating URLs are much lower than those of authenticating files: the world already needs secure URLs in order to perform basic functions. Maybe it's not a replacement for authenticating files, but recognizing that HTML authentication can piggy-back on globally deployed authentication infrastructure is an A+ idea.

JoshData commented 10 years ago

I think @KenHirsh's point could be restated as: In the use case where a user has merely received a link, can the user trust that the domain owner actively prevents any documents from landing on the domain that look official but aren't? That might either be because the domain allows user-uploaded content (so there may be malicious or inadvertent official-looking documents) or the domain contains official-looking but not actually official UELMA documents like drafts or unofficial legal docs.

If the user is savvy, they might recognize that dccouncil.us/official/code is official and dccouncil.us/uploaded_media/QX1234.pdf could be junk.

To put additional information into the certificate and have it be displayable to the user, I think it would have to be the owner name in an extended validation certificate, which is the sort of cert where you see the company name in the address bar. E.g. Dave would get a certificate for "DC Council Official Documents". I'm not sure if CAs would permit that though, unless that was the name of some DC agency or other entity.

Alternatively, official docs could be hosted on a domain like "officialpub.dccouncil.us". (I actually like this.)

In either case, users would have to be educated about what to look for and what to be aware is not official (just as they have to be educated about how PDF signatures work).

We could also develop a new standard for indicating the provenance of docs on the web, and then create web browser extensions to make that information visible to the user. E.g. A file at the root of the domain would indicate that docs in a certain directory are UELMA-official.
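One way the provenance-file idea could look, as a Python sketch. Everything about the format is hypothetical: the JSON schema and its `official_paths` key are invented here for illustration. A browser extension would fetch such a file over HTTPS from the domain root, parse it as below, and flag pages whose path falls under a listed directory.

```python
import json


def parse_manifest(text: str) -> list[str]:
    """Parse a hypothetical manifest such as {"official_paths": ["/official/code/"]}."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("manifest must be a JSON object")
    paths = data.get("official_paths")
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("official_paths must be a list of strings")
    for p in paths:
        # Require absolute directory paths ending in "/", so that "/official/"
        # cannot accidentally cover "/official-drafts/".
        if not (p.startswith("/") and p.endswith("/")):
            raise ValueError(f"not an absolute directory path: {p!r}")
    return paths


def path_is_covered(path: str, official_paths: list[str]) -> bool:
    """True if `path` lies inside one of the declared official directories."""
    return any(path.startswith(prefix) for prefix in official_paths)
```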

vdavez commented 10 years ago

A lot to mull over.... But one quick update that I think is relevant based on the discussion: I am working on getting dccode dot gov.

konklone commented 10 years ago

Well said, @JoshData, I totally get that. And for what it's worth, https://dccouncil.us already has an Extended Validation certificate from Verisign.

official.dccode.gov would be an awfully nice domain.

KenHirsh commented 10 years ago

All, I think I can agree with the principles you are putting forth on HTML authentication. I think the following processes would be necessary to make it fly under the act:

1. You establish an FQDN that is expressly reserved for publishing the official AND authenticated code. Nothing else is to be contained within that domain.

2. You obtain an extended validation SSL certificate for the FQDN in the name of the D.C. government, or in the name of the office within the government that is the official publisher of the code.

3. The official publisher sets forth in the Municipal Register the methods it uses for authentication, and declares that all code provisions available within the FQDN are official and authentic when the receiving browser shows SSL certificate details matching those the publisher has declared. The regulations should expressly list those certificate details.
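The third step, comparing certificate details against a published list, could be checked mechanically. A minimal Python sketch, assuming the regulation names the subject organization and host name; the expected values below are placeholders, not anything a regulation actually says. The input is the dictionary that `ssl.SSLSocket.getpeercert()` returns for a validated connection, in which `subject` is a tuple of relative distinguished names, each a tuple of `(field, value)` pairs. That dictionary does not indicate whether a certificate is Extended Validation, so that would need separate handling.

```python
# Placeholder values: a real list would come from the published regulation.
EXPECTED_SUBJECT = {
    "organizationName": "Government of the District of Columbia",
    "commonName": "official.dccode.gov",
}


def flatten_name(rdns) -> dict:
    """Turn getpeercert()'s ((('field', 'value'),), ...) structure into a dict."""
    return {field: value for rdn in rdns for (field, value) in rdn}


def matches_declared_details(cert: dict, expected: dict = EXPECTED_SUBJECT) -> bool:
    """True if every declared subject field is present with the declared value."""
    subject = flatten_name(cert.get("subject", ()))
    return all(subject.get(field) == value for field, value in expected.items())
```

In practice `cert` would come from `ssl.create_default_context().wrap_socket(sock, server_hostname=host).getpeercert()`; the TLS handshake has already checked the chain and hostname, so this function only adds the declared-identity check.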

Is this making sense? Ken


konklone commented 10 years ago

This makes lots of sense; I just wonder whether number 3 (and maybe number 1) places a higher burden on meeting UELMA requirements than if this were about sending PDFs around. Do legal bodies need to issue regulations stating the name of the legal entity that people should expect to see on the seal of an authenticated PDF when they open it in Adobe Reader? Do the regulations even say anything about PDFs or Adobe Reader? It seems like it's enough for them to just describe it somewhere on their website.

DCCouncil / Cranch

Deciding what to authenticate #4
