18F / epa-notice

Web interface for viewing and commenting on proposed changes to federal regulations

Human readable file name for downloaded PDFs #299

Closed jehlers closed 8 years ago

jehlers commented 8 years ago

Right now when I download a draft document, the file is named comment.pdf. I'm not sure what happens when it's the final comment. Can/should we do the following:

Draft:

Final:

cmc333333 commented 8 years ago

From chat:

@jehlers

I just added this issue: Anyone else have better ideas? And should it be added to this sprint?

@cmc333333

We won't know the tracking number at PDF generation time, so if we can avoid it, we skip a step. Can we use the title of the proposal (say, 15 chars of it)?

@jehlers

That’s why I was thinking eManifest...

@tadhg-ohiggins

Well, the title for this rule is "Hazardous Waste Management System; ..." [trimmed by ed]. So the first 15 would be "hazardous waste".

@cmc333333

PERFECT, roll it. What I'd like to avoid is having a configuration per notice, if that makes sense. We've got a few pieces of metadata (the FR id, the docket id, the title, the agency, the url) that we could use automatically, probably more.

@tadhg-ohiggins

Right. Yes, the only thing I'd add to that is possibly a timestamp for the comment, so that multiple comments by the same user don't have the same filename.
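[ed] A minimal Python sketch of the title-plus-timestamp idea from the chat above. The function names, the filename layout, and the example timestamp are hypothetical and not taken from the project; the sketch only assumes the first 15 characters of the title are used and that a UTC timestamp is appended.

```python
from datetime import datetime, timezone


def short_title(title: str, length: int = 15) -> str:
    """Return the first `length` characters of the title, trimmed and lowercased."""
    return title[:length].strip().lower()


def comment_filename(title: str, draft: bool, when: datetime) -> str:
    """Build a PDF filename from the title prefix plus a UTC timestamp.

    `when` must be timezone-aware; it is converted to UTC and rendered in
    ISO 8601 basic format so that names sort chronologically.
    """
    prefix = "DRAFT_comment" if draft else "Comment"
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}_{short_title(title)}_{stamp}.pdf"


# Hypothetical timestamp, for illustration only.
example = comment_filename(
    "Hazardous Waste Management System; ...",
    draft=True,
    when=datetime(2016, 5, 4, 12, 0, 0, tzinfo=timezone.utc),
)
print(example)  # DRAFT_comment_hazardous waste_20160504T120000Z.pdf
```

Note that the result still contains a space, which some download tools handle awkwardly.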

vrajmohan commented 8 years ago

  1. Are we sure we want the 1st 15 characters of the title? We will end up with "DRAFT_comment_hazardous waste.pdf" and "Comment_hazardous waste.pdf". For the 1st 3 Proposed Rules in today's Federal Register, we would end up with:

    • "Retrospective R" for Retrospective Review-Improving the Previous Participation Reviews of Prospective Multifamily Housing and Healthcare Programs Participants; Supplemental Notice of Proposed Rulemaking
    • "Safety Zone; An" for Safety Zone; Annual Roy Webster Cross-Channel Swim, Columbia River, Hood River, OR
    • "Mandatory Depos" for Mandatory Deposit of Electronic Books and Sound Recordings Available Only Online

    My recommendation is to use the FR id, which is compact, or the docket id.

  2. Timestamps are ugly and, unless we use ISO 8601 or a similar non-American format, not sortable. Shouldn't we leave it to the browsers? In my experience, browsers already handle multiple downloads with the same name by appending a version number, e.g. comment (3).pdf.

cmc333333 commented 8 years ago

"Are we sure we want the 1st 15 characters of the title?"

I, for one, am not. I like the cleanliness of using one of the ids, but they are meaningless outside of the FR and regs.gov. When FR docs are cited, they use volume + page. Maybe we could use the agency name? Maybe the publish date of the proposal? We also have access to the FR citation if it'd be helpful.

"Timestamps are ugly ... Shouldn't we leave it to the browsers?"

I agree. The one caveat here is that if we use a very generic doc title (like the agency name alone), the browser-provided file name might not be enough to tell downloads apart. To put it differently, I think it'd be fine to use one filename per proposal; we should try to avoid the same filename being reused by different proposals, however.
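[ed] One way to get "one filename per proposal" is to derive the name from a per-proposal identifier and send it in the HTTP `Content-Disposition` header, leaving duplicate handling to the browser. The sketch below is an assumption-laden illustration in Python: the helper names and the sample identifier are hypothetical, and nothing here reflects how the project actually serves PDFs.

```python
import re


def safe_component(value: str) -> str:
    """Replace runs of characters outside A-Z, a-z, 0-9, '.' and '-' with '_'."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", value).strip("_")


def content_disposition(identifier: str, draft: bool) -> str:
    """Return a Content-Disposition value naming the PDF after `identifier`.

    `identifier` stands in for any per-proposal value (FR id, docket id,
    FR citation); which one to use is the open question in this thread.
    """
    prefix = "DRAFT_comment" if draft else "comment"
    filename = f"{prefix}_{safe_component(identifier)}.pdf"
    return f'attachment; filename="{filename}"'


# "2016-02758" is a made-up identifier used only for illustration.
print(content_disposition("2016-02758", draft=True))
# attachment; filename="DRAFT_comment_2016-02758.pdf"
```

Because the name depends only on the proposal, two downloads of the same proposal collide and the browser appends its own counter, while different proposals get different names.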

jehlers commented 8 years ago

I like the FR citation... it definitely adds some specificity that's easier to plan for than the title of the rule. Not great for the general public, but good for all the lawyers who will probably be the first users of this tool.

Docket ID would probably be my second pick, only because of its length and because it has less meaning off the bat. At least a citation has a use in that I could more easily look up the rule. What do you all think?

Third pick would be the agency acronym. Then we would have "comment_EPA.pdf" or "DRAFT_comment_EPA.pdf" which would work well for us now, but might have issues in the future if EPA has 2 rules up and open for comment.
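[ed] The collision concern with the agency acronym can be shown with a small Python sketch. The `Proposal` type, the function names, and both FR citations are hypothetical; the citations only follow the usual "volume FR page" shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    agency: str        # agency acronym, e.g. "EPA"
    fr_citation: str   # "volume FR page"; values below are invented


def name_by_agency(p: Proposal, draft: bool = False) -> str:
    """Filename keyed on the agency acronym alone."""
    prefix = "DRAFT_comment" if draft else "comment"
    return f"{prefix}_{p.agency}.pdf"


def name_by_citation(p: Proposal, draft: bool = False) -> str:
    """Filename keyed on the FR citation, with spaces replaced by underscores."""
    prefix = "DRAFT_comment" if draft else "comment"
    return f"{prefix}_{p.fr_citation.replace(' ', '_')}.pdf"


first = Proposal(agency="EPA", fr_citation="81 FR 12345")
second = Proposal(agency="EPA", fr_citation="81 FR 23456")

print(name_by_agency(first), name_by_agency(second))
# comment_EPA.pdf comment_EPA.pdf
print(name_by_citation(first), name_by_citation(second))
# comment_81_FR_12345.pdf comment_81_FR_23456.pdf
```

With two EPA rules open at once, the acronym-based names are identical, while the citation-based names stay distinct.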