datamade / nyc-council-councilmatic

NYC Council version of Councilmatic
MIT License
Replace rtf conversion script with actual PDFs #135

Open reginafcompton opened 6 years ago

reginafcompton commented 6 years ago

The RTF conversion script for NYC sometimes takes longer than 15 minutes to complete, which delays NYC data imports.

Let's replace the RTF --> HTML conversion with the actual PDFs. It should be possible via this PR: https://github.com/opencivicdata/python-legistar-scraper/pull/64.

fgregg commented 6 years ago

PDFs are not a substitute for the RTF conversion, because the HTML is a much better web experience.

reginafcompton commented 6 years ago

Can you say more about the "better web experience"?

For example, how does this NYC bill (HTML, https://laws.council.nyc.gov/legislation/int-241-2018/) compare with this Chicago bill (PDF, https://chicago.councilmatic.org/legislation/o-2018-2260/)?

PDF cons: With the PDF, you need to scroll if the bill has multiple pages; the PDF also looks rather small in mobile view.

HTML cons: With the HTML, we lose detail from the original bill document, which can make it difficult to read in the mobile view (see the NYC example above).

fgregg commented 6 years ago

What details do we lose in the original bill document for NYC?

reginafcompton commented 6 years ago

Those are important points.

As for details, we mainly lose header and footer information, so nothing crucial. In that sense, it's an aesthetic issue. However, I still think that for longer bills with several indents we sacrifice readability (particularly in the mobile view). I might be projecting my subjective experience, though.

If we decide to maintain the RTF converter, then we must remember that it's imperfect: we should render PDFs as a "back-up" when a bill does not have HTML. But would such inconsistency look strange to users?
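
A minimal sketch of that "back-up" logic, for illustration only. The `BillDocument` class, its `html_text` and `pdf_url` fields, and `choose_rendering` are hypothetical names, not part of django-councilmatic or the scraper; the idea is just to prefer converted HTML and fall back to the PDF link when the conversion produced nothing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BillDocument:
    # Hypothetical container; the field names are assumptions for this sketch.
    html_text: Optional[str] = None  # output of the RTF --> HTML conversion, if any
    pdf_url: Optional[str] = None    # link to the original PDF, if scraped


def choose_rendering(doc: BillDocument) -> Tuple[str, str]:
    """Return (kind, payload): prefer HTML, fall back to the PDF link."""
    # Treat missing or whitespace-only HTML as a failed conversion.
    if doc.html_text and doc.html_text.strip():
        return ("html", doc.html_text)
    if doc.pdf_url:
        return ("pdf", doc.pdf_url)
    # Neither representation is available.
    return ("none", "")
```

A template could then branch on the returned kind, so bills with HTML and bills with only a PDF share one code path even though they look different to users.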

fgregg commented 6 years ago

We could render the footer and header.

reginafcompton commented 5 years ago

I was able to speed up the RTF conversion script via https://github.com/datamade/django-councilmatic/pull/230 (per issue #155).

We should still consider scraping the PDF links, but this seems like an enhancement to the current system. I will mark it as such.