18F / federal-grant-reporting

Improving the experience of federal grant reporting.

Develop understanding of source data #186

Closed danielnaab closed 4 years ago

danielnaab commented 4 years ago

User story

As a new developer, in order to make informed decisions on how to continue development on the Distiller, I would like to understand how data is being scraped from the source system and the underlying data model.

Notes

This task is to review the Selenium scraper and the existing documentation, and ask questions to stakeholders.

Acceptance criteria

danielnaab commented 4 years ago

This repo has some good exploration of the raw FAC data via IPython notebooks:

https://github.com/irenatfh/fac

This repo scrapes data and, while it doesn't look directly useful to us, is worth looking over:

https://github.com/govwiki/SingleAuditRepo

bpdesigns commented 4 years ago

Thanks @danielnaab. Does this complete this issue? (we can chat about it in our meeting tomorrow if easier)

@cantsin the second link has some things about downloading FAC data and renaming it that might be worth 👀

danielnaab commented 4 years ago

@bpdesigns I'd still like to walk through the Selenium code and poke around that a bit, to understand how the site is handling the view state.
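
For readers unfamiliar with the term, "view state" here probably refers to hidden form fields that a server-rendered page expects to receive back on each submission. The sketch below is only an illustration under that assumption: the field names `__VIEWSTATE` and `__EVENTVALIDATION` and the helpers `extract_hidden_fields` and `build_postback` are hypothetical and are not taken from the Distiller's Selenium scraper or confirmed for the FAC site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of every hidden input found in the page (hypothetical helper)."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback(html, form_values):
    """Merge the page's hidden state with the values we want to submit."""
    payload = extract_hidden_fields(html)
    payload.update(form_values)
    return payload


# Made-up page fragment, for illustration only.
SAMPLE_PAGE = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz" />
  <input type="text" name="query" value="" />
</form>
"""

if __name__ == "__main__":
    print(build_postback(SAMPLE_PAGE, {"query": "single audit"}))
```

A browser-driven scraper such as Selenium carries these fields along automatically, which is one reason to read the existing code before attempting plain HTTP requests.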

bpdesigns commented 4 years ago

Holding off on this issue for now pending conversations about data access next week.

bpdesigns commented 4 years ago

@danielnaab wrote a crawler to understand the data.