Closed ttavenner closed 10 years ago
:thumbsup: This is coming
On Mon, Mar 3, 2014 at 5:46 PM, ttavenner notifications@github.com wrote:
Each violation consists of four parts:
- the code violated,
- whether it was a core or priority item,
- a description of the violation,
- a corrective action to be taken.
These are being correctly parsed into separate components; however, there is no standard as to which index each element ends up in. For example, it could be
[0] code [1] [2] core [3] description [4] action
or
[0] [1] code [2] core [3] description [4] action
etc.; any combination is possible. It would be easier to parse if we could standardize or label the indexes. Looking at the HTML, this could be done by identifying elements.
Core/priority is always in a red font tag with bold tags, the corrective action is always in a green font tag with italics, and the description sits between these with no markup. Labeling by markup could also prevent the table header from being included as its own violation, which is currently happening.
Reply to this email directly or view it on GitHub: https://github.com/c4hrva/open-health-inspection-scraper/issues/6
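A rough sketch of the markup-based labeling proposed above, using only Python's standard-library `html.parser`. The class and function names (`ViolationLabeler`, `label_violation`) are hypothetical and are not taken from the scraper. The sketch also assumes that the font colors appear as the literal attribute values `red` and `green`, and that any unmarked text before the red tag is the code. None of that has been checked against the real pages.

```python
from html.parser import HTMLParser


class ViolationLabeler(HTMLParser):
    """Sorts text into labeled fields based on the enclosing <font color=...> tag."""

    def __init__(self):
        super().__init__()
        self.color_stack = []  # colors of the currently open <font> tags
        self.fields = {"code": [], "core": [], "description": [], "action": []}
        self.seen_core = False  # flips once the red (core/priority) text is seen

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            # Assumption: colors are written as plain names, e.g. color="red".
            color = (dict(attrs).get("color") or "").lower()
            self.color_stack.append(color)

    def handle_endtag(self, tag):
        if tag == "font" and self.color_stack:
            self.color_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        color = self.color_stack[-1] if self.color_stack else ""
        if color == "red":
            self.fields["core"].append(text)
            self.seen_core = True
        elif color == "green":
            self.fields["action"].append(text)
        elif self.seen_core:
            # Unmarked text after the red tag is treated as the description.
            self.fields["description"].append(text)
        else:
            # Assumption: unmarked text before the red tag is the code.
            self.fields["code"].append(text)


def label_violation(row_html):
    """Return a dict of labeled parts, or None if the row has no red/green markup."""
    parser = ViolationLabeler()
    parser.feed(row_html)
    parser.close()
    fields = {key: " ".join(parts) for key, parts in parser.fields.items()}
    if not fields["core"] and not fields["action"]:
        return None  # likely the table header, not a violation
    return fields
```

Because each part is returned under a name instead of a position, empty cells no longer shift the indexes, and a row with neither red nor green markup (such as the table header) is skipped instead of being stored as a violation.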