KaylaCrush / advocacy-maps

The Good Governance Project (GGP) is a non-partisan democracy reform group.
https://mapletestimony.org
MIT License

handle newlines in activity tables #45

Closed KaylaCrush closed 1 year ago

KaylaCrush commented 1 year ago

FUCKING newlines, man, I tell you. Bane of my existence. Some of these tables have newlines in the middle of the bill title or activity column, and I'm just gonna have to figure out how to handle it! ha ha ha fml

arcane regex query I was working on:

Client: (.*?)\\n\\n\\n\\n\\n\\n\\nTotal amount paid by client; lobbyist is unable to report compensation at activity level:\\xa0\$(.*?)\\r\\n\\t\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nHouse \/ SenateBill Number or Agency NameBill title or activityAgent positionAmountDirect business association(\\n\\n\\n(.*?)\\n\\n(.*?)\\n\\n(.*?)\\n\\n(.*?)\\n\\r\\n\s*(.*?)\\r\\n\s+\\n(.*?)\s)
KaylaCrush commented 1 year ago

A good bit of error checking I should be doing, after I split at newlines and before I divide into chunks: is len(list) % len(columns) == 0? That is, can this list actually be divided evenly into all the columns? Because if NOT, something has gone wrong.
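
A minimal sketch of that check, assuming the cells have already been split into a flat list of strings. The function name chunk_rows and the column names are hypothetical; the column names are only inferred from the header text in the regex above, and none of this is the project's actual code.

```python
from typing import Dict, List, Sequence

# Column names inferred from the table header in the regex above (an assumption).
COLUMNS = [
    "House / Senate",
    "Bill Number or Agency Name",
    "Bill title or activity",
    "Agent position",
    "Amount",
    "Direct business association",
]


def chunk_rows(cells: Sequence[str], columns: Sequence[str] = COLUMNS) -> List[Dict[str, str]]:
    """Group a flat list of cell strings into rows, one dict per row.

    Raises ValueError when the cells cannot be divided evenly into the
    columns, which usually means a cell contained a stray newline.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    if len(cells) % len(columns) != 0:
        raise ValueError(
            f"{len(cells)} cells cannot be divided evenly into "
            f"{len(columns)} columns; a cell probably contains a newline"
        )
    width = len(columns)
    # Walk the flat list one row-width at a time and pair cells with column names.
    return [
        dict(zip(columns, cells[i:i + width]))
        for i in range(0, len(cells), width)
    ]
```

Raising here, rather than silently truncating, makes the bad table visible at the point where the split went wrong. The check can still pass by coincidence if the stray newlines happen to add up to a whole row, so it is a sanity check and not a guarantee.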

KaylaCrush commented 1 year ago

I did this, and moved the separate_date call so that it only runs when this check fails. Still working on newlines >.<

KaylaCrush commented 1 year ago

Done in the Beautiful Soup refactor.
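
The refactor itself is not shown in this thread. As a rough illustration of why reading cells from the HTML avoids the newline problem, here is a sketch that uses the standard library's html.parser as a stand-in for Beautiful Soup so that it runs without extra dependencies. The names TableCellParser and parse_table are hypothetical, and the sample markup in the usage note is made up.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableCellParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse all whitespace, including embedded newlines, to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> List[List[str]]:
    """Return the table's rows as lists of cleaned cell strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Because cell boundaries come from the tags, a newline inside a bill title stays inside its cell. For example, parse_table("<table><tr><td>An Act\r\n relative to\n maps</td><td>$0.00</td></tr></table>") yields one row with two cells.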