chicommons / maps

MIT License

Update spreadsheet import to understand renamed columns #21

Closed laredotornado closed 3 years ago

laredotornado commented 3 years ago

For this ticket, you will have to have the Python/Django app running locally (outside of Docker). See the "Python/Django App" section here. We have a scriptable way of importing data into the directory, using

cd web
. venv/bin/activate
python manage.py /path/to/file.csv > directory/fixtures/seed_data.yaml
python manage.py loaddata directory/fixtures/seed_data.yaml

An example of the old version of "file.csv" can be found here. We need to support a new CSV version (an example is here) in which some of the column headers are renamed. Below are the new names we need to support:

(old column name) -> (new column name)
zipcode -> postal code
city -> city1
email -> Email-Address
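A minimal sketch of how the renames above could be handled, assuming the parser reads each row as a dict keyed by header (for example via `csv.DictReader`). The names `NEW_TO_OLD`, `normalize_row`, and `read_rows` are hypothetical and are not taken from `parse_coop_csv.py`. Each row is mapped back to the old column names, so both file versions can go through the same downstream code.

```python
import csv

# Hypothetical mapping from the renamed headers back to the old names.
# Only the three renames listed in this ticket are covered.
NEW_TO_OLD = {
    "postal code": "zipcode",
    "city1": "city",
    "Email-Address": "email",
}


def normalize_row(row):
    """Return a copy of `row` keyed by the old column names."""
    # Keep every column that is not one of the renamed headers.
    out = {key: value for key, value in row.items() if key not in NEW_TO_OLD}
    # Renamed columns overwrite same-named leftovers: the new file still has
    # a 'city' column, but it holds the combined city/state/zip string.
    for new_name, old_name in NEW_TO_OLD.items():
        if new_name in row:
            out[old_name] = row[new_name]
    return out


def read_rows(csv_file):
    """Yield normalized rows from an open CSV file (old or new layout)."""
    for row in csv.DictReader(csv_file):
        yield normalize_row(row)
```

Rows from the old file pass through unchanged, since none of their headers appear in `NEW_TO_OLD`.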

jensforsgard commented 3 years ago

Got it.

jensforsgard commented 3 years ago

Just a few questions:

The new csv file has both a 'city1' and a 'city' column. Aren't 'city' and 'citystzip', as in the old file, more descriptive names? Or 'city' and 'address line 2'?

The 'Include' column is missing in the new file, but it is used in 'parse_coop_csv.py'. The column 'phone_public' has been renamed to 'Phone public', and that one also seems to be in use.

Are the other columns in the csv-file not used anywhere else? 

Also seems a little odd that the script is ignoring the 'is Mail public' column.

laredotornado commented 3 years ago

The new csv file has both a 'city1' and a 'city' column. Aren't 'city' and 'citystzip', as in the old file, more descriptive names? Or 'city' and 'address line 2'?

I agree with you that they are more descriptive. I'm not clear on the reasoning behind the change as Steve updated the columns. For now, let's just use the newer names but I will loop you in to a conversation with him about more intuitive column names.

The 'Include' column is missing in the new file, but it is used in 'parse_coop_csv.py'. The column 'phone_public' has been renamed to 'Phone public', and that one also seems to be in use.

The "include" column was renamed "check". I guess I forgot to mention that but this is a crucial column -- it determines what appears on the map. We definitely want it. Good catch about "phone_public" -- let's adjust to use the new name "Phone public."

Are the other columns in the csv-file not used anywhere else?

The intention is that they are used but the script has not yet accounted for them (mostly because the backing db doesn't support it). I will write up a separate ticket to incorporate those.

Also seems a little odd that the script is ignoring the 'is Mail public' column.

This is because the db doesn't have a place for this information. I will write up a ticket so we can incorporate this too.

SteveEdiger commented 3 years ago

Since we can calculate the city/st/zip and are already collecting the components, we can ignore import of this field (sorry, I forgot which one contains all the elements). This was an artifact from the original file we inherited.

Include is a mandatory import, and will be subject to change, depending on discussions in tonight's tech meeting. Currently it includes some rather cryptic entries: "yes,no,check,directory only,yes-postal,no-insurance".

We are currently not using these 'other columns', but as I am reworking data, I am trying to fill them in. So we should import them.

"is Mail public" and "Phone public" are markers that determine whether we can include the email address and phone number in the public-facing map/directory, or whether they are only for our own internal use in verification.
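A rough sketch of how these two markers might be applied once the database can store them. The column names come from the new CSV header. The function name `public_contact` is hypothetical, and treating "yes" as the value that marks a field as public is an assumption, since the thread does not say which values these columns hold.

```python
def public_contact(row):
    """Return only the contact fields that are flagged as public."""

    def flagged(column):
        # Assumption: a value of "yes" (any casing) marks the field as public.
        return (row.get(column) or "").strip().lower() == "yes"

    contact = {}
    if flagged("is Mail public"):
        contact["email"] = row.get("Email-Address", "")
    if flagged("Phone public"):
        contact["phone"] = row.get("Telephone Number", "")
    return contact
```

With this rule, a blank marker keeps the value internal, which seems the safer default for contact data.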

SteveEdiger commented 3 years ago

Please note that we are involved in some fairly major discussions about what information we need to collect, and this set will change rather radically in the future (fields will be appended and existing fields renamed). The existing scenario is simply to allow updating of the dataset until we can develop the app.

SteveEdiger commented 3 years ago

I mistyped "We are currently not using these 'other columns', but as I am reworking data, I am trying to fill them in. So we should import them."

There are some exceptions. For instance, Dave has already dealt with geocoding on the back-end, so we don't need to keep GeoCode Quality or Lat/lon.

SteveEdiger commented 3 years ago

Dave and Jens,

Between the two meetings, we made good progress. Thanks for sticking around, Jens. I know we went pretty late, but we pretty much got it into shape. Here is the file: EntityInfo

In the Current_v20-12-11, you'll find:

We want to migrate the labels to more database-style labels, so please suggest them for each row in New Field Name.

Proposed Action:
  yes = keep/add
  remove = remove

Notes:
  requires additional values work
  We discussed these renames (and perhaps retyping), but you will also be renaming all of the fields.
  If a field is new, it's identified here.

I propose that you document the changes in EntityInfo.

When you're finished with the documentation and changes, please add a tab in the form of Current_vYY-MM-DD with the New Field Name, table name and Field Description/field contents columns so that we are set up for the next set of changes.

Thanks, Steve

laredotornado commented 3 years ago

@SteveEdiger, are we calling the include column (which controls whether a row is included on the map) "Include" or "check"? I'm noticing that on the latest spreadsheet it seems to be called "Include" now.

e.g.

,name,address,postal code,city1,st,city,Country,type,Include,website,Link to Contact Data,Opening Hours (Url when available),Tags (seperate by comma),Short description English,Short description local language,Image Filename/Link,Image License,Image Credit,Email-Address,is Mail public,Telephone Number,Phone public,GeocodeQualityType,lon,lat,Source,Completed by,organic,fair-trade,regional,category-tag1,category-tag2,category-tag3
0,1871,"222 W. Merchandise Mart Plaza, Suite 1212",60654,Chicago,IL,"Chicago, IL 60654",USA,Coworking Space,yes,http://www.1871.com/,,,,,,,,,,,,,,-87.63612199,41.88802611,Steve Ediger,SE,,,,,,
1,1335 ASTOR CO-OP BUILDING,1335 N ASTOR ST,60610,Chicago,IL,"Chicago, IL 60610",USA,Housing Coop,yes,https://www.dkcondo.com/managed-associations/1335-astor-co-op/,,,,,,,,,,,312-943-7500,,,-87.6271603,41.9069583,SEWG,DC,,,,,,

cc: @jensforsgard

laredotornado commented 3 years ago

@jensforsgard, per an email Steve sent, please change recognition of the column originally named "check" to "Include". If it doesn't specify a lower-cased value of "yes", then we can assume it is no. (We will add new tickets for parsing out a more comprehensive rule set.)
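A minimal sketch of that rule, assuming each row is available as a dict keyed by the CSV header. The names `INCLUDE_COLUMN` and `is_included` are hypothetical, and stripping surrounding whitespace before comparing is an assumption not stated in the ticket.

```python
INCLUDE_COLUMN = "Include"


def is_included(row):
    """Return True only when the Include column holds a lower-cased "yes"."""
    # A missing or empty column counts as "no".
    value = row.get(INCLUDE_COLUMN) or ""
    # Anything other than exactly "yes" ("Yes", "check", "yes-postal", ...)
    # is treated as "no" until a fuller rule set is defined.
    return value.strip() == "yes"
```

For example, `is_included({"Include": "yes"})` is true, while `"check"`, `"Yes"`, and `"yes-postal"` all count as no under this reading.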

SteveEdiger commented 3 years ago

I've tried documenting the processing rules for each field where a rule applies in Entity Data Definition.

Please refer to it and get back to me with questions or comments.