cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services

Publish Core Signs and Markings data to the Open Data Portal #3114

Closed SurbhiBakshi closed 2 years ago

SurbhiBakshi commented 4 years ago

Published datasets:

SurbhiBakshi commented 4 years ago

Prep for Discovery Day

@johnclary - done with the installations

johnclary commented 4 years ago

Great. I suggest playing around with a few of the sodapy examples. See if you can upsert data, as described here: https://github.com/xmunoz/sodapy#upsertdataset_identifier-payload-content_typejson

And feel free to add something on my calendar for pair programming.
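For reference, the upsert behavior the sodapy link describes can be sketched locally. This is a rough simulation of the merge-by-row-identifier semantics, not the real client; the domain, token, and dataset ID in the comment are placeholders:

```python
# Rough sketch of what a Socrata upsert does, assuming the dataset's row
# identifier is the "id" field: payload records whose id matches an existing
# row update that row; new ids are inserted. The real call via sodapy:
#
#   from sodapy import Socrata
#   client = Socrata("data.austintexas.gov", APP_TOKEN, username=USER, password=PW)
#   result = client.upsert(DATASET_ID, payload)  # payload is a list of dicts

def simulate_upsert(existing, payload, key="id"):
    """Merge payload rows into existing rows by the row identifier."""
    merged = {row[key]: dict(row) for row in existing}
    created = updated = 0
    for row in payload:
        if row[key] in merged:
            updated += 1
            merged[row[key]].update(row)
        else:
            created += 1
            merged[row[key]] = dict(row)
    return list(merged.values()), {"Rows Created": created, "Rows Updated": updated}

rows = [{"id": "a1", "asset_name": "Stop Sign"}]
payload = [{"id": "a1", "asset_name": "Yield Sign"},
           {"id": "b2", "asset_name": "Crosswalk"}]
new_rows, stats = simulate_upsert(rows, payload)
```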

SurbhiBakshi commented 4 years ago

Created views in the API Private scene for SMB:

  1. Specification Counts Table - view_3304
  2. Reimbursable Trackings - view_3305
  3. work_orders_signs_markings_time_logs - view_3307
johnclary commented 3 years ago

@SurbhiBakshi send me the dataset configuration info when possible.

SurbhiBakshi commented 3 years ago

@johnclary - Here is the dataset configuration info. After our discussion at Apps Sync yesterday, I looked at the field names to see if there could be any potential issues. The first view seems fairly straightforward. The other two have some potential problems with field names from Knack not aligning with Socrata. I highlighted those in the spreadsheet for the second one. I think the third view might be the most complex.

johnclary commented 3 years ago

Thanks @SurbhiBakshi

I have completed the setup for Specification Counts Table. A few things to note that I failed to mention.

Field names

The Socrata API field names need to match the field name in the Knack object, not the view. Re-naming a field in the view itself has no effect on this process. This is a limitation of the Knack API, not my scripts ;)

And I'll repeat the caveat that the Knack field name does not have to be lower case: when translated to the Socrata field name, it will be lower-cased and spaces will be replaced with underscores. So, if your field in Knack is "Asset Name", your field in Socrata should be "asset_name". The field's "Display Name" in Socrata can be freely modified as you see fit.
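The naming rule above can be captured in a small helper. This is just a sketch of the translation described; the function name is mine, not part of the ETL scripts:

```python
def knack_to_socrata_field(knack_name: str) -> str:
    """Translate a Knack object field name to its Socrata API field name:
    lower-cased, with spaces replaced by underscores."""
    return knack_name.strip().lower().replace(" ", "_")

knack_to_socrata_field("Asset Name")  # returns "asset_name"
```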

ID Field and Row Identifiers

Each dataset needs to have a dedicated ID field which will be automatically populated with Knack's built-in record IDs.

The field's display name can be freely defined, but the field name must be id and the field type must be text. This field must also be assigned as the dataset's row identifier to ensure that upserts are handled properly.

It can be a bit tedious to change row identifiers, because it requires that all records in the dataset have a unique value in the ID field before you set it to be a row identifier. The easiest workaround is to download the data and upload just one row of data that meets that requirement.
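Before setting the row identifier, you can check which id values would block it. A hypothetical pre-flight check, assuming the downloaded rows are dicts keyed by Socrata field names:

```python
from collections import Counter

def non_unique_ids(rows, key="id"):
    """Return id values that would prevent the field from being set as a
    row identifier: duplicates, plus empty or missing values."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(str(v) for v, n in counts.items() if n > 1 or v in (None, ""))

rows = [{"id": "a1"}, {"id": "a1"}, {"id": ""}, {"id": "b2"}]
non_unique_ids(rows)  # flags the duplicate "a1" and the empty value
```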

If you have a conflicting field in your Knack data named "ID", we should, as a general best practice, rename it. So let me know if you run into that situation. As an FYI, if we absolutely must keep our "ID" field in Knack, this column name will be translated to _id when publishing to Socrata.

Connection fields

It's a best practice to not use a connection field in the API view, because the value in that field will be the "Display Field" value as it is configured for the object. If someone were to change the display field for that object, it could break the ETL.

Instead, when you're adding columns to the view, use the drop-down menu to pick the related object, then add the field from that object that you want in the view. See below screenshot as an example of this for the "ATD_WORK_ORDER_ID" field for the specifications table.

[Screenshot: Screen Shot 2020-12-28 at 2 36 36 PM]
johnclary commented 3 years ago

Just followed-up with Susanne:

Question for you: two of these tables have a "Created By" column, which I imagine is helpful for your reporting, but we don't like to publish staff names in public datasets. One alternative would be to assign each user in the system an ID #, and we could publish that ID number instead. Would that work for you?
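The user-ID alternative could be sketched like this. The field name and the assignment scheme here are assumptions for illustration, not a decided approach:

```python
def anonymize_created_by(rows, field="created_by"):
    """Replace staff names with stable per-user ID numbers, assigned in
    order of first appearance (field name is hypothetical)."""
    user_ids = {}
    out = []
    for row in rows:
        name = row[field]
        if name not in user_ids:
            user_ids[name] = len(user_ids) + 1
        out.append({**row, field: user_ids[name]})
    return out, user_ids

rows = [{"created_by": "Jane Doe"},
        {"created_by": "Sam Lee"},
        {"created_by": "Jane Doe"}]
anon_rows, user_ids = anonymize_created_by(rows)
```

In practice the name-to-ID lookup would need to live somewhere persistent (e.g. an incremental ID field on the Knack Accounts object, as discussed later in this thread) so IDs stay stable across ETL runs.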

johnclary commented 3 years ago

@SurbhiBakshi r.e. the Time Logs dataset. As you suggested, would you split this into two separate Knack views, one for signs and one for markings? On the Socrata side, we can either merge these two datasets or publish to separate ones. If we merge them, we'll want to touch up the Knack field names that are common to both signs and markings so that they match. E.g. rename COUNT_TIME_LOG (signs) to COUNT_TIME_LOGS (as in markings).
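If we merge on the Socrata side, the field-name touch-up could look like this sketch. The COUNT_TIME_LOG rename comes from the comment above; the `source` column is my own suggestion for keeping rows distinguishable after the merge, not something agreed in the thread:

```python
# Rename signs fields to match their markings counterparts, then combine.
SIGNS_RENAMES = {"COUNT_TIME_LOG": "COUNT_TIME_LOGS"}

def merge_time_logs(signs_rows, markings_rows):
    """Normalize signs field names to match markings, then concatenate,
    tagging each row with its source dataset."""
    normalized = [
        {SIGNS_RENAMES.get(k, k): v for k, v in row.items()} for row in signs_rows
    ]
    return (
        [{**r, "source": "signs"} for r in normalized]
        + [{**r, "source": "markings"} for r in markings_rows]
    )
```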

amenity commented 3 years ago

It's most important that this data goes to MAINT — Socrata is a nice-to-have.

@johnclary - discussed this at today's meeting with SMB... Susanne would be using the time logs data on a daily basis in her GIS work, so the most important aspect of this would be getting it to MAINT. Potentially via AGOL, but it would be ideal to be able to use Attribute Rules (ArcGIS Pro feature).

cc/ @jaime-mckeown :)

jaime-mckeown commented 3 years ago

@amenity currently data does not go from AGOL to Maint, only from Maint > AGOL. Are the time logs data that Susanne is working with in Data Tracker?

johnclary commented 3 years ago

OK, we'll need to talk about this a bit more. As Jaime mentioned, the data does not live on MAINT and will not in the near future. It would also be good to understand the need for attribute rules, since the AGOL layers are meant to be read-only.

SurbhiBakshi commented 3 years ago

@jaime-mckeown, @johnclary - So it sounds like Susanne could potentially pull in an AGOL read-only dataset to join and then populate. She needs the date field from the time logs. I am unclear which dataset on MAINT will be populated - I thought she had the time logs data already on MAINT and wanted it updated. But it could be a work order dataset. @jaime-mckeown the time logs data is in the Signs and Markings Data Tracker.

jaime-mckeown commented 3 years ago

Susanne has related tables to the short line, specialty line, and specialty point layers all located in MAINT. Each table has a WORK_DATE field.

amenity commented 3 years ago

She just needs the last maintenance date (will be tracked in Knack once we've figured out #4732) from AGOL.

amenity commented 3 years ago

@SurbhiBakshi - do you want to put this in the backlog and mark as a future sprint?

SurbhiBakshi commented 3 years ago

Going back to the original intent of this issue (which got a little sidetracked due to other issues), there are three tables that need to be connected from Knack to the Open Data Portal so Susanne can connect to them from Power BI.

  1. Specifications Counts
    • The dataset for this table has been created on the Open Data Portal.
    • John has completed the setup for this table to be updated on the Open Data Portal.
    • Will need to create a Created By ID so names are not populated on the Open Data Portal (referring to John's email to Susanne mentioned above). Will this list live in Knack (Accounts?) and how will the ID be created automatically when new accounts are created (will this be an incremental ID field in the Accounts object?)
  2. Signs Markings Time Logs
  3. Reimbursement Trackings
    • The dataset for this table has been created on the Open Data Portal.
    • Need to check if separate tables need to be created for Signs and Markings. Similar to Signs Markings Time Logs, these can either be combined in Power BI or before uploading to the Open Data Portal.
amenity commented 3 years ago

Christina and Susanne are going to discuss this and get back to us. Also consider if/how any of these needs are met by the high-level dashboard. https://github.com/cityofaustin/atd-data-tech/issues/4732

SurbhiBakshi commented 3 years ago

I have added the fields that Christina and Susanne needed to the datasets for the High Level Dashboard for Signs Markings Time Logs and will be meeting with Chia and Diana to talk about automating the ETL.
SurbhiBakshi commented 3 years ago

Update:

  1. Specifications Counts

    • The dataset for this table has been created on the Open Data Portal.
    • John has completed the setup for this table to be updated on the Open Data Portal.
    • Will need to create a Created By ID so names are not populated on the Open Data Portal (referring to John's email to Susanne mentioned above). Will this list live in Knack (Accounts?) and how will the ID be created automatically when new accounts are created (will this be an incremental ID field in the Accounts object?) I don't think an ETL is in place to update the dataset.
  2. Signs Markings Time Logs

    • The dataset for this table has been created on the Open Data Portal. I have added the fields that Christina and Susanne needed to the datasets for the High Level Dashboard for Signs Markings Time Logs and will be meeting with Chia and Diana to talk about automating the ETL. (#6636)
  3. Reimbursement Trackings

    • The dataset for this table has been created on the Open Data Portal. I don't think an ETL is in place to update the dataset. This table may need to be simplified.

dianamartin commented 3 years ago

Emailed stakeholders 8/3

Hi Ladies, Chia’s finished writing the automation script for the SMB datasets that will go to the Open Data portal. I’d like both of y’all to review the datasets before I switch them to the public view.

Datasets to Review:

  • Work Order Markings
  • Work Order Markings Jobs
  • Work Order Signs Markings Time Logs
  • Work Order Signs

Let me know when you have finished your review and I’ll take the next step to publish them.
johnclary commented 3 years ago

I noticed the datasets did not have an id column set as the row identifier, which is what ensures the Knack > Socrata upserts process correctly. I fixed that and asked some more questions:

Susanne's response:

johnclary commented 3 years ago

I am creating the reimbursements dataset. Note to self—we are likely going to be able to remove some of the calculated fields that Surbhi created for the high level dashboard—I forgot that this was going to be one of the main benefits of moving to Power BI. Will have to identify those fields once we get the all-clear from SMB.

johnclary commented 3 years ago

Sent reimbursements dataset for review. We will need to complete #6888 in order to deploy the ETL.

Susanne—I created the reimbursements dataset, here. Let me know how that looks. I wasn't quite sure if these are all the fields you need. We'll clean up the field names and add metadata before we make these public.

johnclary commented 3 years ago

Also—we should tackle #6637 while we're doing this. So to recap todos:

johnclary commented 3 years ago

Did some minor tweaking of the time logs dataset. Waiting for additional feedback before finishing tasks in my comment above.

johnclary commented 3 years ago

All datasets have been published publicly:

johnclary commented 3 years ago

The reimbursements DAG has been deployed! Data is live. We have a few minor follow-up tasks listed above.

johnclary commented 3 years ago

Susanne provided additional metadata changes for the datasets. I made the changes and set the reimbursement dataset to be private as she requested.

johnclary commented 3 years ago

@johnclary you need to exclude disassociated reimbursements

johnclary commented 3 years ago

I excluded disassociated reimbursements and cleaned up the calculation fields we no longer need. This is done!
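The exclusion step could be sketched like this. The link field name is hypothetical; the real ETL may identify disassociated records differently:

```python
def exclude_disassociated(rows, link_field="atd_work_order_id"):
    """Keep only reimbursement rows still associated with a work order
    (link_field is a hypothetical column name)."""
    return [row for row in rows if row.get(link_field)]

rows = [{"id": "r1", "atd_work_order_id": "WO-1"},
        {"id": "r2", "atd_work_order_id": None}]
exclude_disassociated(rows)  # keeps only r1
```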