cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services

Publish Core Signs and Markings data to the Open Data Portal #3114

Closed SurbhiBakshi closed 2 years ago

SurbhiBakshi commented 4 years ago

Published datasets:

SurbhiBakshi commented 4 years ago

Prep for Discovery Day

@johnclary - done with the installations

johnclary commented 4 years ago

Great. I suggest playing around with a few of the sodapy examples. See if you can upsert data, as described here: https://github.com/xmunoz/sodapy#upsertdataset_identifier-payload-content_typejson

And feel free to add something on my calendar for pair programming.
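For reference, the upsert behavior the sodapy link describes can be sketched locally. This is a rough simulation of the merge-by-row-identifier semantics, not the real client; the domain, token, and dataset ID in the comment are placeholders:

```python
# Rough sketch of what a Socrata upsert does, assuming the dataset's row
# identifier is the "id" field: payload records whose id matches an existing
# row update that row; new ids are inserted. The real call via sodapy:
#
#   from sodapy import Socrata
#   client = Socrata("data.austintexas.gov", APP_TOKEN, username=USER, password=PW)
#   result = client.upsert(DATASET_ID, payload)  # payload is a list of dicts

def simulate_upsert(existing, payload, key="id"):
    """Merge payload rows into existing rows by the row identifier."""
    merged = {row[key]: dict(row) for row in existing}
    created = updated = 0
    for row in payload:
        if row[key] in merged:
            updated += 1
            merged[row[key]].update(row)
        else:
            created += 1
            merged[row[key]] = dict(row)
    return list(merged.values()), {"Rows Created": created, "Rows Updated": updated}

rows = [{"id": "a1", "asset_name": "Stop Sign"}]
payload = [{"id": "a1", "asset_name": "Yield Sign"},
           {"id": "b2", "asset_name": "Crosswalk"}]
new_rows, stats = simulate_upsert(rows, payload)
```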

SurbhiBakshi commented 4 years ago

Created views in the API Private scene for SMB:

  1. Specification Counts Table - view_3304
  2. Reimbursable Trackings - view_3305
  3. work_orders_signs_markings_time_logs - view_3307
johnclary commented 3 years ago

@SurbhiBakshi send me the dataset configuration info when possible.

SurbhiBakshi commented 3 years ago

@johnclary - Here is the dataset configuration info. After our discussion at Apps Sync yesterday, I looked at the field names to see if there could be any potential issues. The first view seems fairly straightforward. The other two have some potential problems with field names from Knack not aligning with Socrata. I highlighted those in the spreadsheet for the second one. I think the third view might be the most complex.

johnclary commented 3 years ago

Thanks @SurbhiBakshi

I have completed the setup for Specification Counts Table. A few things to note that I failed to mention.

Field names

The Socrata API field names need to match the field name in the Knack object, not the view. Re-naming a field in the view itself has no effect on this process. This is a limitation of the Knack API, not my scripts ;)

And I'll repeat the caveat that the Knack field name does not have to be lower case: when translated to the Socrata field name, it will be lower-cased and spaces will be replaced with underscores. So, if your field in Knack is "Asset Name", your field in Socrata should be "asset_name". The field's "Display Name" in Socrata can be freely modified as you see fit.
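The naming rule above can be captured in a small helper. This is just a sketch of the translation described; the function name is mine, not part of the ETL scripts:

```python
def knack_to_socrata_field(knack_name: str) -> str:
    """Translate a Knack object field name to its Socrata API field name:
    lower-cased, with spaces replaced by underscores."""
    return knack_name.strip().lower().replace(" ", "_")

knack_to_socrata_field("Asset Name")  # returns "asset_name"
```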

ID Field and Row Identifiers

Each dataset needs to have a dedicated ID field which will be automatically populated with Knack's built-in record IDs.

The field's display name can be freely defined, but the field name must be id and the field type must be text. This field must also be assigned as the dataset's row identifier to ensure that upserts are handled properly.

It can be a bit tedious to change row identifiers, because it requires that all records in the dataset have a unique value in the ID field before you set it to be a row identifier. The easiest workaround is to download the data and upload just one row of data that meets that requirement.
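Before setting the row identifier, you can check which id values would block it. A hypothetical pre-flight check, assuming the downloaded rows are dicts keyed by Socrata field names:

```python
from collections import Counter

def non_unique_ids(rows, key="id"):
    """Return id values that would prevent the field from being set as a
    row identifier: duplicates, plus empty or missing values."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(str(v) for v, n in counts.items() if n > 1 or v in (None, ""))

rows = [{"id": "a1"}, {"id": "a1"}, {"id": ""}, {"id": "b2"}]
non_unique_ids(rows)  # flags the duplicate "a1" and the empty value
```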

If you have a conflicting field in your Knack data named "ID", we should, as a general best practice, rename it. So let me know if you run into that situation. As an FYI, if we absolutely must keep our "ID" field in Knack, this column name will be translated to _id when publishing to Socrata.

Connection fields

It's a best practice to not use a connection field in the API view, because the value in that field will be the "Display Field" value as it is configured for the object. If someone were to change the display field for that object, it could break the ETL.

Instead, when you're adding columns to the view, use the drop-down menu to pick the related object, then add the field from that object that you want in the view. See below screenshot as an example of this for the "ATD_WORK_ORDER_ID" field for the specifications table.

[Screenshot: Screen Shot 2020-12-28 at 2 36 36 PM]
johnclary commented 3 years ago

Just followed-up with Susanne:

Question for you: two of these tables have a "Created By" column, which I imagine is helpful for your reporting, but we don't like to publish staff names in public datasets. One alternative would be to assign each user in the system an ID #, and we could publish that ID number instead. Would that work for you?
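The user-ID alternative could be sketched like this. The field name and the assignment scheme here are assumptions for illustration, not a decided approach:

```python
def anonymize_created_by(rows, field="created_by"):
    """Replace staff names with stable per-user ID numbers, assigned in
    order of first appearance (field name is hypothetical)."""
    user_ids = {}
    out = []
    for row in rows:
        name = row[field]
        if name not in user_ids:
            user_ids[name] = len(user_ids) + 1
        out.append({**row, field: user_ids[name]})
    return out, user_ids

rows = [{"created_by": "Jane Doe"},
        {"created_by": "Sam Lee"},
        {"created_by": "Jane Doe"}]
anon_rows, user_ids = anonymize_created_by(rows)
```

In practice the name-to-ID lookup would need to live somewhere persistent (e.g. an incremental ID field on the Knack Accounts object, as discussed later in this thread) so IDs stay stable across ETL runs.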

johnclary commented 3 years ago

@SurbhiBakshi r.e. the Time Logs dataset. As you suggested, would you split this into two separate Knack views, one for signs and one for markings? On the Socrata side, we can either merge these two datasets or publish to separate ones. If we merge them, we'll want to touch up the Knack field names that are common to both signs and markings so that they match. E.g. rename COUNT_TIME_LOG (signs) to COUNT_TIME_LOGS (as in markings).
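If we merge on the Socrata side, the field-name touch-up could look like this sketch. The COUNT_TIME_LOG rename comes from the comment above; the `source` column is my own suggestion for keeping rows distinguishable after the merge, not something agreed in the thread:

```python
# Rename signs fields to match their markings counterparts, then combine.
SIGNS_RENAMES = {"COUNT_TIME_LOG": "COUNT_TIME_LOGS"}

def merge_time_logs(signs_rows, markings_rows):
    """Normalize signs field names to match markings, then concatenate,
    tagging each row with its source dataset."""
    normalized = [
        {SIGNS_RENAMES.get(k, k): v for k, v in row.items()} for row in signs_rows
    ]
    return (
        [{**r, "source": "signs"} for r in normalized]
        + [{**r, "source": "markings"} for r in markings_rows]
    )
```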

amenity commented 3 years ago

It's most important that this data goes to MAINT — Socrata is a nice-to-have.

@johnclary - discussed this at today's meeting with SMB... Susanne would be using the time logs data on a daily basis in her GIS work, so the most important aspect of this would be getting it to MAINT. Potentially via AGOL, but it would be ideal to be able to use Attribute Rules (ArcGIS Pro feature).

cc/ @jaime-mckeown :)

jaime-mckeown commented 3 years ago

@amenity currently data does not go from AGOL to Maint, only from Maint > AGOL. Are the time logs data that Susanne is working with in Data Tracker?

johnclary commented 3 years ago

OK, we'll need to talk about this a bit more. As Jaime mentioned, the data does not live on MAINT and will not in the near future. It would also be good to understand the need for attribute rules, since the AGOL layers are meant to be read-only.

SurbhiBakshi commented 3 years ago

@jaime-mckeown, @johnclary - So it sounds like Susanne could potentially pull in an AGOL read-only dataset to join and then populate. She needs the date field from the time logs. I am unclear which dataset on MAINT will be populated - I thought she had the time logs data already on MAINT and wanted it updated. But it could be a work order dataset. @jaime-mckeown the time logs data is in the Signs and Markings Data Tracker.

jaime-mckeown commented 3 years ago

Susanne has related tables to the short line, specialty line, and specialty point layers all located in MAINT. Each table has a WORK_DATE field.

amenity commented 3 years ago

She just needs the last maintenance date (will be tracked in Knack once we've figured out #4732) from AGOL.

amenity commented 3 years ago

@SurbhiBakshi - do you want to put this in the backlog and mark as a future sprint?

SurbhiBakshi commented 3 years ago

Going back to the original intent of this issue (which got a little sidetracked due to other issues), there are three tables that need to be connected from Knack to the Open Data Portal so Susanne can connect to them from Power BI.

  1. Specifications Counts
    • The dataset for this table has been created on the Open Data Portal.
    • John has completed the setup for this table to be updated on the Open Data Portal.
    • Will need to create a Created By ID so names are not populated on the Open Data Portal (referring to John's email to Susanne mentioned above). Will this list live in Knack (Accounts?) and how will the ID be created automatically when new accounts are created (will this be an incremental ID field in the Accounts object?)
  2. Signs Markings Time Logs
  3. Reimbursement Trackings
    • The dataset for this table has been created on the Open Data Portal.
    • Need to check if separate tables need to be created for Signs and Markings. Similar to Signs Markings Time Logs, these can either be combined in Power BI or before uploading to the Open Data Portal.
amenity commented 3 years ago

Christina and Susanne are going to discuss this and get back to us. Also consider if/how any of these needs are met by the high-level dashboard. https://github.com/cityofaustin/atd-data-tech/issues/4732

SurbhiBakshi commented 3 years ago

I have added the fields that Christina and Susanne needed to the datasets for the High Level Dashboard for Signs Markings Time Logs and will be meeting with Chia and Diana to talk about automating the ETL.
SurbhiBakshi commented 3 years ago

Update:

  1. Specifications Counts

    • The dataset for this table has been created on the Open Data Portal.
    • John has completed the setup for this table to be updated on the Open Data Portal.
    • Will need to create a Created By ID so names are not populated on the Open Data Portal (referring to John's email to Susanne mentioned above). Will this list live in Knack (Accounts?) and how will the ID be created automatically when new accounts are created (will this be an incremental ID field in the Accounts object?) I don't think an ETL is in place to update the dataset.
  2. Signs Markings Time Logs

    • The dataset for this table has been created on the Open Data Portal. I have added the fields that Christina and Susanne needed to the datasets for the High Level Dashboard for Signs Markings Time Logs and will be meeting with Chia and Diana to talk about automating the ETL. (#6636)
  3. Reimbursement Trackings

    • The dataset for this table has been created on the Open Data Portal. I don't think an ETL is in place to update the dataset. This table may need to be simplified.

dianamartin commented 3 years ago

Emailed stakeholders 8/3

Hi Ladies, Chia’s finished writing the automation script for the SMB datasets that will go to the Open Data portal. I’d like both of y’all to review the datasets before I switch them to the public view.

Datasets to Review:

  • Work Order Markings
  • Work Order Markings Jobs
  • Work Order Signs Markings Time Logs
  • Work Order Signs

Let me know when you have finished your review and I’ll take the next step to publish them.
johnclary commented 3 years ago

I noticed the datasets did not have an id column set as the row identifier, which is what ensures the Knack > Socrata upserts process correctly. I fixed that and asked some more questions:

Susanne's response:

johnclary commented 3 years ago

I am creating the reimbursements dataset. Note to self—we are likely going to be able to remove some of the calculated fields that Surbhi created for the high level dashboard—I forgot that this was going to be one of the main benefits of moving to Power BI. Will have to identify those fields once we get the all-clear from SMB.

johnclary commented 3 years ago

Sent reimbursements dataset for review. We will need to complete #6888 in order to deploy the ETL.

Susanne—I created the reimbursements dataset, here. Let me know how that looks. I wasn't quite sure if these are all the fields you need. We'll clean up the field names and add metadata before we make these public.

johnclary commented 3 years ago

Also—we should tackle #6637 while we're doing this. So to recap todos:

johnclary commented 3 years ago

Did some minor tweaking of the time logs dataset. Waiting for additional feedback before finishing tasks in my comment above.

johnclary commented 3 years ago

All datasets have been published publicly:

johnclary commented 3 years ago

The reimbursements DAG has been deployed! Data is live. We have a few minor follow-up tasks listed above.

johnclary commented 3 years ago

Susanne provided additional metadata changes for the datasets. I made the changes and set the reimbursement dataset to be private as she requested.

johnclary commented 3 years ago

@johnclary you need to exclude disassociated reimbursements

johnclary commented 3 years ago

I excluded disassociated reimbursements and cleaned up the calculation fields we no longer need. This is done!
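The exclusion step could be sketched like this. The link field name is hypothetical; the real ETL may identify disassociated records differently:

```python
def exclude_disassociated(rows, link_field="atd_work_order_id"):
    """Keep only reimbursement rows still associated with a work order
    (link_field is a hypothetical column name)."""
    return [row for row in rows if row.get(link_field)]

rows = [{"id": "r1", "atd_work_order_id": "WO-1"},
        {"id": "r2", "atd_work_order_id": None}]
exclude_disassociated(rows)  # keeps only r1
```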