NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

519 qaqc shell script #526

Closed td928 closed 2 years ago

td928 commented 2 years ago

519

Overview

This PR makes the process more iterative and makes it simpler to test while developing the qaqc app. You have probably noticed that not all the qaqc steps are included in the new 03_qaqc.sh, e.g. qaqc_mid.sql. My thought process is that we are really talking about two quite different parts of qaqc: one is the report generated for the housing research team to do manual research, and the other is the part we use to generate the application for our team. So I think it is okay to include only the specific qaqc steps we use for the application, and to make that explicit I might rename the script to 03_qaqc_de.sh.

Impact

Some upstream and downstream changes are also needed, namely in the yml and in devdb.sh, to call the new shell script.

Oysters1874 commented 2 years ago

Great. That makes more sense now. It looks good to me.

SashaWeinstein commented 2 years ago

no emoji 😞

td928 commented 2 years ago

no emoji 😞

Major misstep by me. To make up for it, here is some flair 🐉. One reviewer is fine.

SashaWeinstein commented 2 years ago

Nice emoji! OK, so I took a look at the other qaqc scripts we could move into the new bash script.

My conclusion is that all of these scripts can be run after the build is completed. To be extra sure, we should run the qaqc before, make the change, and then run the qaqc again, but I wanted to share my thought process and get confirmation before spending the time.

td928 commented 2 years ago

My assessment is the same as yours, @SashaWeinstein. I agree with your test to make sure the "in between" steps do not affect the qaqc as well. How are you comparing the results, though? I guess you can subtract the two mid_qaqc outputs from each other and check that everything comes out zero, since the dataframe is entirely binary. Is this what you are thinking as well?
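A minimal sketch of that subtraction check, assuming both runs are already loaded as pandas DataFrames that share a key column and contain only 0/1 flags with no nulls. The column names (`job_number`, `flag_a`, `flag_b`) and the helper name are hypothetical, not taken from the repo.

```python
import pandas as pd


def binary_flags_match(before: pd.DataFrame, after: pd.DataFrame, key: str) -> bool:
    """Return True when every 0/1 flag is identical in both tables."""
    # Align both tables on the key so row order does not matter.
    b = before.set_index(key).sort_index()
    a = after.set_index(key).sort_index()
    # Different rows or different columns mean the tables cannot match.
    if not b.index.equals(a.index) or set(b.columns) != set(a.columns):
        return False
    # Subtract flag by flag; any nonzero cell is a difference.
    diff = a[b.columns].astype(int) - b.astype(int)
    return bool((diff == 0).all().all())


# Hypothetical stand-ins for the qaqc table before and after the change.
before = pd.DataFrame(
    {"job_number": [1, 2, 3], "flag_a": [0, 1, 0], "flag_b": [1, 1, 0]}
)
after_same = before.sample(frac=1, random_state=0)  # same rows, shuffled
after_changed = before.copy()
after_changed.loc[0, "flag_a"] = 1  # flip a single flag

print(binary_flags_match(before, after_same, "job_number"))
print(binary_flags_match(before, after_changed, "job_number"))
```

Sorting on the key first matters because a plain subtraction aligns on the index, so two tables with the same rows in a different order would otherwise look different.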

td928 commented 2 years ago

I guess pandas has this nice function for comparing DataFrames. Going to test this now.
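The comment does not name the function; one candidate is `DataFrame.compare` (available since pandas 1.1), which is an assumption here. A small sketch with made-up data, where the index and column names are hypothetical:

```python
import pandas as pd

# Hypothetical qaqc flags keyed by job number.
before = pd.DataFrame(
    {"flag_a": [0, 1, 0], "flag_b": [1, 1, 0]},
    index=pd.Index([101, 102, 103], name="job_number"),
)
after = before.copy()
after.loc[102, "flag_b"] = 0  # simulate one flag changing between runs

# compare() keeps only the cells that differ, with "self" and "other" sub-columns.
differences = before.compare(after)
# An empty result means the two tables are identical.
no_differences = before.compare(before.copy())

print(differences)
print(no_differences.empty)
```

`compare` requires both frames to have identical row and column labels and raises a `ValueError` otherwise, so the tables need to be indexed and sorted the same way first.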

SashaWeinstein commented 2 years ago

Awesome! I had not thought of a good way to test and was honestly probably just going to eyeball it, but doing it programmatically is definitely better. I use psycopg2 to pull tables down from Postgres and load them into pandas, if you're looking for a nice way to do it.
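A rough sketch of that pattern, not the exact code used here. It relies only on the DB-API cursor interface that psycopg2 implements, so the demo below runs against an in-memory sqlite3 database instead of a Postgres server. The helper name, table name, and columns are hypothetical.

```python
import sqlite3

import pandas as pd


def query_to_frame(conn, query: str) -> pd.DataFrame:
    """Run a query on a DB-API connection and return the rows as a DataFrame."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        # The first field of each description entry is the column name.
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return pd.DataFrame(rows, columns=columns)


# sqlite3 stands in for Postgres so the sketch runs without a server.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE qaqc_demo (job_number INTEGER, flag_a INTEGER)")
demo.executemany("INSERT INTO qaqc_demo VALUES (?, ?)", [(101, 0), (102, 1)])

frame = query_to_frame(
    demo, "SELECT job_number, flag_a FROM qaqc_demo ORDER BY job_number"
)
print(frame)
```

With psycopg2 the same helper should work on a connection from `psycopg2.connect(...)`, given whatever connection string the build database uses.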

td928 commented 2 years ago

A conversation with @SashaWeinstein set me right, and the test runs look all good. I attached the Jupyter notebook I ran the tests with, in case anyone wants to replicate them on their side. Thanks all! qaqc_compare.ipynb.zip