CodeWithAloha / uipa

Helping submit, track, and share public records requests in Hawaii
http://uipa.org
MIT License

Seed the database #73

Open tyliec opened 4 months ago

tyliec commented 4 months ago

Objective

To implement support for seeding the database.

Context

Some of us have gotten the UIPA portal running; however, it looks a little weird with zero data. For development purposes, it is quite helpful to have an initial set of seed data so that all the normal operations of the portal can be exercised locally.

This capability was present in the previous iteration of the UIPA portal in the reset.sh script.

Tasks

Success Criteria

The ability to seed the database is available, for both public bodies and requests.

Related Items

Parent Epic: https://github.com/CodeWithAloha/uipa/issues/50

Open Questions:

tyliec commented 4 months ago

@yenhtran made some progress with this, but basically we found out that a simple python manage.py loaddata ... or import_csv isn't going to cut it. Something about our current version of the portal doesn't accept the data format used by the previous version.

tyliec commented 4 months ago

Found some docs on this - https://github.com/CodeWithAloha/uipa/blob/d9e5322e0ed21b680f0e597997c20274f670220e/docs/importpublicbodies.rst, going to give it a shot this Wednesday.

russtoku commented 4 months ago

Isn't this:

Found some docs on this - https://github.com/CodeWithAloha/uipa/blob/d9e5322e0ed21b680f0e597997c20274f670220e/docs/importpublicbodies.rst, going to give it a shot this Wednesday.

the same as Froide's Docs on Importing Public Bodies?

Both suggest using the Google spreadsheet linked in the docs to create a CSV file and load it using python manage.py import_csv public_bodies.csv. The 13 fields in that spreadsheet and described in the docs could be populated from the public body seed data in the uipa_org/fixtures directory in the master branch, as mentioned in the Related Items section in the first comment. They are JSON data files, so it should be straightforward to extract data from them.
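A minimal sketch of that extraction, assuming the fixtures use Django's standard serialized layout (a JSON list of objects with "model", "pk", and "fields" keys). The column names in COLUMNS and the "publicbody.publicbody" model label are illustrative assumptions; the real 13-column header should be taken from the spreadsheet linked in the docs.

```python
import csv
import io
import json

# Hypothetical column list for illustration only. The real import_csv
# header comes from the spreadsheet linked in the docs.
COLUMNS = ["name", "email", "url", "classification"]


def fixture_to_csv(fixture_text, columns=COLUMNS, model="publicbody.publicbody"):
    """Convert Django fixture JSON text into CSV text with the given columns."""
    records = json.loads(fixture_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if record.get("model") != model:
            continue  # skip rows that belong to other models
        fields = record.get("fields", {})
        # Missing or null values become empty cells.
        row = {}
        for column in columns:
            value = fields.get(column)
            row[column] = "" if value is None else value
        writer.writerow(row)
    return out.getvalue()
```

The returned text could be written to public_bodies.csv and then passed to python manage.py import_csv, provided the header matches what the importer expects.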

I restored a backup of my notes on UIPA development from 2018 and have been trying to recreate a development version of the UIPA website by loading the seed data from the uipa_org/fixtures directory. While I got a server running with a SQLite 3 database, I was only able to load the flatpages, jurisdictions, and sites from the JSON fixture files. I could not load the foilaw, public body, public body tag, and tagged public body data.

russtoku commented 4 months ago

I was able to:

yenhtran commented 4 months ago

Notes on where we left off:

However, we are noticing only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json [Screenshot 2024-03-06 at 10:04:02 PM]

When comparing pk:28 (seeded) and pk:29 (not seeded), we don't find any significant difference: [Screenshot 2024-03-06 at 10:05:35 PM]

Was reviewing this chunk of code: https://github.com/CodeWithAloha/froide/blob/5584e46107fce5ffee409a2172d07974d9e9103e/froide/publicbody/models.py#L360

cc: @tyliec

russtoku commented 4 months ago

One of the problems with loading the database from the JSON fixture files is that the files were dumped from a database that has data in related tables, so a fixture fails to load when the rows it references are not already present.

So, I wrote a Python program to extract the data from the publicbody.publicbody.json fixture file and create a CSV file to load from the Admin site or using python manage.py import_csv. I also wrote a Python program to get the unique names for classifications to load them before loading public bodies. I didn't include tags because I didn't realize that they are in the Categories table.
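As a rough illustration of the second program (not the actual script), the sketch below collects the unique classification names from fixture text. It assumes each record stores the classification as a plain string under a "classification" key inside "fields"; if the fixture stores a primary key or a natural key there instead, the lookup would need to change.

```python
import json


def unique_classifications(fixture_text, field="classification"):
    """Return the sorted, de-duplicated non-empty values of `field`."""
    names = set()
    for record in json.loads(fixture_text):
        value = record.get("fields", {}).get(field)
        # Only keep non-blank strings; nulls and empty strings are skipped.
        if isinstance(value, str) and value.strip():
            names.add(value.strip())
    return sorted(names)
```

The resulting list is what would need to exist in the classifications table before the public bodies are loaded.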

I just got through loading these tables and dumping the data into JSON fixture files:

I'm using the names from the Public Body administration page in the Admin site to refer to these tables.

I was also looking in the old UIPA codebase and saw these CSV files in the data folder:

I think it might be a good idea to use 2017-11-21-Hawaii_UIPA_Public_Bodies_All.csv as it is probably the most recent in terms of data used for the go-live of UIPA.org.

I'm going to redo my data load to use this file.

russtoku commented 4 months ago

As a side note, the old UIPA.org development used a SQLite database. I'm going to assume that using PostgreSQL for development is currently the preferred method. Thus, I created a clear_db.sh script to "reset" the database so you can run python manage.py migrate --skip-checks to initialize the database. This should help calm any fears about breaking stuff.
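For readers without the script, here is a hedged sketch of what such a reset might involve. It is not the contents of clear_db.sh, which are not shown in this thread. It assumes the PostgreSQL dropdb and createdb command-line tools are on the PATH, and the database name passed in is a placeholder.

```python
import subprocess


def reset_commands(db_name):
    """Build the command sequence for dropping, recreating, and migrating."""
    return [
        ["dropdb", "--if-exists", db_name],  # PostgreSQL client tool
        ["createdb", db_name],               # PostgreSQL client tool
        ["python", "manage.py", "migrate", "--skip-checks"],
    ]


def reset_database(db_name, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for argv in reset_commands(db_name):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling reset_database("uipa_dev") only prints the commands; "uipa_dev" is a hypothetical name, and dry_run=False is needed to actually run them.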

yenhtran commented 4 months ago

Thank you @russtoku for the explanation. Any chance you'd be able to push up your changes? We are still stuck...

russtoku commented 4 months ago

Shall I make a pull request to add a seed folder under the uipa/data folder in the https://github.com/CodeWithAloha/uipa repo?

Can you tell me what repo, branch, and commit you're using? The main branch of https://github.com/CodeWithAloha/uipa before March 6, 2024 was renamed to main-copy and a new main branch was created.

yenhtran commented 3 months ago

@russtoku - I think a pull request would be super helpful.

The repo/branch/commit I'm using is sort of complicated: I had been working off the main branch before March 6 (commit: d9e5322e0ed21b680f0e597997c20274f670220e) and have not kept it updated, since there are a lot of breaking changes that might make the current issue harder to investigate. So @tyliec and I agreed that, for now, I won't pull in the changes until we have something working; then I'll branch off the most updated branch and apply the solution.

russtoku commented 3 months ago

Great! You are working at the same point I was at when I was able to load public bodies, as mentioned above in my comment: https://github.com/CodeWithAloha/uipa/issues/73#issuecomment-1984273904.

I will make a pull request (PR) against the main branch of https://github.com/CodeWithAloha/uipa so you can grab the files from it without updating your working directory. It shouldn't matter whether you get the files from the PR or from the main branch after the PR is merged (assuming that it will be).

yenhtran commented 3 months ago

Yay! Mahalo @russtoku !🤙

russtoku commented 3 months ago

In regards to @yenhtran 's comment:

  • Got SOME data seeded in the Public Bodies UI interface...

However we are noticing only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json

When comparing pk: 28 (seeded) and pk:29 (not seeded), we don't find any significant difference:

The difference is pk:29 has classification and classification-slug values while pk:28 doesn't. These values must exist before the publicbody.publicbody.json or a CSV file to upload public bodies can be loaded.
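The ordering constraint can be sketched as a dependency sort. The dependency map below is an assumption for illustration: the names mirror tables mentioned in this thread, but which table actually depends on which has not been confirmed against the Froide models.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each key lists the datasets that must be loaded first.
DEPENDS_ON = {
    "jurisdiction": set(),
    "classification": set(),
    "foilaw": {"jurisdiction"},
    "publicbody": {"jurisdiction", "foilaw", "classification"},
}


def load_order(depends_on):
    """Return dataset names ordered so prerequisites come before dependents."""
    return list(TopologicalSorter(depends_on).static_order())
```

Under this assumed map, load_order(DEPENDS_ON) always places "classification" before "publicbody", which is the order the loaddata or import_csv calls would need to follow.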

russtoku commented 3 months ago

See https://github.com/CodeWithAloha/uipa/pull/80 for a first pass at seeding the database for development.

yenhtran commented 3 months ago

In regards to @yenhtran 's comment:

  • Got SOME data seeded in the Public Bodies UI interface...

However we are noticing only 28 got uploaded. We get the following error when we run the command: python3 manage.py loaddata publicbody.publicbody.json When comparing pk: 28 (seeded) and pk:29 (not seeded), we don't find any significant difference:

The difference is pk:29 has classification and classification-slug values while pk:28 doesn't. These values must exist before the publicbody.publicbody.json or a CSV file to upload public bodies can be loaded.

I think when we were still debugging this, we did notice that up until pk:29 all the classification and classification-slug fields were empty, so we tried setting those fields on pk:29 to empty strings but still got the same error. It makes sense that these classification values need to be created first.
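A pre-flight check along these lines could flag such records before running loaddata. This is a minimal sketch, assuming the fixture stores the classification as a scalar value (a string or number) under a "classification" key; the function name and the `known` argument are hypothetical.

```python
import json


def missing_prerequisites(fixture_text, known, field="classification"):
    """Map pk -> referenced value for records whose `field` is not in `known`."""
    problems = {}
    for record in json.loads(fixture_text):
        value = record.get("fields", {}).get(field)
        if value is None or value == "":
            continue  # nothing referenced, so nothing can be missing
        if value not in known:
            problems[record.get("pk")] = value
    return problems
```

With an empty set of known classifications, a record like pk:29 would be reported while pk:28 would pass, matching the behavior described above.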