citusdata / django-multitenant

Python/Django support for distributed multi-tenant databases like Postgres+Citus
MIT License
730 stars 118 forks source link

Add PostGIS support #150

Closed davidjayb closed 1 year ago

davidjayb commented 1 year ago

In a similar vein to https://github.com/citusdata/django-multitenant/pull/68 I've added PostGIS support through a database backend and some additional models that reference GeoDjango models. GIS specific tests have been added and excluded from basic test runs as it requires the PostGIS extension to be installed.

davidjayb commented 1 year ago

@microsoft-github-policy-service agree company="MyHEAT Inc."

gurkanindibay commented 1 year ago

Hi @davidjayb Thanks for your valuable contribution We recently added Codecov support to see our coverage. Could you rebase and get these developments as well to see the coverage effect of your contribution? Thanks

davidjayb commented 1 year ago

@gurkanindibay I've pulled in the changes but it doesn't look like the new workflows have been triggered. I will close and open up a new PR.

davidjayb commented 1 year ago

I've reopened this PR. Not sure how to trigger the codecov workflows.

davidjayb commented 1 year ago

@gurkanindibay I've added a pull request trigger to the workflow to get the tests to run. Running the workflow requires your approval: https://github.com/citusdata/django-multitenant/actions/runs/4235453586

gurkanindibay commented 1 year ago

@davidjayb Splendid effort. I like your work. Thanks for the contribution 👍. Actually, with your proposed solution we are getting two backends ; one for distributed usage and one for PostGIS usage, which should not be alternative to each other, in my opinion. Is there a way to use both of them with just a the same usage for both distributed and PostGIS? If you think it is not possible, still I want to work on unifying the solution. I can work on after prioritizing our tasks.

Thanks

gurkanindibay commented 1 year ago

BTW, tests are failing as well. Could you check them as well?

davidjayb commented 1 year ago

@davidjayb Splendid effort. I like your work. Thanks for the contribution 👍. Actually, with your proposed solution we are getting two backends ; one for distributed usage and one for PostGIS usage, which should not be alternative to each other, in my opinion. Is there a way to use both of them with just a the same usage for both distributed and PostGIS? If you think it is not possible, still I want to work on unifying the solution. I can work on after prioritizing our tasks.

Thanks

@gurkanindibay I believe we need to have two separate backends. PostGIS is an extension that requires additional OS packages installed, like GDAL, that is not suitable for a Postgres database that does not require GIS related features.

My suggestion is that we keep the database backends separate. If the Citus team is willing to publish docker images with Postgis installed then we can use those for testing.

I'll work on refactoring with your feedback and fixing the test failures. Please let me know what you think.

codecov[bot] commented 1 year ago

Codecov Report

Merging #150 (042923d) into main (68270b1) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #150   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           43        45    +2     
  Lines         1180      1258   +78     
=========================================
+ Hits          1180      1258   +78     
Flag Coverage Δ
missing-modules 35.25% <22.78%> (?)
postgis 99.44% <100.00%> (?)
postgres 95.44% <36.70%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
django_multitenant/tests/base.py 100.00% <100.00%> (ø)
django_multitenant/tests/gis_models.py 100.00% <100.00%> (ø)
...tenant/tests/migrations/0023_auto_20200412_0603.py 100.00% <100.00%> (ø)
...t/tests/migrations/0027_many_to_many_distribute.py 100.00% <100.00%> (ø)
...tests/migrations/0031_add_organization_location.py 100.00% <100.00%> (ø)
django_multitenant/tests/settings.py 100.00% <100.00%> (ø)
django_multitenant/tests/test_models.py 100.00% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

gurkanindibay commented 1 year ago

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

gurkanindibay commented 1 year ago

@davidjayb Splendid effort. I like your work. Thanks for the contribution 👍. Actually, with your proposed solution we are getting two backends ; one for distributed usage and one for PostGIS usage, which should not be alternative to each other, in my opinion. Is there a way to use both of them with just a the same usage for both distributed and PostGIS? If you think it is not possible, still I want to work on unifying the solution. I can work on after prioritizing our tasks. Thanks

@gurkanindibay I believe we need to have two separate backends. PostGIS is an extension that requires additional OS packages installed, like GDAL, that is not suitable for a Postgres database that does not require GIS related features.

My suggestion is that we keep the database backends separate. If the Citus team is willing to publish docker images with Postgis installed then we can use those for testing.

I'll work on refactoring with your feedback and fixing the test failures. Please let me know what you think.

@davidjayb Splendid effort. I like your work. Thanks for the contribution 👍. Actually, with your proposed solution we are getting two backends ; one for distributed usage and one for PostGIS usage, which should not be alternative to each other, in my opinion. Is there a way to use both of them with just a the same usage for both distributed and PostGIS? If you think it is not possible, still I want to work on unifying the solution. I can work on after prioritizing our tasks. Thanks

@gurkanindibay I believe we need to have two separate backends. PostGIS is an extension that requires additional OS packages installed, like GDAL, that is not suitable for a Postgres database that does not require GIS related features.

My suggestion is that we keep the database backends separate. If the Citus team is willing to publish docker images with Postgis installed then we can use those for testing.

I'll work on refactoring with your feedback and fixing the test failures. Please let me know what you think.

@davidjayb we can first add the support then we can merge them under a single backend in another PR.In my opinion, viving the users the chance of benefit from both PostGIS and Citus is a good option.

davidjayb commented 1 year ago

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

@gurkanindibay I've added a step to run the GeoDjango tests after the Citus tests. This should cover the code I've added here.

gurkanindibay commented 1 year ago

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

@gurkanindibay I've added a step to run the GeoDjango tests after the Citus tests. This should cover the code I've added here.

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

@gurkanindibay I've added a step to run the GeoDjango tests after the Citus tests. This should cover the code I've added here.

@davidjayb approved the CI execution and still have failures. Could you check these failures?

davidjayb commented 1 year ago

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

@gurkanindibay I've added a step to run the GeoDjango tests after the Citus tests. This should cover the code I've added here.

@davidjayb Thanks for all your efforts Nearly all tests have passed . There are some static code analysis failures And there are some lines needs to be covered. Code coverage rate drops dramatically with this PR. Could you check the untested lines as well and add tests for them?

@gurkanindibay I've added a step to run the GeoDjango tests after the Citus tests. This should cover the code I've added here.

@davidjayb approved the CI execution and still have failures. Could you check these failures?

@gurkanindibay sorry I just realized that my fork runs the GHA and I can check there before you approve the workflow in this PR. I missed a closing quote 🤦 ...

The latest issue seems to be that the django-base-image job is not running. This is required to add the GDAL library to the base image:

Here is the GDAL error: https://github.com/davidjayb/django-multitenant/actions/runs/4316951694/jobs/7533400109#step:4:393

My fork cannot push images to Docker: https://github.com/davidjayb/django-multitenant/actions/runs/4309872506/jobs/7517739669

Here is the line added to the gha image: https://github.com/citusdata/django-multitenant/pull/150/files#diff-9b57afcb8b6128e7c6d80d6ed079cb7a11506b97a35ca9fbd0204ae619f205f0R21

I've built the base_gha_image/Dockerfile locally and run the tests so it should work in the pipeline if we get the new docker image built and pushed to Dockerhub.

Thank you for your reviewing efforts.

gurkanindibay commented 1 year ago

@davidjayb added the binary you requested and pushed into the dockerhub. Re-triggered the pipeline execution. There are some unit test errors. Could you check it again?

davidjayb commented 1 year ago

@gurkanindibay alright hopefully this does it :-) I found out that Citus throws a DataError when a non-null constraint is violated whereas the base postgres/postgis database will throw an IntegrityError.

gurkanindibay commented 1 year ago

@gurkanindibay alright hopefully this does it :-) I found out that Citus throws a DataError when a non-null constraint is violated whereas the base postgres/postgis database will throw an IntegrityError.

Still errors exist could you check the tests again?

davidjayb commented 1 year ago

@gurkanindibay It looks like the test is failing because of a recent data migration.

https://github.com/citusdata/django-multitenant/actions/runs/4349335712/jobs/7599301328#step:4:101

Accounts are being created as part of migration 25:

https://github.com/citusdata/django-multitenant/blob/main/django_multitenant/tests/migrations/0025_data_load.py#L25

The test fixtures try to create accounts with static public keys 1-3:

https://github.com/citusdata/django-multitenant/blob/main/django_multitenant/tests/base.py#L40

I'm not sure why the tests are not failing for Citus but I assume this is because it generates public keys differently from base postgres / postgis.

I've tried removing the static primary keys but there are failures when creating the fixtures in the UtilsTest... Really weird:

https://github.com/davidjayb/django-multitenant/actions/runs/4356291592/jobs/7614093721#step:3:816

Edit: I've isolated the issue, the tests in UtilsTest will fail when creating the fixtures after a model with a UUID is saved. After a model with a UUID for an ID is created Django will start generating UUIDs for all models, even those that have their ID field set as AutoField or BigAutoField.

The test_uuid_create and test_uuid_save create a Record and Organization, both of which use UUIDs: https://github.com/citusdata/django-multitenant/blob/main/django_multitenant/tests/test_models.py#L702

This also happens when running the tests against the Citus database locally. Tested against citusdata/django-multitenant:main as well as my fork so this isn't something specific to this PR. Not sure what the root cause is.

As a stopgap solution to this case I've updated the Account fixtures to generate a random PK: https://github.com/citusdata/django-multitenant/pull/150/files#diff-83d193511af65ba935ccf6afc03e09e174fa30a180cd5f8764e6608a7b8bb0f3

With this change it looks like the build is finally passing: https://github.com/davidjayb/django-multitenant/actions/runs/4357421331

davidjayb commented 1 year ago

@gurkanindibay I've been able to get all of these passing in my fork:

https://github.com/davidjayb/django-multitenant/actions/runs/4357421331

gurkanindibay commented 1 year ago

@gurkanindibay I've been able to get all of these passing in my fork:

https://github.com/davidjayb/django-multitenant/actions/runs/4357421331

thanks @davidjayb . All tests are passing in this PR as well. However, coverage is being dropped dramatically. Could you check the coverage reports in codecov https://github.com/citusdata/django-multitenant/pull/150/checks?check_run_id=11841895054

davidjayb commented 1 year ago

@gurkanindibay this seems to be due to the code coverage only being used by the "Test GeoDjango MT" stage. There are several lines that are skipped over by the GIS tests as Citus is not being used.

Is there a way to merge the code coverage reports from the two test stages?

I've added an optional test run parameter to append the code coverage run: https://github.com/pytest-dev/pytest-cov#coverage-data-file

https://github.com/citusdata/django-multitenant/pull/150/commits/6c1f6ffcd61c2c4a2c8756c3a325c413e5479889

gurkanindibay commented 1 year ago

@davidjayb thanks for all your efforts. Now I need to perform a detailed review.

davidjayb commented 1 year ago

@davidjayb Could you use our docker-compose file in the PR #164

Do you want me to unify everything so that the GIS and Citus tests all run together?

Edit: actually, that wouldn't be possible as the database backend needs to change. I've updated the PR to use Citus for the GIS tests.

davidjayb commented 1 year ago

@gurkanindibay can you run the workflow again? The codecov failed because I forgot to remove a condition I introduced when running on a vanilla Postgis database (i.e. non-Citus).

gurkanindibay commented 1 year ago

I was thinking the same as you. It is difficult for someone who wants to use both GIS and Citus together I will talk to our product manager tomorrow and in the meantime I will think about a solution. If you have any solution to work them together please commit it Thanks for all your efforts

davidjayb commented 1 year ago

@gurkanindibay in my mind having two separate backends is appropriate as the GIS backends are distinct in Django. Having the django-multitenant backends leverage mixins allow for the functionality to be common between the two.

gurkanindibay commented 1 year ago

@gurkanindibay in my mind having two separate backends is appropriate as the GIS backends are distinct in Django. Having the django-multitenant backends leverage mixins allow for the functionality to be common between the two.

It makes sense. However, I'm trying to figure out if there is a need for using both as in your tests. You use both organization and OrganizationLocation. In that case which backend should be used?

gurkanindibay commented 1 year ago

I will make one more round of review tomorrow

gurkanindibay commented 1 year ago

Still not figuring out the customer use case using GIS and Normal backend together. I see that PostGIS is seperate for Django as well. However, I need a meaningful use case testing the scenarios

davidjayb commented 1 year ago

Still not figuring out the customer use case using GIS and Normal backend together. I see that PostGIS is seperate for Django as well. However, I need a meaningful use case testing the scenarios

These aren't being used together. The GIS tests override the database backend to use django_multitenant.backends.postgis and add the django.contrib.gis.

The OrganizationLocation model should only be used when the PostGIS is used and the GIS app is installed with Django. It uses a GeoDjango specific field that is not available in a base Django installation. There is interoperability between base-Django and GeoDjango fields.

I believe keeping these separate, and having two separate test runs, is useful to make sure both the base-Django and GeoDjango work. Without the separation then base-Django would not be tested.

gurkanindibay commented 1 year ago

@davidjayb I'm working on post-release issues and the blogpost about the new release. I will continue on reviewing after these tasks

gurkanindibay commented 1 year ago

@davidjayb sorry for the delay of the review. I'm trying to handle post release tasks and prioritising bugs. I will return this task ASAP

davidjayb commented 1 year ago

@gurkanindibay no problem. I'll wait to update the PR until the review is completed.

davidjayb commented 1 year ago

@gurkanindibay thank you for the review. Unfortunately I don't have the time to address the feedback now. I'll try to get back to this when I can.

davidjayb commented 1 year ago

@gurkanindibay I am working on updating this PR a bit. I've been able to get the current tests running again after pulling in the most recent changes:

https://github.com/davidjayb/django-multitenant/actions/runs/4598836271

I've added flags for all tests so that you can determine the code coverage for each test run. The postgis tests are covering quite a bit at 99%. Aggregating all coverage together brings it to 100%:

https://app.codecov.io/gh/citusdata/django-multitenant/pull/150/flags

Edit: For some reason empty lines in the test settings.py are being picked up as "untested" in codecov...

In any case, the current coverage tests all django-multitenant features. The GIS model that I added is to ensure that GIS-specific queries can be utilized. The following test cases you outlined are being run by the postgis backend without creating separate GIS specific models:

davidjayb commented 1 year ago

@gurkanindibay hey there, what is the status of the PR review? All checks have passed and there seems to be adequate coverage in the GIS tests.

gurkanindibay commented 1 year ago

@gurkanindibay hey there, what is the status of the PR review? All checks have passed and there seems to be adequate coverage in the GIS tests.

Hi @davidjayb I'm very busy with my other projects. This one is on my task list. I will reserve time ASAP

Thanks for your valuable contributions and your patience

davidjayb commented 1 year ago

This PR has been sitting stale for quite some time so I have decided to close it. It doesn't seem like the team wants to take the time to get this merged in so hopefully you can take the time to add first-class support for PostGIS in the future.

gurkanindibay commented 1 year ago

Hey @davidjayb Sorry for the long delay. I have other assignments and I couldn't deal with this PR until now. You have put a great effort and PR is nearly OK to merge I have an available time slot this month. If you're available, we can add this PR and create a major release with this feature. Open points in my mind

  1. Can we use the PostGIS backend with models extended from TenantModel? Projects using django-multitenant with PostGIS backend may have addtional classes which does not include GIS attributes. What will be the behaviour in this case?
  2. Now we have a detailed documentation https://django-multitenant.readthedocs.io/en/latest/. We need a special section here that defines project usage with PostGIS. After handling item 1, we need a clear documentation for users to easily use the GIS feature

I hope you can reserve time for this PR. Thanks for all of your contributions @davidjayb 🥇