GROUP BY error in PostgreSQL ODM2 timeseries DB

sreeder commented 7 years ago

I am getting a Group by error when connecting to a Non-LBR ODM2 postgres database.

@emiliom @miguelcleon @lsetiawan

miguelcleon commented 7 years ago

@sreeder actually I'm getting the same error for both of these The only one I got working so far is a call like http://127.0.0.1:8080/odm2timeseries/rest/1_1/GetSites?site=odm2timeseries:Rio%20Icacos%20Trib-IO

miguelcleon commented 7 years ago

Here is a commit I added to my fork https://github.com/miguelcleon/WOFpy/commit/f53476b7edf5b335b808e5cbc2f45b7a9ef0beab

emiliom commented 7 years ago

Just wondering the status of this issue. @miguelcleon, are you saying the commit you added on your fork resolved the GROUP BY problems for you?

I'm somewhat confused b/c I remember seeing many other exchanges on this issue last week; I assume they've been deleted? Some were referring to changes Miguel thought he'd made, but turned out to be already in WOFpy.

@sreeder, I assume you haven't had a chance to look further into this. And @lsetiawan doesn't have access to the EnviroDIY PostgreSQL database Stephanie is testing (AFAIK), so he wouldn't have been able to look into this.

Thanks, all.

miguelcleon commented 7 years ago

@emiliom The commit I added in my fork only resolved the group by error for the GetSites?site= type call. It isn't terribly useful. Yeah, I deleted out some of the comments because they didn't seem helpful.

emiliom commented 7 years ago

The commit I added in my fork only resolved the group by error for the GetSites?site= type call. It isn't terribly useful.

Well, that's still progress! Thanks. Are you saying that a GetSites request w/o the site= argument still returns a GROUP BY error?

miguelcleon commented 7 years ago

Yes, the same error Stephanie posted here.

sreeder commented 7 years ago

@emiliom @lsetiawan Would it be enough to give you access to the server that we are trying to host wofpy on? If not I can work on creating a copy of the live database onto that server that you guys could have access too. The issue with giving you access to the live database is that there are multiple firewalls we don't control, it is easier for us to make a copy and move into a demilitarized zone.

lsetiawan commented 7 years ago

@sreeder if that server has the connection to the database, and we get the credential to the database, then that would work. Thanks.

emiliom commented 7 years ago

Wouldn't it be easier (for Don and I) if we could deploy the postgresql database on our local machines, for testing? We would not expose anything to the internet, of course. It would just be to minimize hassles and lags during testing and bug hunting.

@sreeder, that would just mean you sending us a postgresql dump file that we can then restore into a working database.

sreeder commented 7 years ago

@emiliom that is a great idea. I have create the postgresql dump file and added it to the ODM2 Google drive folder, under ODM2_db_scripts. Let me know if you have any troubles finding it.

emiliom commented 7 years ago

Thanks, @sreeder! I was able to download it. @lsetiawan and I will let you know if we have any issues restoring it into Postgresql.

lsetiawan commented 7 years ago

@sreeder What postgresql and postgis versions are you using? Thanks

lsetiawan commented 7 years ago

Also, I am currently having ROLE issues. Please dump without any role/owner associations Thanks.

pg_restore: [archiver (db)] Error from TOC entry 427; 1255 37207 FUNCTION reset_sequence(text, text, text) admin_1
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin_1" does not exist
    Command was: ALTER FUNCTION public.reset_sequence(tablename text, columnname text, sequence_name text) OWNER TO admin_1;

pg_restore: [archiver (db)] Error from TOC entry 404; 1259 37184 TABLE django_migrations websdl_user
pg_restore: [archiver (db)] could not execute query: ERROR:  role "websdl_user" does not exist
    Command was: ALTER TABLE django_migrations OWNER TO websdl_user;

sreeder commented 7 years ago

@lsetiawan we are using Postrgresql version 9.5.6. I don't believe he actually has postgis installed. and I have updated the script on google drive without the role information.

emiliom commented 7 years ago

Thanks, @sreeder! FYI, @lsetiawan will work on this tomorrow; he usually works on a different project on Thursdays.

lsetiawan commented 7 years ago

@sreeder There doesn't appear to be an odm2 schema in your dump, or am I missing something. Thanks.

emiliom commented 7 years ago

You're right, @lsetiawan, there's supposed to be an odm2 schema. You don't necessarily show the entire content of the Schemas object in your screenshot, but after some checking with PgAdmin, I would expect schemas to be presented in alphabetical order, so it should be before pg_catalog (BTW, is this a screenshot from PgAdmin or DBeaver? I suspect it's the latter. But I assume you're used to seeing ODM2 postgresql DB's in DBeaver, from the LCZO and CZIMEA DB's).

On the other hand, I'd be surprised that @sreeder hasn't run into problems if she doesn't actually have an odm2 schema ...

sreeder commented 7 years ago

@emiliom @lsetiawan so sorry about that guys, I didn't realize he sent me two files, and of course I uploaded our Django one, instead of the ODM2. I have replaced the incorrect file on google drive with the correct odm2 schema file

lsetiawan commented 7 years ago

@sreeder No problem. Thanks for replacement! All works now. I have an odm2 database with odm2 schema with their appropriate tables! 😸

Now off to testing!

emiliom commented 7 years ago

Awesome! Thanks, @lsetiawan and @sreeder!

miguelcleon commented 7 years ago

@lsetiawan I was running some unrelated SQL queries against a PostgreSQL database and I happened to include an order by clause where one was inappropriate. I got a similar GROUP BY error. Seems like this could happen in general when the SQL is malformed.

Seems like it would be a good idea to run the SQL commands directly against PostgreSQL and tweak them, then try and translate the changes back into SQLAlchemy.

Maybe this helps?

emiliom commented 7 years ago

@lsetiawan I was running some unrelated SQL queries against a PostgreSQL database and I happened to include an order by clause where one was inappropriate. I got a similar GROUP BY error. Seems like this could happen in general when the SQL is malformed.

Yeah, but the weird thing is that the SQLAlchemy code does work with MySQL and SQLite! You'd think GROUP BY errors would be basic SQL issues that would work and fail the same across RDBMS's; but you never know.

Seems like it would be a good idea to run the SQL commands directly against PostgreSQL and tweak them, then try and translate the changes back into SQLAlchemy.

I think @lsetiawan will first test the offending SQLAlchemy code in isolation, outside wofpy. If he's still having problems, I'll work with him to translate that code into equivalent SQL (I'm a more experienced SQL coder than Don is :stuck_out_tongue_winking_eye: )

miguelcleon commented 7 years ago

Yeah, but the weird thing is that the SQLAlchemy code does work with MySQL and SQLite! You'd think GROUP BY errors would be basic SQL issues that would work and fail the same across RDBMS's; but you never know.

Yes, I would think so, I guess PostgreSQL tends to be a bit less forgiving then other RDBMS's :persevere:

lsetiawan commented 7 years ago

I think @lsetiawan will first test the offending SQLAlchemy code in isolation, outside wofpy.

This is exactly what I'm doing now 😄

emiliom commented 7 years ago

I guess PostgreSQL tends to be a bit less forgiving then other RDBMS's

Good point.

I think @lsetiawan will first test the offending SQLAlchemy code in isolation, outside wofpy.

This is exactly what I'm doing now

Great!

lsetiawan commented 7 years ago

Looking at the get_all_sites in the DAO, this is where the problem lies for GetSites query.

I modified the query a bit to make it work and here's what I found:

First Finding

Seems like Postgresql require you to list all the Tables involved in GROUPBY to work properly

Since Sites and TimeSeriesResults are polymorphic tables, I only put odm2_models.Sites rather than odm2_models.Sites.SamplingFeatureID and odm2_models.SamplingFeatures.SamplingFeatureID; same case with TimeSeries Results

sites1 = db_session.query(odm2_models.Sites, odm2_models.TimeSeriesResults). \
join(odm2_models.FeatureActions). \
filter(odm2_models.FeatureActions.SamplingFeatureID == odm2_models.Sites.SamplingFeatureID,
       odm2_models.TimeSeriesResults.FeatureActionID == odm2_models.FeatureActions.FeatureActionID). \
group_by(odm2_models.Sites, odm2_models.TimeSeriesResults).all()

Result:

>>> len(sites1)
>>> 249

Seems like it just queried everything, group_by didn't work since it's grouping by multiple tables.

Second Finding

Modified to only query sites, moved TimeSeriesResults as part of the JOIN

sites2 = db_session.query(odm2_models.Sites). \
join(odm2_models.FeatureActions). \
join(odm2_models.TimeSeriesResults). \
filter(odm2_models.FeatureActions.SamplingFeatureID == odm2_models.Sites.SamplingFeatureID,
       odm2_models.TimeSeriesResults.FeatureActionID == odm2_models.FeatureActions.FeatureActionID). \
group_by(odm2_models.Sites).all()

Result:

>>> len(sites2)
>>> 42

This looks more promising, the query actually grouped by Sites, but now TimeSeriesResults isn't part of the resulting query. But then going back to https://github.com/ODM2/WOFpy/issues/116

Why is that queried?
Do we really need it?

emiliom commented 7 years ago

Nice work. Let's look at it together on Monday. I'll be able to compose equivalent SQL queries then, to compare.

What happens when you run the second SQLAlchemy query type ("Second Finding") against the MySQL LBR DB? Does it work, and do you get the same number of sites as in the original query?

I'm guessing that if you run the first SQLAlchemy query type ("First Finding") against that DB, you'll get a much larger number of sites (due to duplication).

emiliom commented 7 years ago

My guess is that your second query is the way to go. And that's the query type we had investigated a couple of weeks ago, though not with postgresql. If it works with MySQL and SQLite, we'll just have to make some small tweaks to the code in get_all_sites that handles the response, I think.

sreeder commented 7 years ago

@emiliom @lsetiawan . Hey, I am not understanding why we are trying to get the TimeSeriesResults back when we are searching for all of the sites. and why are we doing a group by and not just trying to get all of the DISTINCT sites? Also, it would probably be easiest to query the SamplingFeature object and filter by type of "Site".

emiliom commented 7 years ago

@sreeder, we'll explain better on Monday. Right now we weren't trying to explain it to a wider audience :wink:

Very brief answers:

why we are trying to get the TimeSeriesResults back when we are searching for all of the sites

Because that's how the code was set up before we started working with it, partly b/c it assumes that the ODM2 database could have different result types and this DAO is intended to only return sites that have timeseries results.

and why are we doing a group by and not just trying to get all of the DISTINCT sites

We haven't explored DISTINCT in this context, but it's probably fine and may have better performance.

sreeder commented 7 years ago

@emiliom @lsetiawan are you using the API read_service calls or generating your own? Have you tried

read.getSamplingFeatures(type = "Site", results = True)

That should get you all of the Sampling features of type site, where results exist. From the sample database I sent you there are 42 SamplingFeatures returned..

lsetiawan commented 7 years ago

@emiliom

What happens when you run the second SQLAlchemy query type ("Second Finding") against the MySQL LBR DB? Does it work, and do you get the same number of sites as in the original query?

I get same number of sites in the original query.

I'm guessing that if you run the first SQLAlchemy query type ("First Finding") against that DB, you'll get a much larger number of sites (due to duplication).

Indeed. This is correct.

emiliom commented 7 years ago

@emiliom @lsetiawan are you using the API read_service calls or generating your own?

Not at this time. See the comment I just added here, in response to another comment of yours.

Have you tried read.getSamplingFeatures(type = "Site", results = True)

That should get you all of the Sampling features of type site, where results exist. From the sample database I sent you there are 42 SamplingFeatures returned..

Besides not using odm2api querying at this time, that would not work generically b/c, as I mentioned earlier, this DAO (odm2 timeseries) is targeted at TimeSeries result types only. It would work in a database that has no other result types (such as EnviroDIY?), but we want it to work generically, including in @miguelcleon 's Luquillo CZO postgresql database which stores different result types.

BTW, DAO's can be modified by users. So, you may find that you can optimize WOFpy (for performance and maintenance) with your ODM2 database in the future, by editing the built-in odm2 timeseries DAO. We can talk about this once we have wofpy working with your postgresql DB, and specially after we've issued a new release and conda package (hopefully next week).

lsetiawan commented 7 years ago

@emiliom In that case, should I use the "Second Finding" way of querying? getting rid of querying timeseriesresults and just put it as part of the join? Thanks.

emiliom commented 7 years ago

In that case, should I use the "Second Finding" way of querying? getting rid of querying timeseriesresults and just put it as part of the join?

Yup. But remember you'll probably need to edit the block at odm2_timeseries_dao.py#L51 too.

lsetiawan commented 7 years ago

@emiliom Roger that! Thanks. 👍

ODM2 / WOFpy

GROUP BY error in PostgreSQL ODM2 timeseries DB #102

First Finding

Second Finding