CollaboratingPlatypus / PetaPoco

Official PetaPoco, A tiny ORM-ish thing for your POCO's

Add support for Oracle: implementation and tests #699

Open Ste1io opened 1 year ago

Ste1io commented 1 year ago

PetaPoco has "unofficially" supported Oracle for quite some time now... I'd like to implement the integration tests for Oracle so that can finally change, and we can claim official support without a big "but" attached. This will also allow us to confidently investigate, address, and close multiple open issues that can most likely be fixed readily once tests are possible.

Happy to spearhead this myself alongside the remaining test refactoring.

Addresses #626 Addresses #613

Curlack commented 11 months ago

Been a hellacious week with work; appreciate the patience. Will catch up with things here today.

No problem.

I've had some time on my hands and took these failing tests to town. I was able to make all of them pass, but had to dive deep into the heart of some core files. Changes are not too hectic though.

I decided to split the OracleTestProvider into 2 parts, like what you did with the SqlServerSystemDataTestProvider and SqlServerMSDataTestProvider. So we now have (well, almost finished with that) OracleOrdinaryTestProvider and OracleDelimitedTestProvider.

The gist of it is to ensure we cover the scenario where the user has a case-sensitive database (delimited), which is also the way all the other databases work, AND we cover the scenario where the user has a case-insensitive database (ordinary). Technically the other databases also work like this, except everything in Oracle is capitalized.

I'm hoping to give you a PR tonight still. I've gone outside the box with this one, so there's a chance that some changes are gonna raise eyebrows hahahaha. Hope at least it gives birth to some awesome ideas from thereon.

Ste1io commented 11 months ago

Sounds good, looking forward to seeing it.

I'm really thinking our best bet is to apply a global configuration option for this in Oracle's case, which can be overridden on a per-property/column basis using an attribute/ColumnInfo config. This would allow us to retain case sensitivity, and internally know whether or not to quote table names and SQL identifiers. It would also allow all current functionality with dynamic types to work without forcing the typed poco to follow a convention.

Noted that there is already an Oracle-specific attribute at play, so implementing this would be very similar.
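
Roughly the shape I'm picturing, purely as a hypothetical sketch (none of these names exist in PetaPoco today; the real thing would more likely hang off ColumnInfo or the fluent config):

using System;

// Hypothetical illustration only: a global default plus a per-property override,
// so typed pocos keep case sensitivity without being forced into a naming convention.
public enum OracleIdentifierCasing { FoldToUpper, PreserveCase }

[AttributeUsage(AttributeTargets.Property)]
public class OracleCasingAttribute : Attribute
{
    public OracleIdentifierCasing Casing { get; }
    public OracleCasingAttribute(OracleIdentifierCasing casing) => Casing = casing;
}

public class Order
{
    // No attribute: falls back to the global Oracle default (fold to upper, unquoted).
    public string Status { get; set; }

    // Explicit override: the generated SQL would quote "CreatedOn" to preserve its case.
    [OracleCasing(OracleIdentifierCasing.PreserveCase)]
    public DateTime CreatedOn { get; set; }
}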

Curlack commented 11 months ago

@Ste1io I ran all the integration tests just as a sanity check, expecting all to pass now, but no dice.

I'm getting the following error on a couple of the MSAccess tests:

Message: 
System.InvalidOperationException : Command parameter[2] '' data value could not be converted for reasons other than sign mismatch or data overflow.

---- System.Data.OleDb.OleDbException : Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.

  Stack Trace: 
OleDbCommand.ExecuteCommandTextErrorHandling(OleDbHResult hr)
OleDbCommand.ExecuteCommandTextForSingleResult(tagDBPARAMS dbParams, Object& executeResult)
OleDbCommand.ExecuteCommandText(Object& executeResult)
OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
OleDbCommand.ExecuteNonQuery()
<>c.<ExecuteNonQueryHelper>b__315_0(IDbCommand c) line 3226
Database.CommandHelper(IDbCommand cmd, Func`2 executionFunction) line 3249
Database.ExecuteNonQueryHelper(IDbCommand cmd) line 3226
Database.ExecuteInsert(String tableName, String primaryKeyName, Boolean autoIncrement, Object poco) line 2156
Database.Insert(Object poco) line 1929
DeleteTests.Delete_GivenPoco_ShouldDeletePoco() line 63
----- Inner Stack Trace -----

I'm fairly sure it's not related to my changes and Google is of no help. It seems to be only the deletes and inserts causing this. Have you had this issue before?

Ste1io commented 11 months ago

Yes...not sure what causes it, but restarting the msaccess service fixes it.

Curlack commented 11 months ago

I don't think you mean "windows service", but here it goes... How do I do that?

Curlack commented 11 months ago

Btw, I'm quickly trying to retype everything from my head. I wanted to undo a local commit and clicked "Reset (--hard)" by accident, and like magic all my changes are gone, ai! Still have some big parts of the code changes left to work with. Hold thumbs.

PS: I'm still crying a little bit.

Ste1io commented 11 months ago

Btw, I'm quickly trying to retype everything from my head. I wanted to undo a local commit and clicked "Reset (--hard)" by accident, and like magic all my changes are gone, ai! Still have some big parts of the code changes left to work with. Hold thumbs.

PS: I'm still crying a little bit.

OK, don't freak out... :) Any other git commands besides that one? And any additional flags with the reset command?

For reference:

git reset --hard will undo your last commit, including the changes in it (i.e., your working tree is reset to the state it was in prior to that commit).
git reset --soft will undo the last commit, but leave those changes staged (i.e., the branch moves back to the prior commit while the changes from the undone commit stay staged; useful if you need to fix something in the previous commit, assuming it hasn't been pushed to remote yet).
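
For concreteness (HEAD~1 just means "the commit before the current one"):

git reset --hard HEAD~1   # drop the last commit and its changes from your working tree
git reset --soft HEAD~1   # drop the last commit, but keep its changes staged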

You can possibly still recover the changes from the last commit that was reset using reflog. Is that what you're wanting to do at this point @Curlack?

Ste1io commented 11 months ago

Don't make any commits. If you have any changes on your working tree, stash them. Then run git reflog, and grab the hash of the missing commit. You might need to guess at this one, but assuming you haven't done anything since the reset, it's probably the last commit hash. Then make a new branch (like "recovered-branch") from that hash with git branch recovered-branch <hash>. Check out that branch, and let me know if your changes are there (git checkout recovered-branch).
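
Spelled out as commands (where <hash> is whatever the reflog shows for the commit that disappeared):

git stash                             # only if you have uncommitted changes to set aside
git reflog                            # find the hash of the reset-away commit
git branch recovered-branch <hash>    # new branch pointing at that commit
git checkout recovered-branch         # your changes should be here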

Ste1io commented 11 months ago

I don't think you mean "windows service", but here it goes... How do I do that?

I was referring to restarting the msaccess db instance.

Curlack commented 11 months ago

Don't make any commits. If you have any changes on your working tree, stash them. Then run git reflog, and grab the hash of the missing commit. You might need to guess at this one, but assuming you haven't done anything since the reset, it's probably the last commit hash. Then make a new branch (like "recovered-branch") from that hash with git branch recovered-branch <hash>. Check out that branch, and let me know if your changes are there (git checkout recovered-branch).

[screenshot]

Man, you know your stuff! Unfortunately I don't see any recovered changes, but I also lost changes I hadn't committed yet (local only). Anyway, almost done recalling from memory. Thanks bro!

Curlack commented 11 months ago

Ok so all tests now pass, see #709. Hope the branch issues I had on my local branch/fork didn't bleed into the PR.

Calling it a night, tired of doing these changes twice hahaha. Enjoy!

Curlack commented 11 months ago

Found a weird issue when running tests that require a custom provider and/or mapper in parallel with tests using the defaults. It seems the handling of the data, especially in my case (TitleCase vs UPPERCASE), bleeds from one into the other. On further investigation, it turns out it's not right for PocoData to use a global cache (_pocoDatas) without taking the mapper into consideration. Seems like the cache key should be Type AND mapper for `PocoData`.

I bet the same thing is true for PocoFactories cache.
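
A rough sketch of what I mean by a composite key (illustrative only, not PetaPoco's actual cache plumbing; the using directives assume the current namespace layout):

using System;
using System.Collections.Concurrent;
using PetaPoco;        // IMapper
using PetaPoco.Core;   // PocoData

internal static class PocoDataCacheSketch
{
    // Keyed on both the poco Type and the mapper instance, so two databases using
    // different mappers for the same Type no longer overwrite each other's PocoData.
    private static readonly ConcurrentDictionary<(Type PocoType, IMapper Mapper), PocoData> _pocoDatas
        = new ConcurrentDictionary<(Type PocoType, IMapper Mapper), PocoData>();

    public static PocoData ForType(Type pocoType, IMapper mapper, Func<Type, IMapper, PocoData> factory)
        => _pocoDatas.GetOrAdd((pocoType, mapper), key => factory(key.PocoType, key.Mapper));
}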

It can be reproduced fairly easily by debugging two tests in parallel, as long as they both use the same poco type (e.g., Note) and different mappers. Whoever gets to ForType first will dictate the way queries get generated.

Something like this will do:

private class CustomConventionMapper : ConventionMapper
{
    public CustomConventionMapper() => MapColumn = Customize(MapColumn);

    private Func<ColumnInfo, Type, PropertyInfo, bool> Customize(Func<ColumnInfo, Type, PropertyInfo, bool> original)
    {
        return (ci, t, pi) =>
        {
            if (!original(ci, t, pi)) return false;

            ci.ColumnName = ci.ColumnName.ToUpperInvariant();
            return true;
        };
    }
}

The issue came up in the Fetch_ForDynamicTypeGivenSql_ShouldReturnValidDynamicTypeCollection test (one of many). The line giving the error was in QueryTests.AddOrders when inserting a new Person.

I can run either all "Ordinary" or all "Delimited" tests in parallel, but not both together... Pop goes the weasel!

I think my changes are ready to be committed. Can I create the PR or would you like me to take a stab at fixing this issue beforehand?

PS: I'm still having those pesky commits attached to my tail.

Curlack commented 11 months ago

I was able to flatten the history (sync fork/fetch/pull/rebase); unfortunately I wasn't able to get rid of the commits since 22 Oct.

This is what my branch looks like atm. [screenshot]

Tried checkout and cherry-picking again, but was losing too many changes when I looked at the diffs, so just rebased to origin. Tried a couple of things, got scared and stopped. Kept me up till 3AM this morning. So I'm still picking up these commits.

[screenshots]

Anything I should/can do to try and get rid of it (without losing changes)?

asherber commented 11 months ago

On the PocoData cache issue: https://github.com/CollaboratingPlatypus/PetaPoco/issues/517

asherber commented 11 months ago

On the git issue: Which branch in your repo, and what are you trying to do with it?

Curlack commented 11 months ago

My feature/oracle-support branch is apparently 13 commits behind when 10 of them have already been merged into the upstream branch of the same name. The original issue came in when I didn't sync my fork after those merges. I want to know if there is something I can do to have my PR only contain the changes not yet merged.

In the meantime I've been thinking about making a backup of the code on my drive, recreating the fork, and copying the code over the fresh download (should only see the differences). Would that work?

EDIT: Would also get rid of the dud PR, lol

asherber commented 11 months ago

13 commits behind what? What branch do you want to merge into, and which 3 commits do you want to keep?

Curlack commented 11 months ago

13 commits behind what? What branch do you want to merge into, and which 3 commits do you want to keep?

See my screenshots above and this one below. I mean ahead, not behind.

[screenshot]

When I click "Compare & pull request", I see the commits from the previous 2 PRs @Ste1io already merged into the upstream branch. OMG I'm screwing this up really bad, lol. Sorry, I'm not yet fluent in git.

So the plan is to create the PR from this branch, against that upstream branch.

asherber commented 11 months ago

On git: The O'Reilly Git Pocket Guide is an excellent intro, even if it is 10 years old. And all of Pro Git is free online.

Every git commit has a hash value (the 7 chars on the right of the GH screenshot above). Which three commits do you want to keep?

As for your comment about making a backup of the code on your drive, git calls that a clone, and it's the usual way of working with a repo. You forked PetaPoco into your account, now you clone it and work locally. All your dev work and commits are made locally, and then when you're ready they're pushed to your fork on GitHub. Then you open a pull request to get those changes merged to the main PetaPoco repo.
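
In command form, the usual cycle looks roughly like this (the account name below is a placeholder):

git clone https://github.com/<your-account>/PetaPoco.git   # local clone of your fork
cd PetaPoco
git checkout -b feature/oracle-support                      # do your work on a branch
# ...edit and commit locally...
git push origin feature/oracle-support                      # publish the branch to your fork on GitHub
# then open the pull request against CollaboratingPlatypus/PetaPoco from the GitHub UI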

Curlack commented 11 months ago

Thanks so much. I'll study that in my spare time. I wanted to keep bbcfcb0, 4c84dea and b0649a0, but I realize I can't do that without losing changes again. So what I'll do is start a new fork, clone it, copy my files over to the new folder from the backup, and then go through the changes and do proper commits before the PR. Thanks again for the assist.

asherber commented 11 months ago

Hold on, there might be an easier way.

asherber commented 11 months ago

An important thing to keep in mind is that a git commit is not (just) a changeset. Technically, it's defined by the hash value. When you do things like rebasing and force pushing, you're creating entirely new commits which happen to have the same content as other commits. Now when you try to merge, git sees new commits (hashes it doesn't know about), but it will also understand what changes need to be made to the source files.

For example, I see your commit 31bcf, "Cater for optional parameter prefix". Looks like you rebased or did something else which created 0dfc8 -- same content, different commit. Depending on what you're merging, git will try to preserve both commits, but all that does is pollute the commit history (which we try to avoid) -- if they've got the same content, you're actually not introducing any other changes.

Take a look at https://github.com/CollaboratingPlatypus/PetaPoco/compare/feature/oracle-support...asherber:foo Never mind the list of commits -- does that represent the contents of what you want to merge?

Curlack commented 11 months ago

Eureka! That's exactly what I was after; unfortunately I saw your message too late and have already started a new fork. I couldn't unclutter the commit history. Did you do that straight from the web UI?

asherber commented 11 months ago

The GitHub UI is not good for any kind of branch management. What I did locally was create a new branch from 719731 and then cherry pick bbcfc and b0649.

Here are git commands you could use locally to accomplish this, using your existing oracle-support branch. Note that this will throw away any commits on your branch after 71973, other than the two you cherry-pick.


git checkout feature/oracle-support
git reset --hard 71973
git cherry-pick bbcfc
git cherry-pick b0649
git push --force

I do many git things from the command line, but a GUI tool is invaluable to see the branch relationships. My personal favorite is https://gitextensions.github.io/

Curlack commented 11 months ago

At long last, success. All Oracle tests are passing (#710). @asherber I only needed the cache fix and a custom provider and mapper for the "Ordinary" identifiers case. As far as dynamic object casing is concerned, I created overrides for the failing tests to use the correct casing coming from the poco factory, which the user would be aware of since he's the one that created the custom provider and mapper. This time, touch points in the core were kept to a minimum, and all the "Delimited" tests passed with out-of-the-box functionality without breaking a sweat.

Viva PetaPoco! 😄 ✌️

Ste1io commented 11 months ago

Found a weird issue when running tests that require a custom provider and/or mapper in parallel with tests using the defaults. It seems the handling of the data, especially in my case (TitleCase vs UPPERCASE), bleeds from one into the other. On further investigation, it turns out it's not right for PocoData to use a global cache (_pocoDatas) without taking the mapper into consideration. Seems like the cache key should be Type AND mapper for `PocoData`.

@Curlack what's the status on this? If this is indeed a bug, could you file an issue for it specifically so it's not forgotten? That should be patched separately outside of the feature branch, though, as a bugfix. Still have a lot of catching up to do on here, so apologies in advance if I'm touching on something that's already been handled.

On the plus side, I was able to spend most of the morning pulling last week's thoughts and code together, so I have something coherent for you two to rip into. :)

PS: I'm still having those pesky commits attached to my tail.

Did you get your fork and branches squared away with git?

Curlack commented 11 months ago

@Ste1io I'll submit another PR for this once the core change has been finalized on the develop branch. Issue has already been logged (See #517). You can follow our discussion in PR #711.

I was able to get rid of my tail btw, thanks.

Ste1io commented 11 months ago

@Curlack, I've got an early day tomorrow and have to get off shortly, so I'll post the discussion I mentioned in my last message tomorrow. It details quite a few things that I encountered over the weekend while exploring some approaches to the quoting issue with runtime objects like anonymous types and dynamic objects.

Regarding my solution, however, here's a very abbreviated synopsis, so you and @asherber can mull over it between now and then.

Several months ago, I created a helper class, PocoDataHelper<T> designed to act as a bridge of sorts (I guess decorator would be the more correct term) between PocoData and the escape methods in IProvider. Goals:

Regarding my use of expressions, it's worth noting that my implementation does not incur the typical performance hit usually associated with compiled expressions: there is actually no compilation being done; I'm simply extracting the property name from the expression and doing a lookup for it in the PocoData.Columns cache.
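
To illustrate the kind of lookup I mean (a minimal sketch; the names here are not the helper's actual API), the member name is read straight off the expression tree, so nothing is ever compiled:

using System;
using System.Linq.Expressions;
using System.Reflection;

internal static class ExpressionNameSketch
{
    public static string GetPropertyName<T>(Expression<Func<T, object>> selector)
    {
        // Value-type members arrive wrapped in a Convert (boxing) node, so unwrap it first.
        var body = selector.Body is UnaryExpression unary ? unary.Operand : selector.Body;

        if (body is MemberExpression member && member.Member is PropertyInfo property)
            return property.Name;   // e.g. o => o.Status yields "Status"

        throw new ArgumentException("Selector must be a simple property access.", nameof(selector));
    }
}

From there it's just a dictionary lookup against PetaPoco's cached column metadata for the poco's type.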

An optional IMapper can be injected during construction, with the same default the core library already uses. I'm also testing a version today that sets the PocoData.TableData.TableName property when an instance is constructed from the non-generic base PocoDataHelper class constructor, to further eliminate repeated code when used with runtime objects.

Escaping/quoting behavior is configured on a per-instance basis for flexibility, with the ability to override it as needed when referencing an identifier, such as in this test:

var pd = new PocoDataHelper<Order>(DB, escapeIdentifiers: false);
var sql = $"SELECT * FROM {pd.GetTableName(escapeIdentifier: true)} " +
          $"WHERE {pd.GetColumnName(c => c.Status, escapeIdentifier: true)} = @0";

My intention was to finish full test coverage and do some perf profiling before bringing it to the table as something to consider adding, but since it currently offers the most feasible and safest solution to handling Oracle's intricacies, I'm letting her enter the fray early. I've also been using it in two projects since originally writing her, with no problems.

I'm sure you have an idea of the direction I'm suggesting by now, but just to be clear, a few considerations below. The current implementation I'm using for identifier escaping could also be applied to the Oracle-specific configurations, with sane defaults. Currently I'm leaning towards something along the lines of the following:

QuotingPolicy: Never (default), Always, WhenNecessary
CasingPolicy: FoldToUpper (default), FoldToLower, PreserveCase
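
In code, those two policies are nothing more than a pair of enums along these lines (a sketch of the proposal, not an existing PetaPoco API):

public enum QuotingPolicy
{
    Never,          // proposed default: emit identifiers unquoted
    Always,         // always wrap identifiers in quotes
    WhenNecessary   // quote only when required (reserved words, mixed case, etc.)
}

public enum CasingPolicy
{
    FoldToUpper,    // proposed default: matches Oracle's ordinary-identifier folding
    FoldToLower,
    PreserveCase
}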

This could be set on a global level,

var db = DatabaseConfiguration.Build()
                              .UsingConnectionStringName("Oracle")
                              .UsingQuotingPolicy(QuotingPolicy.Never)     // Never (default), Always, WhenNecessary
                              .UsingCasingPolicy(CasingPolicy.FoldToUpper) // FoldToUpper (default), FoldToLower, PreserveCase
                              .Create();

which the Oracle-specific helper class would respect unless overridden for specific instances (per-table override) or call sites (per-column override).

Keep in mind, one reason there are currently so many overloads for the base CRUD functionality is to pass through the data needed for that particular database operation when dealing with a runtime type. An anonymous or dynamic type can't have attributes, obviously. So the primary key gets passed in through the call to Fetch, Query, Update, and all the others.

Currently there's one Oracle-specific attribute, SequenceName, along with several others that haven't been brought up yet in these discussions, but they do exist and will rear their ugly heads the minute we think we've got it. If we're not using a typed class and attributes, how do we pass it through?

Option 1: More overloads; add a parameter for it and handle it at the core level.
Option 2: Add it to our decorator class which extends PocoDataHelper, perform any logic necessary there, then allow the user's method call to work with our fixed-up final value no different than any other provider.

This also has the added benefit of being able to specify something, such as quoted/unquoted, in a flexible manner, even inline, with no adverse effects. True, it can be done manually, though I think we all agree that argument has nothing going for it, given how error-prone it is (besides locking you into that DBMS without a major overhaul that touches everything everywhere). As a trivial example just to illustrate:

// Quoted or unquoted column name, with following signature:
db.SomeOperation<T>(string tableName, string primaryKeyColumnName);

// At callsite:
dynamic myFooTable = //...
var pdh = new PocoDataHelper(db, myFooTable, "Id"); // assume unquoted is default
db.SomeOperation<dynamic>(tableName: "FooTable", pdh.GetColumnName("BarColumn", true)); // quoted override for db column 'BarColumn'
db.SomeOperation<dynamic>(tableName: "FooTable", pdh.GetColumnName("BazColumn")); // unquoted default for db column 'BAZCOLUMN'

In closing:

This was far less "brief" than I meant when I started, but hopefully it gives everyone something to chew on, and I'm looking forward to your thoughts.

I updated all synchronous test methods for both PetaPoco.Tests.Integration/Databases/QueryTests.cs and SQLite's derived QueryTests, with all tests passing. Please take a few minutes to look over the diffs for those two linked files, as well as the class itself (under 85 lines), in the context of Oracle's identifier handling especially with anonymous and dynamic types.

Commit history including diffs from the test cases here.

Curlack commented 11 months ago

I like this a lot. Can't wait to see the full version.

Curlack commented 11 months ago

Would this PocoDataHelper also be able to make edge cases like #657 easier to deal with?

Ste1io commented 11 months ago

I've only briefly skimmed over the thread, and that was a few months ago, so I can't say for sure until I can review it a little more thoroughly; at this point you probably know better than I do.

The property-to-column map is just a simple string lookup that saves PetaPoco's cached column on first access; no need for more than that in this case, really. As with the column cache though, it really depends on whether the performance gains from not making repeated lookups to the global cache justify the (relatively little) extra data storage it incurs, as well as the work of synchronizing cache flushes in case there are any long-living instances in use. I didn't want to invest too much time into caching until I had a chance to profile it, though; what I originally wrote it for, it does damn well, so until now the rest was just a bonus.

Since it is self-contained and only represents a single table per instance, I guess it could potentially help from what little I read, possibly with a couple of minor adjustments. Just keep in mind it's still using the global cache as the source of truth for the first column lookup, if that makes any difference.