Closed: forki closed this issue 6 years ago.
Not currently. There was talk of serializing the type signatures to a file location, but I don't think any work has started on it. It is definitely something that is possible, though.
Possible yes, the core of it is pretty easy to implement but it has a larger impact across everything else and will have some dragons that need taming.
It is one of the 3 bigger ticket items that we would like to do should we ever get the time :)
@pezipink so this reads like you accept the challenge of dragon taming ;-)
Just kidding, it's not that important. I will find a way to `#if` it out.
This would indeed be extremely useful. Currently, the type provider seems to hit the database on every single file save, completely killing intellisense until it succeeds or fails (which is not too uncommon for some reason).
I've started to work on it today; there is something already, and it has been surprisingly easy so far. The branch name is "serialize" in my fork. If you guys could check it out, that would be great! :)
There are currently in-memory caches for generated tables and columns. Some notes:
A few more things off the top of my head to think about with respect to offline-caching:
A main selling point of the SQLProvider is that it accesses the schema lazily. Thus, we'd only want to cache the types that are actually used.
Partially cached types would probably need to be extended during "online" development. Do we also check for deltas, or simply have the user delete the cache in order to receive updated schema?
How much control do we give to the user for caching? Should it be fully controlled by a static parameter, or automatically attempt to cache when "online", only using types when a connection can't be made?
Various threading and IDE issues
Versioning? Perhaps this isn't that important since the user can delete their cache, but we wouldn't want a new version of the SQLProvider to suddenly break their compile due to an old cache.
It would be nice to avoid any library dependencies, since we have managed this far to have none except the provided types API!
Yes, the real world, scary! :/ Some answers I would propose:
The real problem IMHO is the threading; that already got me yesterday, and I will try to deal with it today. But in summary, I propose that this be, at least at first, an extra feature for those who deal with a very big and stable schema, for whom the constant refresh of a schema that does not change is itself a problem. And of course that would be an initial version that could be enhanced over time. What do you think?
Anyhow, thanks guys for the tips, and I will start to deal with these concerns. :D
Just a thought on taking dependencies: we could try to internalise the dependency using ILRepack as part of the build. However, I have no idea how this behaves with type providers; it would be interesting to find out.
I think the whole universe is depending on James Newton-King. We can load the DLLs dynamically and call them with reflection to pretend that we are still in our own universe, but all the other NuGet packages load that into your server memory anyway.
Yes, I wasn't saying we should never introduce dependencies; if they are needed and it fits, then it fits...
Just an idea, one way to do this could be:
ctx.Offline(@"c:\sqlprovider.cache").SaveNow.``Save failed: Not enough disk space``
That could be then extended with another save method that would save the context in a background thread every 3 minutes if the data connection is available.
Interesting idea, very much non-standard! Would be cool to try as an experiment.
Another, maybe even simpler, option than Newtonsoft would be using System.Data.DataSet, as we currently can't live without System.Data anyway. Something like:
// pseudo
let saveSchema (location:string) =
    let ds = new DataSet()
    // Seq.iter forces the side effects; Seq.map is lazy and would never run them
    tableLookup |> Seq.iter(fun t -> ds.Tables.Add(t.Value.Name, t.Value.Schema) |> ignore)
    columnLookup |> Seq.iter(fun c -> ds.Tables.[0].Columns.Add(c.Value.[""].Name) |> ignore)
    pkLookup |> Seq.iter(fun pk ->
        pk.Value |> List.iter(fun key -> ds.Tables.[pk.Key].Columns.[key].Unique <- true))
    relationshipLookup |> Seq.iter(fun c -> (* ... *) ())
    ds.WriteXmlSchema(location)

let loadSchema (location:string) =
    let ds = new DataSet()
    ds.ReadXmlSchema(location)
    ds.Tables |> Seq.cast<DataTable> |> Seq.iter(fun t -> tableLookup.AddOrUpdate( (* ... *) ) |> ignore)
    // ...
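To make the DataSet idea concrete, here is a standalone round-trip sketch. The table and column names ("Customers", "Id", "Name") are invented for illustration and are not SQLProvider identifiers:

```fsharp
open System.Data

// Write a table/column layout out as an XML schema file and read it back,
// mimicking the saveSchema/loadSchema pseudocode above.
let path = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "schema-sketch.xsd")

let ds = new DataSet("Cache")
let customers = ds.Tables.Add("Customers")          // invented table name
customers.Columns.Add("Id", typeof<int>) |> ignore  // invented columns
customers.Columns.Add("Name", typeof<string>) |> ignore
customers.PrimaryKey <- [| customers.Columns.["Id"] |]

ds.WriteXmlSchema(path)

// A fresh DataSet restores the structure (tables, columns, keys) from the file.
let ds2 = new DataSet()
ds2.ReadXmlSchema(path)
printfn "%s: %d columns, pk = %s"
    ds2.Tables.[0].TableName
    ds2.Tables.[0].Columns.Count
    ds2.Tables.[0].PrimaryKey.[0].ColumnName
```

Note that `DataTable.Columns.Add` returns the new `DataColumn`, hence the `|> ignore`.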
Edit: I was thinking that this could help with either unit testing or local caching, as the DataSet could be filled with data as well... But maybe it would just make things unnecessarily complex: 1) for the cache to work correctly, all the key-value-pair information has to be saved, 2) even if some dummy-data-insertion method were exposed, there would be no easy way to check whether the data is live and correct, 3) I expect we don't need an in-memory SQLProvider.
After our conversation yesterday at F# Exchange I am reviewing the code to figure out how we can add this feature. Please check what I have in mind and correct me if some of my assumptions are not correct.
We can use DataSet capabilities to save and load data schema as @Thorium suggested so there's no need to bring additional dependencies.
An additional static SqlDataProvider parameter SchemaCache can be introduced to point to file(s) with data schema(s); more about it in p. 5.
If SchemaCache is not set or points to a non-existing file, SqlDataProvider will behave as today, resolving the schema from the actual database. This will make it easy to configure cache once and enable it only when needed by placing the cache file into the specified location.
There will be a new SqlDataProvider method SaveSchema (we may also add LoadSchema for symmetry, but I am not sure about the scenario where it would be used).
Now, what to save? I think saving the whole schema (actually, schemas) can be overkill. I can only imagine how many megabytes it would take to save all schemas for the Oracle database in our organization, and ours is far from the biggest. Here is what can be done:
It will be up to a developer to call SaveSchema. This call should probably not be part of code running in production; something that can be executed from a script, perhaps.
Once the schema(s) are saved, they will be retrieved whenever SchemaCache is set, so there will be no need to load them explicitly in client code using LoadSchema.
Example: SchemaCache = [{SchemaName = "main"; FilePath = @"C:\Temp\main.xml"}] For databases without support for multiple schemas (if any): [{SchemaName = ""; FilePath= @"C:\Temp\schema.xml"}]
NB! Having written these notes, I've found the Owner parameter that can, depending on the database vendor, specify the schema or database name. If this is sufficient to resolve the schema name, we may not even need a record type SchemaCache and can simply specify the file path of the schema for the respective owner. This makes better reuse of existing configuration settings but leaves the question of how to save/load multiple schemas. IMHO it may be OK to restrict caching to a single schema, so SchemaCache will require Owner to be set for multi-schema databases, and if a project uses schemas from multiple owners it will need to define multiple context types. Something to discuss, but if we make such a restriction, then the SchemaCache parameter can be renamed to SchemaPath (string) and SaveSchema will take a single string parameter specifying the file path. The rest will be figured out from the Owner and TableNames values.
What are your views on saving the whole database schema vs. saving only the cached concurrent dictionaries' content? The cache dictionaries should have all the fields needed for a successful build, but they are currently provider-specific, so they could be saved via either a common utility function or a new interface member.
Saving the concurrent dictionaries will be more efficient than using the legacy DataSet (and who loves XML?). The only argument in favor of the DataSet schema is that it's already there and doesn't require new dependencies. And it should be OK to deal with provider-specific dictionaries. But what would the serialization format be? JSON? Bring in a JSON.NET dependency just because of this feature?
P.S. SQLProvider already references System.Runtime.Serialization, which contains DataContractSerializer. Can this be used instead of adding Newtonsoft.Json?
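It can, at least for data-only shapes. As a rough sketch (a plain dictionary stands in for the real lookup dictionaries, and the keys are invented), DataContractJsonSerializer round-trips without any new package reference:

```fsharp
open System.Collections.Generic
open System.Runtime.Serialization.Json

// Serialize a dictionary to a JSON string and back using only
// System.Runtime.Serialization - no Newtonsoft.Json required.
let toJson (data: Dictionary<string, string[]>) =
    use ms = new System.IO.MemoryStream()
    let ser = DataContractJsonSerializer(typeof<Dictionary<string, string[]>>)
    ser.WriteObject(ms, data)
    System.Text.Encoding.UTF8.GetString(ms.ToArray())

let ofJson (json: string) =
    use ms = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes json)
    let ser = DataContractJsonSerializer(typeof<Dictionary<string, string[]>>)
    ser.ReadObject(ms) :?> Dictionary<string, string[]>

let pks = Dictionary<string, string[]>()
pks.["dbo.Customers"] <- [| "Id" |]   // invented key/value pair
let restored = pks |> toJson |> ofJson
```

Arrays are used here instead of F# lists because data-contract serializers handle plain BCL collections more gracefully than F#-specific types.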
And if we split the implementation into two parts:
then if we agree about 1, I can start working on it using whatever serialization, and we can finalize the schema format later. So if you have any input on part 1 (e.g. support for saving multiple schemas per context, naming of new parameters and methods, etc.), please let me know.
More on serializers with .NET Core support: https://github.com/dotnet/corefx/issues/6564
Please consider other serializers in .NET Core such as DataContractSerializer (DCS), DataContractJsonSerializer (DCJS) and XmlSerializer, which are and will continue to be supported in .NET Core. Of course, you can also consider 3rd party serializers built on .NET Core.
I think compile-time offline caching is not performance-critical, so probably any serialization will do. We can split the implementation into two parts.
Regardless of the serialization format, there are still two options:
Initially I would say that the first one sounds better, as some people are using huge Oracle schemas. I would prefer that we start with as simple a solution as possible and solve more complex scenarios when needed.
I also think the first sounds better to begin with. I will do some experiments in a new branch and let you review the results once I get something worthy.
@Thorium I forked the code and am experimenting with it right now, and I need some tips on how I can save only the part of the schema referenced by the current context, not the whole db schema. Here's what I did so far:
Defined SaveContextSchema method in SqlDesignTime.fs:
yield ProvidedMethod("SaveContextSchema",
                     [ProvidedParameter("filePath", typeof<string>)],
                     typeof<unit>,
                     invokeCode = fun args ->
                         <@@ ((%%args.[0] : obj) :?> ISqlDataContext).SaveContextSchema((%%args.[1] : string)) @@>) :> MemberInfo
Extended ISqlDataContext with a new method SaveContextSchema:
abstract SaveContextSchema : string -> unit
Added a prototype implementation of SaveContextSchema in SqlRuntime.DataContext. The implementation currently serializes tables using DataContractJsonSerializer, just to verify that this works:
member __.SaveContextSchema(filePath) =
    use con = provider.CreateConnection(connectionString)
    providerCache
    |> Seq.iter (fun prov ->
        let prov = prov.Value
        let tables = prov.GetTables(con, caseSensitivity)
        tables
        |> List.iter (fun table ->
            use ms = new System.IO.MemoryStream()
            let ser = System.Runtime.Serialization.Json.DataContractJsonSerializer(table.GetType())
            ser.WriteObject(ms, table)
            let json = ms.ToArray()
            System.IO.File.AppendAllText(filePath, System.Text.Encoding.UTF8.GetString(json, 0, json.Length))))
When I test the implementation I see that SaveContextSchema saves everything: all database tables. This is of course a problem, because just a JSON with all Oracle tables (no columns) takes about 0.5 MB in our case. It can easily become 100 times larger with all the column information. But I can't figure out how to obtain the schema for only the referenced tables/columns. Any tips?
Agreed, the primary use case for the caching is NOT to serialize the whole db schema, but only the stuff that has actually been used, so you can compile "offline" (as opposed to developing against "new" stuff offline). Whilst it could be useful sometimes - especially for small databases - the whole point of the type provider is to be able to access massive schemas without generating lots of anything in the process, be it code or data.
@pezipink So what is your take on this use case? Do you think it's acceptable to just cache everything (this looks to be a straightforward task), since we are talking about quite a special case - if anyone wants a schema offline, they might as well afford the generation of a multi-megabyte file? Or should we strive to find a way of caching the schema just for the tables used? In that case I need some hints from you guys; I can't figure out myself how to obtain such a schema.
I think you should save only the content of the provider's concurrent dictionaries, not tables. Each provider has those, e.g. MS SQL Server:
let pkLookup = ConcurrentDictionary<string,string list>()
let tableLookup = ConcurrentDictionary<string,Table>()
let columnLookup = ConcurrentDictionary<string,ColumnLookup>()
let relationshipLookup = ConcurrentDictionary<string,Relationship list * Relationship list>()
Are these populated well enough? By the time you run the code editor, the schemas of the already-used tables should be in those. So save and load only those, not the whole schema.
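A sketch of what "save only those" could look like (value types simplified to string[]; the real dictionaries hold Table, ColumnLookup, etc.): snapshot each ConcurrentDictionary into a plain array of pairs for the serializer, and rebuild it on load:

```fsharp
open System.Collections.Concurrent

// ToArray takes a consistent point-in-time copy of the dictionary,
// which is safer to hand to a serializer than enumerating it live.
let snapshot (d: ConcurrentDictionary<string, string[]>) =
    d.ToArray() |> Array.map (fun kv -> kv.Key, kv.Value)

// Rebuild the lookup from the saved pairs when running offline.
let restore (pairs: (string * string[])[]) =
    let d = ConcurrentDictionary<string, string[]>()
    for k, v in pairs do d.[k] <- v
    d

// Example with an invented entry:
let pkLookup = ConcurrentDictionary<string, string[]>()
pkLookup.["dbo.Orders"] <- [| "OrderId" |]
let reloaded = pkLookup |> snapshot |> restore
```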
Also, I think you shouldn't run it in invokeCode, as that is never invoked at design time, right? Rather, add the code to AddMembersDelayed or AddXmlDocDelayed, and return the save result as a new property (if you use AddMembersDelayed) or as a string (if you use AddXmlDocDelayed).
@Thorium But how do I get access to them from the common provider code, since all lookup dictionaries are defined inside their respective providers? Should each provider expose its lookup dictionaries?
I would vote for a common tooling method (or methods) that saves the data, called inside each provider implementation. There is probably not enough common API for them to be exposed nicely.
I think it is acceptable if that is what the user really wants, but it does go against the spirit of what I was initially trying to achieve, and brings it closer to what SQLMetal does.
I agree: add it to the interface and have the providers serialize their dictionaries. You will need to be careful though, because things like AddMembersDelayed get called a bunch of times on different threads and other crazy things; likely you will need some locking, and maybe to be clever about how you serialize so it isn't trying to re-serialize the whole thing all the time. I foresee some unforeseen adventures for you!
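The locking concern could be handled with a small guard around the save. A sketch (saveLock, lastSaved, and the doSave parameter are invented names, not SQLProvider code):

```fsharp
open System

let saveLock = obj ()
let mutable lastSaved = DateTime.MinValue

// Serialize the cache at most once per interval, under a lock, so that
// concurrent AddMembersDelayed callbacks don't re-save redundantly.
let saveIfStale (interval: TimeSpan) (doSave: unit -> unit) =
    lock saveLock (fun () ->
        if DateTime.UtcNow - lastSaved > interval then
            doSave ()
            lastSaved <- DateTime.UtcNow)
```

Every callback can call `saveIfStale`, but only the first one inside each interval actually pays the serialization cost.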
Also, it is great you are having a go at this piece of work. Thanks ! :)
Alright, so I will look into caching the provider-specific concurrent dictionaries and will use AddMembersDelayed. I will probably ask more questions along the way.
Question regarding differences in lookup dictionaries for different providers.
The majority of the providers use the following set of dictionaries:

let pkLookup = ConcurrentDictionary<string,string list>()
let tableLookup = ConcurrentDictionary<string,Table>()
let columnLookup = ConcurrentDictionary<string,ColumnLookup>()
let relationshipLookup = ConcurrentDictionary<string,Relationship list * Relationship list>()
However there are some exceptions:
If these differences are not intentional, I can make all providers use the same set. But since we plan to expose new common functionality based on the schema cache, should these collections be moved to a common type that combines them all? Then they could all share a common schema cache implementation. /cc @Thorium @pezipink
One thing to consider is that you may have multiple database connections, e.g. both MySQL and Postgres inside one program, or two connections of the same type to different databases. So now that the containers are not global but live inside a provider, the caches won't interfere with each other. As long as that works, the cache implementation can be common.
Of course container instances should be per provider/connection. I meant we could have a common type that wraps the schema caches, but each provider/connection will continue to create and manage instances of that common type.
Yes, that would be good. :-)
I have implemented caching of schema which I am currently testing in my branch (https://github.com/object/SQLProvider/tree/schema_cache), works with Oracle, will run some tests with MS SQL Server later.
This appeared to be quite a big change.
With all these changes I can now write the following code:
type sql = SqlDataProvider<
ConnectionString = "my connection string",
DatabaseVendor = Common.DatabaseProviderTypes.ORACLE,
ResolutionPath = ResolutionPath,
IndividualsAmount = 1000,
UseOptionTypes = true >
let dbContext = sql.GetDataContext()
let profiles =
query { for p in dbContext.MyDatabase.Profiles do
where (p.Media = "video")
select p.Description }
profiles |> Seq.toList
dbContext.SaveContextSchema(@"C:\TEMP\Schema.txt")
Then I can disconnect machine from the internet and replace the code with the following:
type sql = SqlDataProvider<
ConnectionString = "my connection string",
DatabaseVendor = Common.DatabaseProviderTypes.ORACLE,
ResolutionPath = ResolutionPath,
IndividualsAmount = 1000,
UseOptionTypes = true,
ContextSchemaPath = @"C:\TEMP\Schema.txt" >
let dbContext = sql.GetDataContext()
let profiles =
query { for p in dbContext.Granitt.Profiles do
where (p.Media = "video")
select p.Description }
profiles |> Seq.toList
And the code will compile.
Let me know what you think about this approach. It would be great to test it on databases other than Oracle and MS SQL Server.
@Thorium @pezipink
It's up to us to define behavior, but this is how it is currently implemented.
Reasoning behind this behavior: it would be nice to avoid unnecessary edits of the SqlDataProvider constructor parameters just to enable and disable offline mode. The construction parameters can be set once, and offline mode can be turned on based on the presence of the specified file. So it will be completely up to the development team when and where to place the schema cache file to control offline mode.
I have MSAccess, SQLServer, MySql, Odbc, Postgres and SQLite so I can do some testing when I'm not so busy.
I plan to do SQL Server testing on Linux tonight from MacBook/mono. This should provide some variation of environments :-)
Procedures are quite commonly used; GetSprocs returns a Sproc list, so that could maybe be added to support the sprocs, if the params (the IDbConnection -> QueryParameter list functions) serialize correctly. Individuals... probably no one uses them anyway, other than for exploring the data at design time.
If GetSprocs can be added as easily as tables and columns, I can extend SchemaCache to include them.
I thought that this would save the file at design time. Instead it saves the cache at runtime. OK, that's fine, it just needs one more step for the user to run the code... Is all the information in the cache at runtime?
I just tested with SQL Server on Linux and SQL Provider built on Mac. Worked fine.
"I thought that this would save the file on design time. Instead it saves the cache at runtime."
I guess I didn't fully grasp this scenario, especially since I was testing it from an F# script, where there is not much practical difference between design time and runtime. I can certainly change it, but can you please clarify step by step how the schema cache for the current context will be saved at design time? I.e. anything I add to ISqlDataContext will be runtime, right? Where should I define SaveContextSchema, and how will the developer trigger its execution so it will include the schema for all entities referenced in the scope?
I've re-read one of your comments. Here is what you meant, right?
dbContext.ContextSchema(@"C:\TEMP\Schema.txt").SaveNow.Error
But can it be shorter?
dbContext.ContextSchema(@"C:\TEMP\Schema.txt").Save
Tested with MySQL, and it works, but the problem is that ctx.SaveContextSchema("c:\\test2.txt") has to be called as the last line in your program. Which is not easy if you are not going to execute the whole program (e.g. I have a web server, and each method is called only when an action is performed, so I would have to surf through the whole UI before saving).
If we could do it at design time, the design time might have loaded everything already for the compilation? (I'm not sure, but it could be?) Something like this (pseudo, not working code) in SqlDesignTime.fs (expecting that ContextSchemaPath would always be defined and we would check IsOffline some other way):
let saveResponse = ProvidedTypeDefinition("SaveContextResponse", None, isErased = true)
saveResponse.AddMemberDelayed(fun () ->
    lock mySaveLock (fun () ->
        let result =
            if not (String.IsNullOrEmpty contextSchemaPath) then
                try
                    if not (System.IO.File.Exists contextSchemaPath) then
                        System.IO.File.Create(contextSchemaPath).Dispose()
                    prov.GetSchemaCache().Save contextSchemaPath
                    "Saved " + contextSchemaPath + " at " + DateTime.Now.ToString("hh:mm:ss")
                with
                | e -> "Save failed: " + e.Message
            else "ContextSchemaPath is not defined"
        // the save result becomes the name of a method the user can "dot into"
        ProvidedMethod(result, [], typeof<unit>, invokeCode = empty) :> MemberInfo
    )
)
let m = ProvidedMethod("SaveContextSchema",[], (saveResponse :> Type), invokeCode = fun args -> <@@ "" @@>)
m.AddXmlDocComputed(fun () ->
if String.IsNullOrEmpty contextSchemaPath then "ContextSchemaPath static parameter has to be defined to use this function."
else "Schema location: " + contextSchemaPath + ". Write dot after this method to see the save result."
)
yield m :> MemberInfo
"If we could do it design time, the design time might have loaded already everything for the compilation?"
Will it? My tests showed that the cache contained all tables, but columns only for the tables referenced in the current scope. So if I have other source files referencing other tables, their schema would not be saved. This is partly why I went down this path - I have to put some code referencing all the tables that I want cached in order to save the schema. So most likely I will need to write a small design-time script for saving the schema cache.
An alternative to this would be to force saving the complete schema for the whole database. But in the comments above we sort of agreed that we don't want that.
Another (new) option might be to define a method for explicitly referencing certain schema elements and saving them in the cache, e.g.:
ctx.ContextSchema(@"C:\Temp\Schema.txt", [ctx.Main.Customers; ctx.Main.Product; ctx.Main.Orders]).SaveNow
I don't know how easy it is to get anything other than the static parameters at design time without referencing FSharp.Compiler.
”My tests showed that the cache contained all tables but columns for only tables referenced in the current scope”
What is the scope here? Does that mean that if you have 100 different queries, there are actually 100 different concurrent column caches? I think that is not what we want.
No, there is only one schema cache per provider. So each query just adds to the context.
But we probably see two main usage scenarios: saving a schema only for the queries referenced in the context; and saving a schema for an application that uses many different queries in its different components, where it's hard to extract them all into one place, and then it's better to save the whole database schema.
What about ContextSchema and FullSchema with a SaveNow design-time method? And renaming the static parameter to SchemaCachePath.
Perhaps we should not give the developer a choice of filename, just a schema folder path, so the provider can store files there using internal names that contain a timestamp, as you suggested in your example.
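If we go the folder route, deriving the internal file names could be as simple as this sketch (the naming scheme and cacheFileName function are invented for illustration):

```fsharp
// Build a cache file name inside a user-supplied folder, combining the
// schema owner with a UTC timestamp so repeated saves don't overwrite each other.
let cacheFileName (folder: string) (owner: string) =
    let stamp = System.DateTime.UtcNow.ToString("yyyyMMdd-HHmmss")
    System.IO.Path.Combine(folder, sprintf "%s-%s.schema.json" owner stamp)
```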
I will be away until Tuesday, but will be checking the thread.
Created a PR for you to do the saving at design time. Try it and let me know what you think. It works well for me and saves the whole used schema, so you can use it to build the project.
As for stored procedures, I tried this quickly. Added storage for sprocs:
and internal SchemaCache =
    { ...
      Procedures : ResizeArray<Sproc>
      IsOffline : bool }
    with
        static member Empty =
            { ...
              Procedures = ResizeArray<Sproc>()
              IsOffline = false }
Then just modified the procedure calls, e.g. from:

member __.GetSprocs(con) = MSSqlServer.connect con MSSqlServer.getSprocs

to:

member __.GetSprocs(con) =
    if schemaCache.IsOffline then schemaCache.Procedures |> Seq.toList
    else
        let r = MSSqlServer.connect con MSSqlServer.getSprocs
        schemaCache.Procedures.AddRange r
        r
...but as expected, DataContractJsonSerializer fails to serialize the Sproc type. It's a recursive F# type with functions.
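A common workaround (sketched here with invented field names; the real Sproc type differs) is to project the type onto a data-only DTO before serializing, capturing the results of the function-valued fields rather than the functions themselves:

```fsharp
// A type with a function-valued field - data-contract serializers
// cannot serialize the function itself.
type SprocLike =
    { Name : string
      GetParams : unit -> string list }

// A data-only shape a serializer can handle: store the *result* of
// calling GetParams, which is acceptable offline since it is fixed anyway.
type SprocDto =
    { Name : string
      Params : string[] }

let toDto (s: SprocLike) =
    { Name = s.Name
      Params = s.GetParams () |> List.toArray }

// Example with an invented sproc:
let sproc = { Name = "GetCustomer"; GetParams = fun () -> [ "Id" ] }
let dto = toDto sproc
```

The loaded DTO cannot re-query the database, but in offline mode that is exactly the point: the cached parameter list is all the compiler needs.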
I'm working with @tforkmann on a project and we are trying to use SQLProvider. Unfortunately I don't have access to the db, so the type provider can't generate types on my machine. Is there a way for @tforkmann to commit "cached" types so that I can build on my machine and look at the code?