IDENTITY column support

GoogleCodeExporter commented 9 years ago

I have a good idea on how to support them. We must create a special type of 
key generator for it (IdentityKeyGenerator). It won't create any additional 
tables, but will rely on original table instead. When it needs a bulk of 
keys, it runs a transaction inserting e.g. 25...100 records (with default 
values) there, getting their keys and removing all of them. This action isn't 
visible outside of this transaction, but it allows us to get a bulk of keys 
to distribute.

The main benefit of this approach is that we don't need to modify our 
existing key distribution schema in this case - currently we assign Key of 
each entity on its creation, and that's really convenient.

Original issue reported on code.google.com by alex.yakunin on 24 Sep 2009 at 10:44

GoogleCodeExporter commented 9 years ago

I will be happy to see this feature working, i want exactly IdentityKey.

Original comment by psu...@gmail.com on 9 Dec 2009 at 10:03

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

I've just got a brilliant idea on how to implement this inside our current 
model. The problem we have now is that we must insert some _valid_ row to a 
table with IDENTITY column to get a subsequent @@IDENTITY value. But this is 
difficult: there are constraints and indexes that must be satisfied, so it is 
virtually impossible to insert such a row.

But must we really insert a _valid_ row? May be _invalid_ is acceptable as 
well? Actually, yes. The test is:

CREATE TABLE [dbo].[IdTest](
 [Id] [int] IDENTITY(1,1) NOT NULL,
 [Value] [int] NOT NULL,
 CONSTRAINT [PK_IdTest] PRIMARY KEY CLUSTERED ( [Id] ASC )
)

With this table, we can use the following batch to get each subsequent Id and 
cache it:

BEGIN TRY
  insert into [IdTest] ([Value])
  values (1E100) -- Too big for int. Null also can be used here.
END TRY
BEGIN CATCH
END CATCH
select IDENT_CURRENT( 'IdTest' ) as Id -- Returns always new IDENTITY value!
select COUNT(*) from [IdTest] -- Always returns 0 - i.e. actually there are no 
insertions.

So the trick is:
- Failing INSERT leads to increment of IDENTITY column anyway (but only if 
IDENTITY isn't explicitly specified there)
- It's easy to make any statement fail: we must simply insert an out-of-range 
value to at least one column (violation of non-null constraint is the simplest 
case)
- If this is impossible - ok, let's roll back the transaction generating a 
batch of identifiers (or, better, we'll do this anyway).

Thus with this code it doesn't matter if insert fails or not. We're getting a 
subsequent identifier each time this code is executed.

Original comment by alex.yakunin on 29 Jun 2010 at 3:27

GoogleCodeExporter commented 9 years ago

And the final solution is simple: we must create KeyGenerator supporting this 
technique.

Original comment by alex.yakunin on 29 Jun 2010 at 3:28

GoogleCodeExporter commented 9 years ago

Might be implemented together with this one: issue 250, issue 780

Original comment by alex.yakunin on 17 Aug 2010 at 5:09

GoogleCodeExporter commented 9 years ago

The last solution does not work as it is not thread-safe as documented on MSDN: 
http://msdn.microsoft.com/en-us/library/ms175098.aspx.

Original comment by abiturun...@gmail.com on 17 Aug 2010 at 8:56

GoogleCodeExporter commented 9 years ago

Hmm, you're right - just noticed it returns a value shared across all the 
sessions.

It seems this problem can be solved nicely:

DO4 generates keys in bulks using a single thread & Session for this. So if it 
is the only key consumer, there is no problem at all.

But what can be done if there are other key consumers? If there can be just 
other DO4 Domains, we can make its IDENTITY-based key generators to e.g. lock 
some special DB object for the whole key generation attempt. 

But let's imagine there can be regular clients relying on IDENTITY column. Can 
we implement an algorithm that ensures both we and such clients get only unique 
keys? 

Actually, yes, and it seems I know how to implement it even without locking :)

We allocate keys in bulks. In fact, we always get a sequence of numbers there: 
23,24,25,26,... And to ensure we don't consume a key, that was shared, we must 
simply skip a number following after each gap here: 
- if we get 1,2,3,5,6,7 sequence, we take only 1,2,3,6,7 keys from it. 
- if we get 1,2,3,5,8,9,10 sequence, take only 1,2,3,9,10 keys from it. 

We skip it, because it is a "potentially shared" key.

This should work pretty fast, since we'll mostly do this in bulks. 

The same rule must be applied for the first key in each bulk. I.e. if the last 
key in prev. bulk was 128, and the first one in the current bulk is 130, we 
must skip it. We also should skip the first key generated by this way (i.e. we 
consider there is a gap in beginning of our sequence).

Why this must work: let's imagine we have multiple programs running this code 
concurrently:

1) increment counter
2) read counter
3) increment counter
4) read counter
...

The algorithm ensures:
1) no one else has incremented the counter between steps 2 & 4 except us (since 
we know there was just one increment, and since we did it, it was our own 
increment). 
2) no one else will use the key generated on step 4. But only if all clients 
use either safe way of key generation (e.g. relying on @@IDENTITY), or use 
similar algorithm. That's because read operation follows after increment in 
each thread, but we know there were no "foreign" increments between steps 2 & 4.

What do you think about this? ;)

Original comment by alex.yakunin on 17 Aug 2010 at 10:46

GoogleCodeExporter commented 9 years ago

Btw, I tried to use @@IDENTITY and SCOPE_IDENTITY() in above scripts - no one 
works. MSSQL really offers a practical example of concurrent counter doesn't 
support "Increment and Read" operation, but supports only "Incrememt" and 
"Read" :)

Original comment by alex.yakunin on 17 Aug 2010 at 11:13

GoogleCodeExporter commented 9 years ago

See also: http://blog.alexyakunin.com/2010/08/concurrency-problem.html

Original comment by Dmitri.Maximov on 17 Aug 2010 at 11:17

GoogleCodeExporter commented 9 years ago

I am not sure how you would later insert a record with a pre-generated key as 
SQL Server only allows explicit identity insert if the IDENTITY_INSERT option 
is on. It is a global option for only one table at the same time so it cannot 
be used with multiple writers (likely in an asp.net "web-garden" oder multiple 
servers sharing the same db).

Does DO really have to have the key before the insert happens? I would be 
totally happy with an entity whose ID property just returns 0 or better throws 
an exception (until I call persist). The above solution seems a little hacky to 
me. I personally try to stay away from anything non-standard because 
non-standard things are likely to cause trouble. And causing SQL exceptions all 
over the place to get an unused identity value looks funny to me ;-)

Original comment by abiturun...@gmail.com on 17 Aug 2010 at 4:52

GoogleCodeExporter commented 9 years ago

Sorry, just noticed the comment.

> Does DO really have to have the key before the insert happens?

Actually, yes - it's one of its design concepts. Our logic was the following:

We must allow developers to build very complex systems on DO. In particular, 
based on events and other various ways of action interception.

To minimize the interference (coupling with other code) of such "attached" 
logic, we must ensure they won't affect on it. That's why we support automatic 
change tracking and persistence: your code normally should care about the code 
it invokes only if it leads to some side effects related to entities it deals 
with (i.e. everything is the same as with regular code).

But obviously, such external code can establish references to your entities 
from its own ones. This means, it might need the key - otherwise we won't be 
able to persist such when it will be necessary. And that's really bad: the 
whole good idea of transparent persistence gets broken because of this. No 
persistence transparently = much more painful design of complex systems, more 
coupled code and so on. Bad.

Btw, the case isn't rare - think about e.g. ACLs and "attached security", or 
even simple audit logging. 

Now let's turn back to the problem. We could:
- Add support for some temporary keys that gets replaced by actual ones on 
persist. Obviously, we should ensure any temporary key is always equal to its 
"actual" pair - even this is complex, since there can be a number of instances 
of such keys, and we should maintain references to all of them until the 
association with actual key. Ay code dealing with entities should check 
"temporary" flag. And finally, temporary key anyway does not allow to persist a 
reference to its owner - actual key is necessary to do this. So there is a lot 
of complexity behind this option.
- On the other hand, we could simply agree that we associate an Entity with key 
right on creation. That's fully acceptable from the point of performance, if 
async bulk key allocation is implemented, and actually, quite convenient.

So we decided to use the second option. But further we partially implemented 
the first one: there are temporary keys (see Key.IsTemporary), but they're used 
only with DisconnectedState, and their usage there is actually more similar to 
usage of regular keys. They're replaced by actual keys on ApplyChanges(), 
that's fully acceptable. There is even a special API for this (look for 
RemapKeys in help).

Ok, I explained why we initially decided to avoid unassigned or temporary keys. 
Now let's turn to other questions:

> I am not sure how you would later insert a record with a pre-generated key as 
SQL Server only allows explicit identity insert if the IDENTITY_INSERT option 
is on.

True, in this case it will be necessary to turn it off - that's the only way. 
But hopefully, this compromise is acceptable in 90% of cases.

> I would be totally happy with an entity whose ID property just returns 0 or 
better throws an exception (until I call persist).

This could work only w/o automatic persist. DisconnectedState fully blocks 
persist (more precisely, it actually caches all the changes inside it), but it 
doesn't help to solve the problem with IDENTITY.

> The above solution seems a little hacky to me.

That's absolutely true - it's a sort of hack around the limitation.

> I personally try to stay away from anything non-standard because non-standard 
things are likely to cause trouble.

Correct... Well, in this case that's a kind of trade: either you get greater 
simplicity while writing BLL code (it's really quite convenient to rely on fact 
that key is _always_ available and stays unchanged), or you get "true support" 
of rudimentary IDENTITY columns (i.e. w/o such hacks).

Also note that DO is primarily designed to be used in non-legacy mode. We look 
on Legacy mode as on intermediate stage of migration: we must support as much 
as possible here to allow people to use all the features of DO, but finally, 
it's less convenient.

So in this case we trade "true support" of one of important features of Legacy 
mode to convenience in "default" mode.

IMO, IDENTITY columns are one of the worst usability-related inventions - it's 
simply a design failure. Nearly any other RDBMS offers generators - a much more 
convenient and scalable solution, especially in case with big applications. But 
guys from Microsoft invented this "more convenient" sh$t, and STILL don't offer 
sequences, that are PART OF SQL-92 STANDARD: http://www.ocelot.ca/commands.htm

So it's not surprising Wikipedia mentions only Microsoft SQL Server and 
Microsoft Access as databases supporting identity columns: 
http://en.wikipedia.org/wiki/Identity_column

P.S. I know, it sounds like "I simply hate this crazy feature, and that's the 
only true reason we decided to trade it for real convenience." And I know, 
probably this is not a business-wise decision. But I really care about 
convenience, and trading so much for crazy and purely Microsoft-specific 
feature seems to high cost. IMO, the stuff like IDENTITY must be either 
boycotted at all, or supported via the same "natural and standard" way as its 
inventors provided for us (Russians say, "through the ass"). Sorry :(

Original comment by alex.yakunin on 8 Oct 2010 at 12:28

GoogleCodeExporter commented 9 years ago

Sorry, many mistakes - e.g. "too high cost". But the idea is clear ;)

Original comment by alex.yakunin on 8 Oct 2010 at 12:33

google-code-export / dataobjectsdotnet

IDENTITY column support #415