Open GoogleCodeExporter opened 9 years ago
I will be happy to see this feature working, i want exactly IdentityKey.
Original comment by psu...@gmail.com
on 9 Dec 2009 at 10:03
[deleted comment]
[deleted comment]
I've just got a brilliant idea on how to implement this inside our current
model. The problem we have now is that we must insert some _valid_ row to a
table with IDENTITY column to get a subsequent @@IDENTITY value. But this is
difficult: there are constraints and indexes that must be satisfied, so it is
virtually impossible to insert such a row.
But must we really insert a _valid_ row? May be _invalid_ is acceptable as
well? Actually, yes. The test is:
CREATE TABLE [dbo].[IdTest](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [int] NOT NULL,
CONSTRAINT [PK_IdTest] PRIMARY KEY CLUSTERED ( [Id] ASC )
)
With this table, we can use the following batch to get each subsequent Id and
cache it:
BEGIN TRY
insert into [IdTest] ([Value])
values (1E100) -- Too big for int. Null also can be used here.
END TRY
BEGIN CATCH
END CATCH
select IDENT_CURRENT( 'IdTest' ) as Id -- Returns always new IDENTITY value!
select COUNT(*) from [IdTest] -- Always returns 0 - i.e. actually there are no
insertions.
So the trick is:
- Failing INSERT leads to increment of IDENTITY column anyway (but only if
IDENTITY isn't explicitly specified there)
- It's easy to make any statement fail: we must simply insert an out-of-range
value to at least one column (violation of non-null constraint is the simplest
case)
- If this is impossible - ok, let's roll back the transaction generating a
batch of identifiers (or, better, we'll do this anyway).
Thus with this code it doesn't matter if insert fails or not. We're getting a
subsequent identifier each time this code is executed.
Original comment by alex.yakunin
on 29 Jun 2010 at 3:27
And the final solution is simple: we must create KeyGenerator supporting this
technique.
Original comment by alex.yakunin
on 29 Jun 2010 at 3:28
Might be implemented together with this one: issue 250, issue 780
Original comment by alex.yakunin
on 17 Aug 2010 at 5:09
The last solution does not work as it is not thread-safe as documented on MSDN:
http://msdn.microsoft.com/en-us/library/ms175098.aspx.
Original comment by abiturun...@gmail.com
on 17 Aug 2010 at 8:56
Hmm, you're right - just noticed it returns a value shared across all the
sessions.
It seems this problem can be solved nicely:
DO4 generates keys in bulks using a single thread & Session for this. So if it
is the only key consumer, there is no problem at all.
But what can be done if there are other key consumers? If there can be just
other DO4 Domains, we can make its IDENTITY-based key generators to e.g. lock
some special DB object for the whole key generation attempt.
But let's imagine there can be regular clients relying on IDENTITY column. Can
we implement an algorithm that ensures both we and such clients get only unique
keys?
Actually, yes, and it seems I know how to implement it even without locking :)
We allocate keys in bulks. In fact, we always get a sequence of numbers there:
23,24,25,26,... And to ensure we don't consume a key, that was shared, we must
simply skip a number following after each gap here:
- if we get 1,2,3,5,6,7 sequence, we take only 1,2,3,6,7 keys from it.
- if we get 1,2,3,5,8,9,10 sequence, take only 1,2,3,9,10 keys from it.
We skip it, because it is a "potentially shared" key.
This should work pretty fast, since we'll mostly do this in bulks.
The same rule must be applied for the first key in each bulk. I.e. if the last
key in prev. bulk was 128, and the first one in the current bulk is 130, we
must skip it. We also should skip the first key generated by this way (i.e. we
consider there is a gap in beginning of our sequence).
Why this must work: let's imagine we have multiple programs running this code
concurrently:
1) increment counter
2) read counter
3) increment counter
4) read counter
...
The algorithm ensures:
1) no one else has incremented the counter between steps 2 & 4 except us (since
we know there was just one increment, and since we did it, it was our own
increment).
2) no one else will use the key generated on step 4. But only if all clients
use either safe way of key generation (e.g. relying on @@IDENTITY), or use
similar algorithm. That's because read operation follows after increment in
each thread, but we know there were no "foreign" increments between steps 2 & 4.
What do you think about this? ;)
Original comment by alex.yakunin
on 17 Aug 2010 at 10:46
Btw, I tried to use @@IDENTITY and SCOPE_IDENTITY() in above scripts - no one
works. MSSQL really offers a practical example of concurrent counter doesn't
support "Increment and Read" operation, but supports only "Incrememt" and
"Read" :)
Original comment by alex.yakunin
on 17 Aug 2010 at 11:13
See also: http://blog.alexyakunin.com/2010/08/concurrency-problem.html
Original comment by Dmitri.Maximov
on 17 Aug 2010 at 11:17
I am not sure how you would later insert a record with a pre-generated key as
SQL Server only allows explicit identity insert if the IDENTITY_INSERT option
is on. It is a global option for only one table at the same time so it cannot
be used with multiple writers (likely in an asp.net "web-garden" oder multiple
servers sharing the same db).
Does DO really have to have the key before the insert happens? I would be
totally happy with an entity whose ID property just returns 0 or better throws
an exception (until I call persist). The above solution seems a little hacky to
me. I personally try to stay away from anything non-standard because
non-standard things are likely to cause trouble. And causing SQL exceptions all
over the place to get an unused identity value looks funny to me ;-)
Original comment by abiturun...@gmail.com
on 17 Aug 2010 at 4:52
Sorry, just noticed the comment.
> Does DO really have to have the key before the insert happens?
Actually, yes - it's one of its design concepts. Our logic was the following:
We must allow developers to build very complex systems on DO. In particular,
based on events and other various ways of action interception.
To minimize the interference (coupling with other code) of such "attached"
logic, we must ensure they won't affect on it. That's why we support automatic
change tracking and persistence: your code normally should care about the code
it invokes only if it leads to some side effects related to entities it deals
with (i.e. everything is the same as with regular code).
But obviously, such external code can establish references to your entities
from its own ones. This means, it might need the key - otherwise we won't be
able to persist such when it will be necessary. And that's really bad: the
whole good idea of transparent persistence gets broken because of this. No
persistence transparently = much more painful design of complex systems, more
coupled code and so on. Bad.
Btw, the case isn't rare - think about e.g. ACLs and "attached security", or
even simple audit logging.
Now let's turn back to the problem. We could:
- Add support for some temporary keys that gets replaced by actual ones on
persist. Obviously, we should ensure any temporary key is always equal to its
"actual" pair - even this is complex, since there can be a number of instances
of such keys, and we should maintain references to all of them until the
association with actual key. Ay code dealing with entities should check
"temporary" flag. And finally, temporary key anyway does not allow to persist a
reference to its owner - actual key is necessary to do this. So there is a lot
of complexity behind this option.
- On the other hand, we could simply agree that we associate an Entity with key
right on creation. That's fully acceptable from the point of performance, if
async bulk key allocation is implemented, and actually, quite convenient.
So we decided to use the second option. But further we partially implemented
the first one: there are temporary keys (see Key.IsTemporary), but they're used
only with DisconnectedState, and their usage there is actually more similar to
usage of regular keys. They're replaced by actual keys on ApplyChanges(),
that's fully acceptable. There is even a special API for this (look for
RemapKeys in help).
Ok, I explained why we initially decided to avoid unassigned or temporary keys.
Now let's turn to other questions:
> I am not sure how you would later insert a record with a pre-generated key as
SQL Server only allows explicit identity insert if the IDENTITY_INSERT option
is on.
True, in this case it will be necessary to turn it off - that's the only way.
But hopefully, this compromise is acceptable in 90% of cases.
> I would be totally happy with an entity whose ID property just returns 0 or
better throws an exception (until I call persist).
This could work only w/o automatic persist. DisconnectedState fully blocks
persist (more precisely, it actually caches all the changes inside it), but it
doesn't help to solve the problem with IDENTITY.
> The above solution seems a little hacky to me.
That's absolutely true - it's a sort of hack around the limitation.
> I personally try to stay away from anything non-standard because non-standard
things are likely to cause trouble.
Correct... Well, in this case that's a kind of trade: either you get greater
simplicity while writing BLL code (it's really quite convenient to rely on fact
that key is _always_ available and stays unchanged), or you get "true support"
of rudimentary IDENTITY columns (i.e. w/o such hacks).
Also note that DO is primarily designed to be used in non-legacy mode. We look
on Legacy mode as on intermediate stage of migration: we must support as much
as possible here to allow people to use all the features of DO, but finally,
it's less convenient.
So in this case we trade "true support" of one of important features of Legacy
mode to convenience in "default" mode.
IMO, IDENTITY columns are one of the worst usability-related inventions - it's
simply a design failure. Nearly any other RDBMS offers generators - a much more
convenient and scalable solution, especially in case with big applications. But
guys from Microsoft invented this "more convenient" sh$t, and STILL don't offer
sequences, that are PART OF SQL-92 STANDARD: http://www.ocelot.ca/commands.htm
So it's not surprising Wikipedia mentions only Microsoft SQL Server and
Microsoft Access as databases supporting identity columns:
http://en.wikipedia.org/wiki/Identity_column
P.S. I know, it sounds like "I simply hate this crazy feature, and that's the
only true reason we decided to trade it for real convenience." And I know,
probably this is not a business-wise decision. But I really care about
convenience, and trading so much for crazy and purely Microsoft-specific
feature seems to high cost. IMO, the stuff like IDENTITY must be either
boycotted at all, or supported via the same "natural and standard" way as its
inventors provided for us (Russians say, "through the ass"). Sorry :(
Original comment by alex.yakunin
on 8 Oct 2010 at 12:28
Sorry, many mistakes - e.g. "too high cost". But the idea is clear ;)
Original comment by alex.yakunin
on 8 Oct 2010 at 12:33
Original issue reported on code.google.com by
alex.yakunin
on 24 Sep 2009 at 10:44