datatonic / duke

Automatically exported from code.google.com/p/duke

Allow user configuration of lookup properties #65

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
Lookup properties are currently derived by a calculation 
(no.priv.garshol.duke.Configuration.findLookupProperties) that uses the 
probabilities that a successful comparison will result in a matching record. 
The property element in the XML config file could support an attribute that 
lets the user specify that a property should be used as a lookup. In addition, 
properties defined with an ExactComparator could be preferred as lookup 
properties.

Original issue reported on code.google.com by nicky.ni...@gmail.com on 5 Jan 2012 at 11:21

GoogleCodeExporter commented 8 years ago
We could certainly do this, but I'm not sure if it's worth it. It would add 
some flexibility, at the cost of some conceptual complexity.

Why do you think this would be useful?

Original comment by lar...@gmail.com on 5 Jan 2012 at 11:25

GoogleCodeExporter commented 8 years ago
The lookup defines the set of records in which a match may be found, so 
wouldn't it be helpful to allow the user to define this? I've seen reasonable 
performance on a shortened test file, but in some cases it degrades 
dramatically on a real-world-scale example because of the number of candidate 
records returned from a lookup.

Original comment by nicky.ni...@gmail.com on 5 Jan 2012 at 11:53

GoogleCodeExporter commented 8 years ago
If you're worried about the number of records to match against (after 
performing search), perhaps we could instead add a limit to the number of 
records we match against? That way, Duke can try to be clever about choosing 
the right records, without involving the user too much in exactly how it 
happens.

What do you think?

Original comment by lar...@gmail.com on 5 Jan 2012 at 12:06
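
A minimal sketch of the candidate limit suggested in the comment above, assuming the search step already returns candidates ordered best-first. The class and method names (`CandidateLimitSketch`, `limitCandidates`) and the `maxCandidates` parameter are hypothetical and are not taken from Duke's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Duke: caps how many candidate records
// are passed on to the detailed comparison step.
class CandidateLimitSketch {

    // Keeps at most maxCandidates entries from a list that is assumed to be
    // ordered best-first by the search step.
    static <T> List<T> limitCandidates(List<T> ranked, int maxCandidates) {
        if (maxCandidates < 0) {
            throw new IllegalArgumentException("maxCandidates must be >= 0");
        }
        int end = Math.min(maxCandidates, ranked.size());
        // Copy the sublist so the result does not depend on the input list.
        return new ArrayList<>(ranked.subList(0, end));
    }
}
```

With a cap like this, the cost of comparing one record is bounded by `maxCandidates` regardless of how many records the lookup returns, at the risk of cutting off a true match that ranks poorly.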

GoogleCodeExporter commented 8 years ago
I guess I was thinking that specifying that a property should be used as a 
lookup value would not be mandatory, just an optional override available to 
users who want it. So you could configure Duke as normal and let it compute 
which properties are used for lookups, or, if Duke sees that one or more 
properties have the attribute useAsLookup=true, it skips the calculation to 
select lookups and instead sets the lookups according to the user's 
configured choices.

Original comment by nicky.ni...@gmail.com on 5 Jan 2012 at 12:54
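
The optional override described in the comment above could be sketched roughly as follows. Only the `useAsLookup` attribute name comes from the comment. The `Prop` class, the `chooseLookupProperties` method, and the threshold-based fallback are hypothetical stand-ins, and the fallback is not the calculation that `Configuration.findLookupProperties` actually performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Duke's implementation.
class LookupSketch {

    // Stand-in for a configured property.
    static final class Prop {
        final String name;
        final double highProbability; // probability of a match when values agree
        final boolean useAsLookup;    // user override from the config, if any

        Prop(String name, double highProbability, boolean useAsLookup) {
            this.name = name;
            this.highProbability = highProbability;
            this.useAsLookup = useAsLookup;
        }
    }

    // Returns the names of the properties to use for lookups.
    static List<String> chooseLookupProperties(List<Prop> props, double threshold) {
        // If the user flagged any properties, use exactly those.
        List<String> flagged = new ArrayList<>();
        for (Prop p : props) {
            if (p.useAsLookup) {
                flagged.add(p.name);
            }
        }
        if (!flagged.isEmpty()) {
            return flagged;
        }
        // Otherwise fall back to an automatic choice. This threshold rule is
        // a placeholder for whatever calculation Duke really uses.
        List<String> computed = new ArrayList<>();
        for (Prop p : props) {
            if (p.highProbability >= threshold) {
                computed.add(p.name);
            }
        }
        return computed;
    }
}
```

Because the flag is checked first, a configuration with no flagged properties behaves exactly as it did before the override existed.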

GoogleCodeExporter commented 8 years ago
I agree that if we do allow lookup properties to be configured, what you 
describe would be the way to do it. But why would you do it this way instead of 
specifying the max number of matches? (I'm not saying you're wrong, I'm just 
trying to understand your thinking better.)

Original comment by lar...@gmail.com on 5 Jan 2012 at 12:59

GoogleCodeExporter commented 8 years ago
Turns out we're going to need this, after all.

Original comment by lar...@gmail.com on 14 Jan 2013 at 1:05

GoogleCodeExporter commented 8 years ago
Now implemented and tests added.

Original comment by lar...@gmail.com on 17 Jan 2013 at 8:36