Open GoogleCodeExporter opened 9 years ago
We definitely need to check that early.
As users can write new FEs, and will usually have no overview about what
feature names are used so far, we need to inform the user about that.
Other solutions, e.g. prefixing each feature name with the name of the FE etc.
make the feature space much bigger. So I would not like to go in this direction.
Original comment by torsten....@gmail.com
on 21 Feb 2014 at 9:07
As Torsten pointed out, this problem may also arise across feature extractors.
Prefixing the names of the resp. FEs wouldn't fully solve the problem, as one
FE can return a list of Features (possibly named equally).
Original comment by daxenber...@gmail.com
on 21 Feb 2014 at 9:13
Why would prefixing the FE name make the feature space bigger? Wouldn't it only
make the feature names longer?
Original comment by richard.eckart
on 21 Feb 2014 at 9:17
Prefixes don't increase the number of features but only the space occupied for
their names (which should be insignificant with the latest version of the
feature store). Still, prefixes don't solve the problem (see comment #2).
Original comment by daxenber...@gmail.com
on 21 Feb 2014 at 9:33
But they reduce the risk. I would suppose that the risk to have a single FE
producing multiple features with the same name is much lower than having
different FEs producing conflicting features. Different FE implementers may not
know about each other, but within a single FE, it should be possible to handle
this by convention. It would also be possible to do an outside sanity check
when fetching the features from the FE. I think overriding equals() and
hashcode() may be a bit heavy. Although, if you did that, it would be easy
return a set instead of a list.
Original comment by richard.eckart
on 21 Feb 2014 at 9:52
There might also be some problems when using ARFFs with very long feature names
(not to speak about readability).
Regarding bigger feature space: I meant it makes the ARFF bigger, but that
might be neglectable as the header is only stored once anyway.
Original comment by torsten....@gmail.com
on 21 Feb 2014 at 10:11
Why are we returning a list of features instead a set of features anyway?
We need to preserve feature ordering with the new feature space, but we could
also ensure that by ordering the names alphabetically.
Original comment by torsten....@gmail.com
on 21 Feb 2014 at 10:13
Maintaining order in the store sounds like a good idea.
Original comment by richard.eckart
on 21 Feb 2014 at 10:18
Original comment by torsten....@gmail.com
on 25 Apr 2014 at 8:25
Using treeset in feature store instead of relying on implicit ordering in list.
Throw an exception when the same feature name is used twice.
This should solve the issue, but actually throws the exception pretty late.
Catching this earlier would involve changing the signature of feature
extractors, which I do not want to do at the moment.
Leaving the issue open for now, as this is not fully resolved.
Original comment by torsten....@gmail.com
on 25 Apr 2014 at 8:34
Original comment by daxenber...@gmail.com
on 4 Jun 2014 at 3:46
Original comment by daxenber...@gmail.com
on 29 Aug 2014 at 10:11
Original issue reported on code.google.com by
daxenber...@gmail.com
on 21 Feb 2014 at 8:57