Open GoogleCodeExporter opened 9 years ago
So the core issue is that any SemanticSpace needs information on:
1. what tokens it should count as features
2. what tokens it should keep representations for
The BasisMapping provides the first one, and the Filterable interface supports
the second. However, we don't have any way of building an input representation
to the SemanticSpace that supports both. Is that the case?
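To make the two requirements concrete, here is a minimal sketch of a windowed co-occurrence counter that takes both token sets as separate inputs. It is illustrative only: `DualFilterSketch`, `count`, and the `features`/`retained` parameters are hypothetical names, and none of this is the existing SemanticSpace, BasisMapping, or Filterable API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DualFilterSketch {

    /**
     * Counts co-occurrences inside a symmetric window, restricted by two
     * independent token sets (both hypothetical inputs):
     *   features - tokens that may be counted as context features
     *   retained - tokens that get a representation (a row of counts)
     */
    static Map<String, Map<String, Integer>> count(
            List<String> tokens, Set<String> features,
            Set<String> retained, int window) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String focus = tokens.get(i);
            // Requirement 2: only keep representations for retained tokens.
            if (!retained.contains(focus)) {
                continue;
            }
            Map<String, Integer> row =
                counts.computeIfAbsent(focus, k -> new HashMap<>());
            int start = Math.max(0, i - window);
            int end = Math.min(tokens.size() - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (j == i) {
                    continue;
                }
                String context = tokens.get(j);
                // Requirement 1: only count tokens in the feature set.
                if (features.contains(context)) {
                    row.merge(context, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        Set<String> features = new HashSet<>(Arrays.asList("sat", "on"));
        Set<String> retained = new HashSet<>(Arrays.asList("cat", "mat"));
        System.out.println(count(tokens, features, retained, 2));
    }
}
```

The point of the sketch is that the two sets are independent: a token can be a feature without having its own representation, and vice versa.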
Could we roll this into the SemanticSpace Document API change, where a Document
would now handle the tokenization? Also, how would this work for the
dependency-tree based SemanticSpace implementations, or is this a co-occurrence
based issue only?
Original comment by David.Ju...@gmail.com
on 25 Aug 2011 at 9:21
Correct. We do have the tools to make this work, and giving both to the
SemanticSpaces is a reasonable solution to me. I don't think we need a unified
way of handling both of these issues, though having one would be great, as long
as they both get handled in a reasonable manner.
I do think that this would be a pretty large overhaul, and it fits in nicely
with the API change. For the dependency tree models, we'd still need the
Filterable feature to handle which words need a representation, and I would
think we could pass the feature list as a DependencyPathAcceptor, i.e. one that
accepts only things in the feature list, and then let the dependency tree
models do their mapping however they wish with those accepted paths.
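As a rough sketch of the acceptor idea, the following uses simplified stand-in types, not the real S-Space classes. `Path`, `PathAcceptor`, and `FeatureListAcceptor` are hypothetical, and the rule "accept a path when its last word is in the feature list" is one assumed reading of "accepts only things in the feature list".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FeatureListAcceptorSketch {

    /** Hypothetical stand-in for a dependency path: the words visited, in order. */
    static final class Path {
        final List<String> words;

        Path(String... words) {
            this.words = Arrays.asList(words);
        }
    }

    /** Hypothetical stand-in for an acceptor; not the S-Space interface. */
    interface PathAcceptor {
        boolean accepts(Path path);
    }

    /** Accepts a path only when its final word is in the feature list (assumed rule). */
    static final class FeatureListAcceptor implements PathAcceptor {
        private final Set<String> features;

        FeatureListAcceptor(Set<String> features) {
            this.features = Collections.unmodifiableSet(new HashSet<>(features));
        }

        @Override
        public boolean accepts(Path path) {
            if (path.words.isEmpty()) {
                return false; // nothing to test against the feature list
            }
            String last = path.words.get(path.words.size() - 1);
            return features.contains(last);
        }
    }

    public static void main(String[] args) {
        PathAcceptor acceptor = new FeatureListAcceptor(
            new HashSet<>(Arrays.asList("dog", "bone")));
        System.out.println(acceptor.accepts(new Path("cat", "chases", "dog")));
        System.out.println(acceptor.accepts(new Path("cat", "chases")));
    }
}
```

Under this reading, the dependency tree models would only ever see paths that end in a feature word, and could still map those accepted paths to dimensions however they wish.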
Original comment by FozzietheBeat@gmail.com
on 25 Aug 2011 at 9:39
Original issue reported on code.google.com by
FozzietheBeat@gmail.com
on 25 Aug 2011 at 8:15