brekelma / dsbox_corex

Apache License 2.0

Corex Text does not add column metadata #6

Open kyao opened 5 years ago

kyao commented 5 years ago

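The d3m.primitives.feature_construction.corex_text.CorexText primitive does not add column metadata for the columns it appends. In the session below, the step 5 output has 33 columns; column 27 has populated metadata, but querying columns 28 and 29 (two of the new corex_* columns) returns an empty dict. This causes wrapped SKLearn primitives to crash.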

(Pdb) primitives_outputs['steps.4.produce'].shape
(771, 29)
(Pdb) primitives_outputs['steps.4.produce'].head()
  d3mIndex       Player Number_seasons Games_played At_bats       ...        Position_nan Hall_of_Fame_1 Hall_of_Fame_0 Hall_of_Fame_2 Hall_of_Fame_nan
0        0   HANK_AARON             23         3298   12364       ...                   0              1              0              0                0
1        1  JERRY_ADAIR             13         1165    4019       ...                   0              0              1              0                0
2        4   JOE_ADCOCK             17         1959    6606       ...                   0              0              1              0                0
3        5  TOMMIE_AGEE             12         1129    3912       ...                   0              0              1              0                0
4        6  LUIS_AGUAYO             10          568    1104       ...                   0              0              1              0                0

[5 rows x 29 columns]
(Pdb) primitives_outputs['steps.5.produce'].shape
(771, 33)
(Pdb) primitives_outputs['steps.5.produce'].head()
  d3mIndex Number_seasons Games_played At_bats  Runs  Hits Doubles    ...    Hall_of_Fame_2 Hall_of_Fame_nan   corex_0   corex_1   corex_2   corex_3   corex_4
0        0             23         3298   12364  2174  3771     624    ...                 0                0  0.348001  0.496024  0.494908  0.399449  0.515824
1        1             13         1165    4019   378  1022     163    ...                 0                0  0.448118  0.911324  0.259670  0.250383  0.492069
2        4             17         1959    6606   823  1832     295    ...                 0                0  0.216636  0.308273  0.448800  0.499551  0.386128
3        5             12         1129    3912   558   999     170    ...                 0                0  0.514805  0.175734  0.741300  0.566758  0.526204
4        6             10          568    1104   142   260      43    ...                 0                0  0.445903  0.638172  0.489961  0.506785  0.222282

[5 rows x 33 columns]
(Pdb) primitives_outputs['steps.5.produce'].metadata.query((metadata_base.ALL_ELEMENTS, 27))
<FrozenOrderedDict OrderedDict([('structural_type', <class 'int'>), ('semantic_types', ('http://schema.org/Integer', 'https://metadata.datadrivendiscovery.org/types/Attribute'))])>
(Pdb) primitives_outputs['steps.5.produce'].metadata.query((metadata_base.ALL_ELEMENTS, 28))
<FrozenOrderedDict OrderedDict()>
(Pdb) primitives_outputs['steps.5.produce'].metadata.query((metadata_base.ALL_ELEMENTS, 29))
<FrozenOrderedDict OrderedDict()>
(Pdb) primitives_outputs['steps.5.produce'].metadata.query((metadata_base.ALL_ELEMENTS, ))
<FrozenOrderedDict OrderedDict([('dimension', <FrozenOrderedDict OrderedDict([('name', 'columns'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularColumn',)), ('length', 33)])>)])>
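
For illustration, here is a minimal sketch of the kind of bookkeeping that seems to be missing: find the columns whose metadata is empty, then build an entry for each. It uses a plain dict as a stand-in for the d3m metadata object so it runs without d3m installed. The helper names (`columns_missing_metadata`, `build_column_entries`) are hypothetical, and the choice of `float` and `http://schema.org/Float` for the corex_* columns is an assumption based on the values printed above, not something taken from the primitive's code.

```python
from typing import Any, Dict, List, Mapping

# The Attribute semantic type appears in the session output above.
# Using schema.org/Float for the corex_* columns is an assumption.
ATTRIBUTE = "https://metadata.datadrivendiscovery.org/types/Attribute"
FLOAT = "http://schema.org/Float"


def columns_missing_metadata(
    column_metadata: Mapping[int, Mapping[str, Any]], n_columns: int
) -> List[int]:
    """Return indices of columns whose metadata entry is absent or empty.

    `column_metadata` is a plain-dict stand-in for the result of querying
    each column of the real metadata object.
    """
    return [i for i in range(n_columns) if not column_metadata.get(i)]


def build_column_entries(
    column_names: List[str], missing: List[int]
) -> Dict[int, Dict[str, Any]]:
    """Build one metadata entry per missing column (hypothetical helper)."""
    return {
        i: {
            "name": column_names[i],
            "structural_type": float,  # assumption: corex outputs are floats
            "semantic_types": (FLOAT, ATTRIBUTE),
        }
        for i in missing
    }


# Tiny stand-in for the step 5 output: two populated columns, two empty ones.
names = ["d3mIndex", "Hall_of_Fame_nan", "corex_0", "corex_1"]
existing = {0: {"structural_type": int}, 1: {"structural_type": int}, 2: {}}

missing = columns_missing_metadata(existing, len(names))
entries = build_column_entries(names, missing)
for index, entry in entries.items():
    print(index, entry["name"], entry["semantic_types"])
```

In the primitive itself, the per-column lookup would be the same `metadata.query((metadata_base.ALL_ELEMENTS, i))` call used in the session above, and each entry would presumably be applied with something like `metadata.update((metadata_base.ALL_ELEMENTS, i), entry)`. That call should be checked against the installed d3m version.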