epeters3 commented 4 years ago

Parent issue: #205.

The pipelines in the D3M Metalearning Database use a wide variety of primitives. As a result, when a train/test split is made from a meta-dataset curated from the D3M DB, the test split can reference primitives that do not appear in the training set (I ran into this problem myself). This PR addresses the issue for the sequence models in the repo by adding a vector to the primitive one-hot encoding matrix for unknown primitives, i.e. primitives that the model/data loader did not see during the fit/training phase.
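To illustrate the idea, here is a minimal sketch. It is not the code in this PR: the names `UNKNOWN`, `build_primitive_one_hot`, and `encode_primitive`, as well as the example primitive names, are hypothetical. It assumes the vocabulary is built only from primitives seen during fit, with one extra row reserved for anything unseen.

```python
import numpy as np

# Hypothetical placeholder key for primitives not seen during fit.
UNKNOWN = "<unknown>"


def build_primitive_one_hot(primitive_names):
    """Build a name -> row index map and a one-hot matrix from the
    primitives seen during fit, with one extra row for unknowns."""
    vocab = sorted(set(primitive_names)) + [UNKNOWN]
    index = {name: i for i, name in enumerate(vocab)}
    # Row i of the identity matrix is the one-hot vector for vocab[i].
    matrix = np.eye(len(vocab), dtype=np.float32)
    return index, matrix


def encode_primitive(name, index, matrix):
    """Return the one-hot vector for `name`, falling back to the
    unknown row when the primitive was not seen during fit."""
    return matrix[index.get(name, index[UNKNOWN])]


# Illustrative primitive names only; real D3M primitive names are longer.
train_primitives = ["pca", "random_forest", "imputer"]
index, matrix = build_primitive_one_hot(train_primitives)

seen = encode_primitive("pca", index, matrix)
unseen = encode_primitive("xgboost", index, matrix)  # maps to the unknown row
```

With this layout, every unseen primitive shares a single vector, so the encoding width is fixed at fit time and a test split cannot trigger a lookup error.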

This PR also refactors out a common utility method for creating a primitive one-hot encoding matrix, reducing code duplication in the repo.

codecov-commenter commented 4 years ago

Codecov Report

Merging #204 into develop will increase coverage by 0.31%. The diff coverage is 31.25%.

@@             Coverage Diff             @@
##           develop     #204      +/-   ##
===========================================
+ Coverage    55.12%   55.44%   +0.31%     
===========================================
  Files           36       37       +1     
  Lines         2712     2709       -3     
===========================================
+ Hits          1495     1502       +7     
+ Misses        1217     1207      -10

| Impacted Files | Coverage Δ | |
|---|---|---|
| dna/utils.py | `70.27% <12.50%> (-15.94%)` | :arrow_down: |
| dna/models/base_models.py | `50.59% <16.66%> (+4.19%)` | :arrow_up: |
| dna/data.py | `73.63% <33.33%> (+0.32%)` | :arrow_up: |
| dna/models/torch_modules/dna_module.py | `94.02% <83.33%> (-1.14%)` | :arrow_down: |
| dna/constants.py | `100.00% <100.00%> (ø)` | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0046b04...ad33884.

epeters3 commented 4 years ago

Closing this PR in favor of a coming PR, which takes a different approach.

byu-dml / d3m-dynamic-neural-architecture

One hot encoding of primitives should support unseen primitives #204
