Closed epeters3 closed 4 years ago
Merging #204 into develop will increase coverage by 0.31%. The diff coverage is 31.25%.
@@ Coverage Diff @@
## develop #204 +/- ##
===========================================
+ Coverage 55.12% 55.44% +0.31%
===========================================
Files 36 37 +1
Lines 2712 2709 -3
===========================================
+ Hits 1495 1502 +7
+ Misses 1217 1207 -10

Impacted Files | Coverage Δ | |
---|---|---|
dna/utils.py | 70.27% <12.50%> (-15.94%) | :arrow_down: |
dna/models/base_models.py | 50.59% <16.66%> (+4.19%) | :arrow_up: |
dna/data.py | 73.63% <33.33%> (+0.32%) | :arrow_up: |
dna/models/torch_modules/dna_module.py | 94.02% <83.33%> (-1.14%) | :arrow_down: |
dna/constants.py | 100.00% <100.00%> (ø) | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0046b04...ad33884.
Closing this PR in favor of an upcoming PR that takes a different approach.
Parent issue: #205.
The pipelines in the D3M Metalearning Database use a large variety of primitives. Because of this, when making a train/test split out of a meta dataset curated from the D3M DB, the test split can reference primitives that don't exist in the training set (indeed, I ran into this problem). This PR addresses the issue for the sequence models in the repo by including a vector in the primitive one-hot encoding matrix for any `unknown` primitives, i.e. primitives that the model/data loader did not see in the fit/training phase.

This PR also refactors out a common utility method for creating a primitive one-hot encoding matrix, reducing code duplication in the repo.
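
For reviewers, here is a minimal sketch of the idea rather than the code in this PR. The names `UNKNOWN`, `build_primitive_encoding`, and `encode` are hypothetical, and it uses plain Python lists where the repo would likely use tensors. The encoding is built from the training primitives only, with one extra row reserved for anything unseen:

```python
# Hypothetical sentinel name; the constant actually used in the repo may differ.
UNKNOWN = "<unknown>"


def build_primitive_encoding(train_primitives):
    """Map each primitive seen during fit to a one-hot vector.

    One extra vector is reserved for primitives not seen during fit.
    """
    names = sorted(set(train_primitives))
    index = {name: i for i, name in enumerate(names)}
    index[UNKNOWN] = len(names)  # last position is the "unknown" slot
    size = len(index)
    return {
        name: [1.0 if j == i else 0.0 for j in range(size)]
        for name, i in index.items()
    }


def encode(encoding, primitive):
    """Return the primitive's one-hot vector, or the unknown vector if unseen."""
    return encoding.get(primitive, encoding[UNKNOWN])


# Fit on the training split only, then encode primitives from the test split.
encoding = build_primitive_encoding(["pca", "svm", "pca"])
seen = encode(encoding, "svm")       # vector for a primitive seen in training
unseen = encode(encoding, "kmeans")  # falls back to the unknown vector
```

With this layout, every unseen primitive shares the same vector, so the model can still consume test pipelines but cannot tell unseen primitives apart.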