Closed ArnaudAbreu closed 3 years ago
Merging #28 (aa88e29) into master (7fc4b6f) will decrease coverage by 7.43%. The diff coverage is 1.80%.
@@ Coverage Diff @@
## master #28 +/- ##
==========================================
- Coverage 38.15% 30.71% -7.44%
==========================================
Files 15 18 +3
Lines 865 1084 +219
==========================================
+ Hits 330 333 +3
- Misses 535 751 +216
Flag | Coverage Δ | |
---|---|---|
unittests | 30.71% <1.80%> (-7.44%) | :arrow_down: |

Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ | |
---|---|---|
pathaia/datasets/__init__.py | 0.00% <0.00%> (ø) | |
pathaia/datasets/data.py | 0.00% <ø> (ø) | |
pathaia/datasets/errors.py | 0.00% <0.00%> (ø) | |
pathaia/datasets/functional_api.py | 0.00% <0.00%> (ø) | |
pathaia/patches/functional_api.py | 45.84% <ø> (ø) | |
pathaia/util/management.py | 0.00% <0.00%> (ø) | |
pathaia/util/types.py | 97.56% <100.00%> (+0.09%) | :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
I think we also need common ways to build datasets, and to make them as reliable as possible, since many ML mistakes in our projects come from how these structures are built.
Start the `datasets` subpackage. 'DataSets' types are provided. Basically, we add a `RefDataSet` struct that is a tuple (x, y) where x is a list of samples and y is the corresponding list of labels. We provide a `functional_api` in `pathaia.datasets` that includes the following features:

- functions that take a `RefDataSet` and return a `RefDataSet` to shuffle, clean, balance, clip, or batch a dataset,
- functions that take a `RefDataSet` and return a `Sequence[RefDataSet]` to split datasets into as many subsets as requested (even named subsets) while preserving the class ratio in the data.

I have a test script for the features described above that I will add to the testing procedures of PathAIA in another PR.
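To make the two families of functions more concrete, here is a minimal sketch. It is an illustration, not the code in this PR: the `RefDataSet` alias follows the (x, y) description above, while `shuffle_dataset`, `split_dataset` and their parameters are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Assumed shape, taken from the description above: (samples, labels).
RefDataSet = Tuple[List[Any], List[Any]]


def shuffle_dataset(dataset: RefDataSet, seed: int = 0) -> RefDataSet:
    """Hypothetical helper: shuffle samples and labels together."""
    x, y = dataset
    if len(x) != len(y):
        raise ValueError("samples and labels must have the same length")
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    # Apply the same permutation to both lists so pairs stay aligned.
    return [x[i] for i in order], [y[i] for i in order]


def split_dataset(dataset: RefDataSet, ratios: Sequence[float]) -> List[RefDataSet]:
    """Hypothetical helper: split class by class to preserve class ratios."""
    x, y = dataset
    total = sum(ratios)
    by_class: Dict[Any, List[Any]] = defaultdict(list)
    for sample, label in zip(x, y):
        by_class[label].append(sample)
    subsets: List[RefDataSet] = [([], []) for _ in ratios]
    for label, samples in by_class.items():
        start = 0
        cumulative = 0.0
        for k, ratio in enumerate(ratios):
            cumulative += ratio
            if k == len(ratios) - 1:
                # The last subset takes the remainder so no sample is dropped.
                end = len(samples)
            else:
                end = round(len(samples) * cumulative / total)
            subsets[k][0].extend(samples[start:end])
            subsets[k][1].extend([label] * (end - start))
            start = end
    return subsets


# Toy data: 100 samples, 80 of class "a" and 20 of class "b".
data = (list(range(100)), ["a"] * 80 + ["b"] * 20)
train, val = split_dataset(shuffle_dataset(data, seed=1), (0.75, 0.25))
# train holds 75 samples and val holds 25, each with the 80/20 class ratio.
```

Splitting each class separately is what keeps the class ratio stable across subsets; a single global cut over shuffled data would only preserve it approximately.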
Here is a code sample on how dataset decorators can be used to yield samples:
Of course, using the decorators is optional; you can call the corresponding functions directly instead. I guess it depends on the case, but in many of my scripts I find that type of syntax helpful.
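As a rough illustration of the decorator-versus-function choice, the sketch below wraps a dataset-building function with a decorator and compares it with the equivalent plain call. Every name in it (`shuffled`, `shuffle_dataset`, `load_toy_dataset`, `iter_samples`) is hypothetical and chosen for the example; it does not reproduce the decorators shipped in `pathaia.datasets`.

```python
import functools
import random
from typing import Any, Callable, Iterator, List, Tuple

# Assumed shape, following the PR description: (samples, labels).
RefDataSet = Tuple[List[Any], List[Any]]


def shuffle_dataset(dataset: RefDataSet, seed: int = 0) -> RefDataSet:
    """Hypothetical helper: shuffle samples and labels together."""
    x, y = dataset
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    return [x[i] for i in order], [y[i] for i in order]


def shuffled(seed: int = 0) -> Callable:
    """Hypothetical decorator: shuffle the dataset the wrapped function returns."""
    def decorator(build: Callable[..., RefDataSet]) -> Callable[..., RefDataSet]:
        @functools.wraps(build)
        def wrapper(*args: Any, **kwargs: Any) -> RefDataSet:
            return shuffle_dataset(build(*args, **kwargs), seed=seed)
        return wrapper
    return decorator


@shuffled(seed=42)
def load_toy_dataset() -> RefDataSet:
    """Toy builder standing in for real patch loading."""
    return ["p0", "p1", "p2", "p3"], [0, 1, 0, 1]


def iter_samples(dataset: RefDataSet) -> Iterator[Tuple[Any, Any]]:
    """Yield (sample, label) pairs one at a time."""
    x, y = dataset
    yield from zip(x, y)


# Decorator form: the shuffle is applied where the builder is defined.
decorated = load_toy_dataset()

# Function form: call the undecorated builder, then shuffle explicitly.
plain = shuffle_dataset(load_toy_dataset.__wrapped__(), seed=42)

for sample, label in iter_samples(decorated):
    print(sample, label)
```

Both forms produce the same dataset here; the decorator mainly moves the processing step next to the function that builds the data.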
`object_api` will come with `Cohort` objects to wrap all these functions.