Closed jh83775 closed 1 year ago
Merging #108 (d85ceb1) into master (80f5319) will increase coverage by
0.28%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #108 +/- ##
==========================================
+ Coverage 90.85% 91.13% +0.28%
==========================================
Files 32 34 +2
Lines 4649 4774 +125
==========================================
+ Hits 4224 4351 +127
+ Misses 425 423 -2
Impacted Files | Coverage Δ | |
---|---|---|
vectorizers/__init__.py | 100.00% <100.00%> (ø) |
|
vectorizers/signature_vectorizer.py | 100.00% <100.00%> (ø) |
|
vectorizers/tests/test_signature_vectorizer.py | 100.00% <100.00%> (ø) |
|
vectorizers/preprocessing.py | 88.51% <0.00%> (+0.95%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
This looks like a good addition. Would you be willing to add some basic tests, and possibly even a tutorial guide for the documentation?
Also is it worth adding iisignature to the requirements?
No problem, I should have some time over the next few weeks to add tests and tutorial guides.
Happy to add iisignature to the requirements too - the current implementation was in case we wanted to avoid having too many dependencies, but if that's not an issue I'll add it in directly.
Thanks -- I know the tests and documentation are a lot of extra work, but they are very valuable for maintenance, and for ensuring users can find and use the feature, so it will definitely be worth the effort in the long run.
As far as dependencies go ... I haven't checked what iisignature requires. If it is reasonably self contained then one extra dependency is probably fine for now.
I have added an example notebook and some tests for the SignatureVectorizer. I'll add more functionality (e.g. common path transforms) and more examples in the future too, but this should suffice as a good starting point.
I had some trouble getting the automated tests to work. For some reason including iisignature in the requirements file broke something (error messages suggested it was being installed before numpy and thus breaking since numpy is a dependancy?). I've commented out iisignature in the requirements.txt for now, and added it explicitly to azure-pipelines.yml, which seems to work. Do you have any suggestions? I suspect there might be something trivial which I've missed.
Edit: Seems like iisignature imports numpy in its setup.py file, and this was causing problems in the automated test pipelines. I've re-added iisignature to requirements.txt and put an explicit pip install numpy
in azure-pipelines.yml just before the pip install -r requirements.txt
, just so its definitely present on the system before that call. I think this resolves the issue I was having with minimal changes to the existing code.
Thanks for this -- as I say it should be very valuable for a great many people and is a great complement to the techniques we had so far. I'm looking forward to putting this to use myself.
I've implemented SignatureVectorizer, which returns the path signatures for a collection of paths.
This vectorizer essentially wraps the iisignature package such that it fits into the standard sklearn style fit_transform pipeline. While it does require iisignature, the imports are written such that the rest of the library can still be used if the user does not have iisignature installed.
For more details on the path signature technique, I've found this paper quite instructive: A Primer on the Signature Method in Machine Learning (Chevyrev, I.)