Closed beijbom closed 4 years ago
@StephenChan : FYI, I have a proof-of-concept converter for the old sklearn models. So I have good hope we can use scikit-learn==0.22.1, which means any python version >= 3.5 will be supported.
(build still not working for 3.7 and 3.8, but it's unrelated, I'll fix soon.)
Cool, very nice to see this!
An update on this PR guys. I wrote extensive regression tests to compare to the current deployed version. The details are in
I downloaded 10 images from 5 different sources, along with their legacy extracted features and their classification scores. Each image has between 10 and 200 (row, col) locations. By running the tasks with the new code I can check that feature extraction and classification are backwards compatible.
Alas, while the classification step was flawless, I found some problems with the feature extraction. I have attached extensive comparison data. These should be read as:
sourceid/imageid (row, col) 2-norm for (legacy_features(row, col) - new_features(row, col))
For example
s603/i576858 (1, 121) 0.0
s603/i576858 (32, 23) 0.00132
means that for image i576858 from source s603, the features extracted at (row, col) = (1, 121) match exactly between new and legacy, but the features at (row, col) = (32, 23) don't.
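For reference, a comparison along these lines can be sketched with numpy (the helper name and the dict layout here are illustrative, not pyspacer's actual API):

```python
import numpy as np

def feature_diffs(legacy_feats, new_feats):
    # Both arguments map (row, col) -> 1-D feature vector.
    # Returns the 2-norm of (legacy - new) per location, as in feat_norms.log.
    return {
        loc: float(np.linalg.norm(np.asarray(vec) - np.asarray(new_feats[loc])))
        for loc, vec in legacy_feats.items()
    }

legacy = {(1, 121): [0.5, 0.25], (32, 23): [0.5, 0.25]}
new = {(1, 121): [0.5, 0.25], (32, 23): [0.5, 0.2512]}
diffs = feature_diffs(legacy, new)  # (1, 121) matches; (32, 23) does not
```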
As you can see below, some features match exactly. For two sources (1388 and 812), there are no perfect matches, but for the rest there are mixed results. So even within the same image, features from some locations match perfectly and some do not. By the way, when I first coded this up I had some simple regression tests using source 16. For some reason that one has very few problems, so I didn't catch this until now.
The good news is that if I use these differing features with the same classifier, I actually get very similar results. In only 2 (two) locations over this whole set does the prediction differ, and even in those cases the scores are very similar. So worst case, I think we can live with this.
Here are some options for what can be wrong: is `/` interpreted as integer division somewhere? The only thing I can think of next is to store the pre-processed patch right before it is given to the network and compare those. That will tell us whether Caffe is at fault. If it is, we will have to leave this error in; I can't fix that. If there is a problem with the pre-processing, I'll find it.
Please do let me know if you have any thoughts or suggestions @StephenChan @qiminchen @kriegman feat_norms.log
So you are comparing features extracted on the current production server vs. features extracted with pyspacer + sklearn 0.22.1? If so, did you also compare with pyspacer + sklearn 0.17.1?
sklearn is not involved in the feature extraction. It is only used in the classification step, which passed the tests.
Oh, right. I don't have anything off the top of my head then. Double-checking the results of Pillow's reading sounds like a good idea - inspect the values (pixel values in this case) in different parts of your pipeline, and see where the differences start.
@StephenChan @kriegman @qiminchen : I tracked down the problem: it occurs right after loading the image. The good news is that it is easy to reproduce by running the code below. The bad news is that the inconsistency seems to depend on the operating system.
from PIL import Image
im_pil = Image.open('i1023182.jpg')
checksum = sum(sum(px) for px in im_pil.getdata())
print(checksum)
This gives a different checksum on different operating systems. So far I have tried:
python3.5, Pillow==6.2.0, ubuntu16.04: 6252762318 # This is the new spacer Docker build
python3.5, Pillow==7.1.1, ubuntu16.04: 6252762318 # This is the new spacer Docker build
python2.7, Pillow==3.3.1, ubuntu14.04: 6252755969 # This is the legacy spacer Docker build. (the one we use in production on CoralNet)
python2.7, Pillow==6.2.0, OSX 10.15.3: 6252762318 # My laptop in a virtualenv
python2.7, Pillow==3.3.1, OSX 10.15.3: 6252762318 # My laptop in a virtualenv
python3.5, Pillow==6.2.0, OSX 10.15.3: 6252762318 # My laptop in a virtualenv
python3.5, Pillow==3.3.1, OSX 10.15.3: 6252762318 # My laptop in a virtualenv
python3.7.3, Pillow==6.0.0, OSX 10.15.3: 6252762318 # My laptop in a virtualenv
In other words: all environments give the same results except the legacy build, which is on Ubuntu 14.04. Note that I don't see this discrepancy for all images, but for some (most) I do. My best guess is that it has to do with how color spaces are defined, or with the jpg decoding implementation. This post suggests that it can differ between systems: https://stackoverflow.com/questions/23565889/jpeg-images-have-different-pixel-values-across-multiple-devices.
I'll keep digging but wanted to share the results so far. Any clue as to what might be going on? I'd also appreciate any data point you can provide from your systems. Test image is attached.
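For anyone collecting data points: a compact alternative to the sum-based checksum above is to hash the decoded pixel buffer, which catches any per-pixel difference, not just ones that change the sum. A sketch:

```python
import hashlib
from PIL import Image

def pixel_fingerprint(im):
    # md5 of the raw decoded pixel bytes; two environments that decode the
    # JPEG identically will agree on this hash.
    return hashlib.md5(im.tobytes()).hexdigest()

# e.g. pixel_fingerprint(Image.open('i1023182.jpg'))
```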
Python 2.7 32-bit, Pillow==2.8.0, Windows: 6252762318
Python 2.7 32-bit, Pillow==4.3.0, Windows: 6252762318
Python 3.4 32-bit, Pillow==4.0.0, Windows: 6252762318
Python 3.6 64-bit, Pillow==4.3.0, Windows: 6252762318
Python 3.7 64-bit, Pillow==6.0.0, Windows: 6252755969
All on my laptop. Let me know if any further tests on that 3.7 environment would help.
Huh? So you got the "odd" result (6252755969) using the latest Python and Pillow versions? Hmm. I just added that same combo on my laptop; I'm still getting 6252762318.
Close to latest, but yeah. I checked im_pil.icclist in both cases and it turned up as [], for what it's worth. Also, all of those Python installs were from binary, not source, so I suppose they may be compiled with different libraries.
Just to toss in: other places where this nondeterminism occurs are training and inference with deep networks. The results can differ between CPU and GPU, and between different GPUs.
Completely reproducible results are not guaranteed across PyTorch releases, individual commits or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.
However, in order to make computations deterministic on your specific problem on one specific platform and PyTorch release, there are a couple of steps to take.
See https://pytorch.org/docs/stable/notes/randomness.html
@StephenChan : how are you installing these? With pip? In conda or virtualenv? Can you please run these also?
Python 3.6 64-bit, Pillow==6.0.0, Windows:
Python 3.7 64-bit, Pillow==4.3.0, Windows:
All with pip, and no conda. Updated list of my tests, specifying global (no virtualenv) install vs. virtualenv install:
Python 2.7 32-bit
Python 3.4 32-bit
Python 3.6 64-bit
Python 3.7 64-bit
In each of the new tests, I made sure to start from a fresh virtualenv. It seems like in my case it's not about the Python version, but all about the Pillow version, going from 5.2.0 to 5.3.0.
Thanks @StephenChan . I'm pretty sure I know what the problem is now: the libjpeg library version used by Pillow. The new Docker build and my laptop use libjpeg9, while the old Docker build uses libjpeg8. This person ran into the same problem, it seems:
https://stackoverflow.com/questions/48048904/how-to-switch-libjpeg9-used-in-pillow3-1-1-to-libjpeg8
What I don't get is how you get different results with different versions of Python and Pillow. Can you check which version of libjpeg each is using? This thread has instructions for how to check on Linux and OSX; not sure how to do it on Windows, though:
https://stackoverflow.com/questions/24396727/find-which-libjpeg-is-being-used-by-pil-pillow
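One possible shortcut on systems without ldd/otool: Pillow's compiled `_imaging` module reports the libjpeg version it was built against. Whether this attribute is available can vary across Pillow versions, so treat this as a sketch:

```python
from PIL import Image

# Image.core is Pillow's compiled _imaging module; on builds with JPEG
# support it records the libjpeg version string, e.g. '8' or '9.2'.
print(Image.core.jpeglib_version)
```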
Another question: I'm not sure how to change which version of libjpeg is used. And I'm not sure it is worth it, to be honest. Sure, maybe we can patch it now, but then we are stuck with the same problem next time we upgrade, since at some point we need to move to the newer libjpeg.
I'm running an extensive regression test now to assess how different the scores really are. If the difference is minor I'm inclined to say we bite the bullet and accept the slightly different output.
Macbook laptop:
otool -L /Users/beijbom/.virtualenvs/spacer3.5/lib/python3.5/site-packages/PIL/_imaging.cpython-35m-darwin.so
/Users/beijbom/.virtualenvs/spacer3.5/lib/python3.5/site-packages/PIL/_imaging.cpython-35m-darwin.so:
@loader_path/.dylibs/libjpeg.9.dylib (compatibility version 13.0.0, current version 13.0.0)
@loader_path/.dylibs/libopenjp2.2.3.1.dylib (compatibility version 7.0.0, current version 2.3.1)
@loader_path/.dylibs/libz.1.2.11.dylib (compatibility version 1.0.0, current version 1.2.11)
@loader_path/.dylibs/libtiff.5.dylib (compatibility version 10.0.0, current version 10.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
New docker:
ldd /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/_imaging.cpython-35m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffce713b000)
libjpeg-bcb94a84.so.9.2.0 => /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/.libs/libjpeg-bcb94a84.so.9.2.0 (0x00007f811e50f000)
libopenjp2-59185378.so.2.1.0 => /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/.libs/libopenjp2-59185378.so.2.1.0 (0x00007f811e2c4000)
libz-a147dcb0.so.1.2.3 => /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/.libs/libz-a147dcb0.so.1.2.3 (0x00007f811e0af000)
libtiff-a8dfd9d2.so.5.2.4 => /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/.libs/libtiff-a8dfd9d2.so.5.2.4 (0x00007f811de34000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f811dc17000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f811d84d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f811d544000)
liblzma-f444c404.so.5.2.2 => /root/anaconda3/envs/oscar_test3/lib/python3.5/site-packages/PIL/.libs/./liblzma-f444c404.so.5.2.2 (0x00007f811d31e000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f811d116000)
/lib64/ld-linux-x86-64.so.2 (0x00007f811e9cc000)
Old docker:
/usr/local/lib/python2.7/dist-packages/PIL# ldd _imaging.so
linux-vdso.so.1 => (0x00007fff09f56000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007f36a2550000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f36a2337000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007f36a20c5000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f36a1ea7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f36a1ae2000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f36a18c0000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007f36a16b2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f36a13ac000)
/lib64/ld-linux-x86-64.so.2 (0x00007f36a29f1000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f36a11a8000)
What I don't get is how you get different results with different versions of Python and Pillow. Can you check which version of libjpeg each is using? This thread has instructions for how to check on Linux and OSX; not sure how to do it on Windows, though: https://stackoverflow.com/questions/24396727/find-which-libjpeg-is-being-used-by-pil-pillow
Building software with Linux-based libraries is pretty unreliable on Windows, so I guess the Windows binaries for Pillow just bundle in libjpeg, zlib, and whatever else is needed. (The build failures I got for certain Python version + Pillow version combos were about failing to find zlib. I think all of my install successes were just from binary.)
I'm not seeing a way to check the Pillow libjpeg version on Windows, unfortunately. The closest thing I could think of was this snippet, but it gets the same result on Pillow 5.2.0 and 5.3.0:
>>> from PIL import features
>>> sorted(features.get_supported())
['freetype2', 'jpg', 'jpg_2000', 'libtiff', 'littlecms2', 'pil', 'tkinter', 'transp_webp', 'webp', 'webp_anim', 'webp_mux', 'zlib']
Also, it's odd that you get 6252755969 from the older libjpeg, while I get it from the newer Pillow. I guess that result isn't necessarily from an obsolete jpg-decoding implementation. Or maybe it is; hard to say.
@qiminchen : can you run this script also, and check which version of libjpeg is installed on your system? Details above --^
Sure, I will run it on my system (Ubuntu 18.04.4) tonight and post the details as soon as possible.
Hey @beijbom . All with conda on Ubuntu 18.04.4. I think you are right, the problem should be the libjpeg library version used by Pillow, since I got the same results with different versions of Python and Pillow but the same version of libjpeg:
python 2.7, Pillow==3.3.1, libjpeg9: 6252762318
python 2.7, Pillow==4.3.0, libjpeg9: 6252762318
python 2.7, Pillow==6.2.0, libjpeg9: 6252762318
python 3.5, Pillow==3.3.1, libjpeg9: 6252762318
python 3.5, Pillow==5.3.0, libjpeg9: 6252762318
python 3.6.10, Pillow==4.3.0, libjpeg9: 6252762318
python 3.6.10, Pillow==5.0.0, libjpeg9: 6252762318
python 3.6.10, Pillow==5.2.0, libjpeg9: 6252762318
python 3.6.10, Pillow==6.0.0, libjpeg9: 6252762318
python 3.6.10, Pillow==6.2.0, libjpeg9: 6252762318
python 3.7.3, Pillow==5.2.0, libjpeg9: 6252762318
python 3.7.3, Pillow==6.0.0, libjpeg9: 6252762318
python 3.7.3, Pillow==6.2.0, libjpeg9: 6252762318
python 3.8.2, Pillow==7.1.1, libjpeg9: 6252762318
More details about libjpeg:
ldd /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/_imaging.so
linux-vdso.so.1 (0x00007ffc00020000)
libjpeg.so.9 => /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/../../../libjpeg.so.9 (0x00007efbff6be000)
libz.so.1 => /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/../../../libz.so.1 (0x00007efbff6a4000)
libtiff.so.5 => /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/../../../libtiff.so.5 (0x00007efbff628000)
libpython2.7.so.1.0 => /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/../../../libpython2.7.so.1.0 (0x00007efbff081000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efbfee62000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efbfea71000)
liblzma.so.5 => /home/qimin/anaconda3/envs/test/lib/python2.7/site-packages/PIL/../../.././liblzma.so.5 (0x00007efbff5e2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efbfe6d3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efbfe4cf000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007efbfe2cc000)
/lib64/ld-linux-x86-64.so.2 (0x00007efbff4d7000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efbfe0c4000)
ldd /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/_imaging.cpython-36m-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffc00020000)
libjpeg.so.9 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../../libjpeg.so.9 (0x00007efbff648000)
libz.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../../libz.so.1 (0x00007efbff62e000)
libtiff.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../../libtiff.so.5 (0x00007efbff5ac000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efbff2b8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efbfeec7000)
libwebp.so.7 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../.././libwebp.so.7 (0x00007efbff505000)
libzstd.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../.././libzstd.so.1 (0x00007efbfee0b000)
liblzma.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.6/site-packages/PIL/../../.././liblzma.so.5 (0x00007efbfede2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efbfea44000)
/lib64/ld-linux-x86-64.so.2 (0x00007efbff4d7000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efbfe83c000)
ldd /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/_imaging.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffc00020000)
libjpeg.so.9 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../../libjpeg.so.9 (0x00007efbff647000)
libz.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../../libz.so.1 (0x00007efbff62d000)
libtiff.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../../libtiff.so.5 (0x00007efbff5ab000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efbff2b8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efbfeec7000)
libwebp.so.7 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../.././libwebp.so.7 (0x00007efbff504000)
libzstd.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../.././libzstd.so.1 (0x00007efbfee0b000)
liblzma.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.7/site-packages/PIL/../../.././liblzma.so.5 (0x00007efbfede2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efbfea44000)
/lib64/ld-linux-x86-64.so.2 (0x00007efbff4d7000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efbfe83c000)
ldd /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/_imaging.cpython-38-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffc00020000)
libjpeg.so.9 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../../libjpeg.so.9 (0x00007efbff640000)
libz.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../../libz.so.1 (0x00007efbff626000)
libtiff.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../../libtiff.so.5 (0x00007efbff5a4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efbff2b8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efbfeec7000)
libwebp.so.7 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../.././libwebp.so.7 (0x00007efbfee3d000)
libzstd.so.1 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../.././libzstd.so.1 (0x00007efbfed81000)
liblzma.so.5 => /home/qimin/anaconda3/envs/test/lib/python3.8/site-packages/PIL/../../.././liblzma.so.5 (0x00007efbff55e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efbfe9e3000)
/lib64/ld-linux-x86-64.so.2 (0x00007efbff4d7000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efbfe7db000)
@StephenChan @qiminchen : I was able to fully confirm this by building PIL inside the new Docker using libjpeg8. That gave me the 6252755969 checksum from the old, production Docker build. So at this point we know exactly where the diff came from and can choose what to do.
Staying with libjpeg8 is nice since we get pixel-perfect backwards compatibility. However, at some point libjpeg8 will be deprecated, so we will have to make the leap then. Doing it now makes some sense, since most folks will move to the new feature extractor anyway. I'd also assume libjpeg9 is better and faster, since it is newer. Finally, I did conclude the regression run: across the 8 images, with thousands of points, the max difference in score was less than 3% for any label. So this is unlikely to cause any problems for current sources. Still, it's annoying. What are your thoughts?
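The acceptance criterion could be captured in a helper along these lines (a sketch; the function name and the per-label score layout are assumptions, not pyspacer's API):

```python
import numpy as np

def scores_within_tolerance(legacy_scores, new_scores, tol=0.03):
    # Accept the new extractor if no label's classification score moves
    # by more than tol (3%, matching the regression run above).
    legacy = np.asarray(legacy_scores, dtype=float)
    new = np.asarray(new_scores, dtype=float)
    return float(np.max(np.abs(legacy - new))) < tol
```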
I guess another option is: for the tests where you really want to be pixel-perfect, you accept either the libjpeg8 or libjpeg9 values. This would obviously make developing the tests more time consuming, but it could be worth considering if you only write a few tests like this.
Yeah, I suppose so. The problem is that to do it properly, I'd have to install libjpeg9 in the old Docker build to make sure that is the only thing that changed. But that one runs Ubuntu 14.04, so it's not possible. The alternative is to load the images on my laptop using libjpeg9, store them as arrays on disk, and then load them from there in the old Docker. But that is getting pretty cumbersome. I might come back to it, but for now let's assume we settle for what is in this PR.
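For the record, the either-value test suggested above could look like this sketch (the two checksums are the values reported in this thread for the attached image under libjpeg9 and libjpeg8 respectively):

```python
from PIL import Image

# Checksums observed in this thread for the test image, one per libjpeg
# generation (libjpeg9 / libjpeg8).
KNOWN_CHECKSUMS = {6252762318, 6252755969}

def pixel_checksum(im):
    # Same sum-of-pixel-values checksum as the repro script above.
    return sum(sum(px) for px in im.getdata())

def checksum_ok(path):
    # Pass if the decoded pixels match either known libjpeg output.
    return pixel_checksum(Image.open(path)) in KNOWN_CHECKSUMS
```

A test like this pins the decode to known libjpeg outputs without forcing a particular libjpeg version on every environment.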
Speaking of which, it is finally ready to review. More details are in the main PR message. CC @qiminchen .
Cool. I can look through the code tomorrow.
One other thought: What about converting from JPG to PNG as a preprocessing step, and then start your testing / cross-validation / etc. with PNG as inputs? Obviously you get a filesize increase, but I don't know if it would be prohibitive.
@StephenChan : that is a good idea. I added a unit-test for just that.
@StephenChan @qiminchen : I made a new release if you want to try pip installing it: https://pypi.org/project/pyspacer/0.1.0/. I also just pushed some minor cleanups; nothing major.
My testing:
Docker: All tests passed. Very nice job on the code coverage!
git clone on Windows: 2 test errors, both ModuleNotFoundError: No module named 'copy_reg\r' when trying to unpickle. I'm reading that this could be a Windows vs. Unix thing, though, and it could be fixed by converting newline chars on the pickled file before loading it (https://stackoverflow.com/questions/556269/importerror-no-module-named-copy-reg-pickle). I'd say it's not a big deal for now. If it's bugging me later, I can make a PR to add a Windows case to those tests.
pip install on Windows: Was able to install 0.1.0 in the same virtualenv that I used to run coralnet tests in Python 3.6. The only thing is that I forgot that coralnet's boto requirement was still 2.40.0, while pyspacer currently says 2.49.0. I'll look over the boto release notes to see if that could be an issue or not. I'm guessing it won't be too bad though.
Spotted a couple of minor mistakes:
- README.md: In the Docker build section, the bash example still has test:Dockerfile instead of spacer:test.
- storage.py: The URLStorage.delete() error message says 'Store operation' instead of 'Delete operation'.
I skimmed over the rest of the code. To be honest, I'm not following it all super closely, but the DataClass and DataLocation SWE stuff is interesting. Seems like it cuts down on boilerplate a fair bit, and indeed the tasks, extract_features, and train_classifier files look rather tidy. Maybe I'll look over coralnet again for similar DRY opportunities when I get the chance. Anyway, PR looks good to me.
OK (skipped=2)
Name                         Stmts   Miss  Cover   Missing
----------------------------------------------------------
spacer/__init__.py               1      0   100%
spacer/caffe_utils.py           72      0   100%
spacer/config.py                54      0   100%
spacer/data_classes.py          81      0   100%
spacer/extract_features.py      47      4    91%   115, 119, 134-135
spacer/mailman.py               15      0   100%
spacer/messages.py             163      0   100%
spacer/storage.py              127      0   100%
spacer/task_utils.py            11      0   100%
spacer/tasks.py                 37      0   100%
spacer/train_classifier.py      23      0   100%
spacer/train_utils.py          108      0   100%
----------------------------------------------------------
TOTAL                          739      4    99%
Installing collected packages: boto, wget, coverage, tqdm, six, termcolor, fire, Pillow, numpy, scipy, joblib, scikit-learn, pyparsing, python-dateutil, kiwisolver, cycler, matplotlib, decorator, networkx, imageio, PyWavelets, scikit-image, torch, torchvision
Successfully installed Pillow-6.2.2 PyWavelets-1.1.1 boto-2.49.0 coverage-5.0.3 cycler-0.10.0 decorator-4.4.2 fire-0.2.1 imageio-2.8.0 joblib-0.14.1 kiwisolver-1.2.0 matplotlib-3.2.1 networkx-2.4 numpy-1.18.1 pyparsing-2.4.7 python-dateutil-2.8.1 scikit-image-0.15.0 scikit-learn-0.22.1 scipy-1.4.1 six-1.14.0 termcolor-1.1.0 torch-1.4.0 torchvision-0.5.0 tqdm-4.43.0 wget-3.2
Thanks for testing @StephenChan and @qiminchen .
My testing:
- Docker: All tests passed. Very nice job on the code coverage!
Thanks. Although, 100% code coverage just means every line is touched at least once. I'm pretty sure there are corner cases still in there that I missed. I also used the # pragma: no cover pretty generously.
- git clone on Windows: 2 test errors, both ModuleNotFoundError: No module named 'copy_reg\r' when trying to unpickle. I'm reading that this could be a Windows vs. Unix thing, though, and it could be fixed by converting newline chars on the pickled file before loading it (https://stackoverflow.com/questions/556269/importerror-no-module-named-copy-reg-pickle). I'd say it's not a big deal for now. If it's bugging me later, I can make a PR to add a Windows case to those tests.
Sounds good.
- pip install on Windows: Was able to install 0.1.0 in the same virtualenv that I used to run coralnet tests in Python 3.6. The only thing is that I forgot that coralnet's boto requirement was still 2.40.0, while pyspacer currently says 2.49.0. I'll look over the boto release notes to see if that could be an issue or not. I'm guessing it won't be too bad though.
I tried using 2.40 on the spacer side and it worked fine. I have relaxed install_requires.
Spotted a couple of minor mistakes:
- README.md: In the Docker build section, the bash example still has test:Dockerfile instead of spacer:test.
- storage.py: The URLStorage.delete() error message says 'Store operation' instead of 'Delete operation'.
Thanks. Fixed.
I skimmed over the rest of the code. To be honest, I'm not following it all super closely, but the DataClass and DataLocation SWE stuff is interesting. Seems like it cuts down on boilerplate a fair bit, and indeed the tasks, extract_features, and train_classifier files look rather tidy. Maybe I'll look over coralnet again for similar DRY opportunities when I get the chance. Anyway, PR looks good to me.
Thanks @StephenChan . Yeah, I'm really happy with the DataLocation design; it worked well throughout. And I do use data classes a fair amount overall. For CoralNet we already have those in models.py, complete with load and store operations, but here I created my own JSON-compatible store & load.
Merging this now.
@StephenChan : new pypi package 0.1.1 created with relaxed boto reqs.
Cool, thanks.
This PR ended up much larger than I anticipated. Sorry about that.
The changes can be grouped into:
- Dev-ops
- SWE: DataLocation struct to encapsulate where blobs are stored; storage.load operations; load_net for the caffe extractor.
- API: classify_features task, and renames deploy to classify_image; DataLocation struct; other misc cleanup.
- Misc

To test, follow the instructions in README.md.