Closed syouligan closed 3 years ago
This is so strange -- it's most likely an annoy problem (it's living up to its name). Can you try a range of annoy versions and see if any of them work?
Yeah, I'll take a look. It does seem that scanorama 1.6 has a dependency on annoy >= 1.11.5, correct?
@syouligan that's a dependency that worked for me on my Ubuntu Linux box. Unfortunately it seems like perhaps different versions of annoy might fail on different machine architectures...
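When comparing annoy versions across machines, it can help to record the exact environment alongside each attempt. The following is a minimal sketch, not something from Scanorama itself: the helper names `installed_version` and `environment_report` are made up for illustration, and it assumes Python 3.8+ for `importlib.metadata` (on Python 3.6, as used in this thread, the `importlib_metadata` backport would be needed instead).

```python
import platform
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("annoy", "scanorama")):
    """Collect details that tend to matter for architecture-specific crashes."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # CPU architecture as reported by the OS
        "system": platform.system(),
    }
    for name in packages:
        report[name] = installed_version(name)
    return report
```

Printing `environment_report()` on both the working and the failing machine gives a like-for-like comparison to paste into the issue.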
@syouligan I hope this issue has been resolved, and please let me know if you were able to successfully run annoy and, if so, how. I'll close the issue since it doesn't appear to be an issue with Scanorama, but please feel free to reopen if it is.
Hey, I think we're seeing this happen occasionally over on the scanpy CI builds. It's happened a few times over the past couple days, but only on the python 3.6 builds. It has always worked when we restart the job.
Here's a link to the build logs for a failed build (though I think Azure deletes these kind of quickly)
And here is a gist of the full log if you want to check installation versions.
@ivirshup thanks for letting me know. This call stack error message is great -- usually a segfault/illegal instruction just terminates Python without a helpful error message. It looks like it's failing in a call to the annoy tree data structure: https://github.com/brianhie/scanorama/blob/763e581d1efa61f60a0b41a50b51c38cdd008269/scanorama/scanorama.py#L522
Scanorama is written in pure Python so any invalid memory accesses or problems at the architecture level are most likely a problem in an underlying piece of code, in this case annoy. I think it's been a problem in the past as well. If you're noticing a flapping test, then maybe annoy gets confused about the architecture configuration on different calls. It looks like annoy has problems on different architectures:
Because the error occurs in a library call, it may be hard for me to debug without first consulting the annoy code base, which I am not familiar with. What I can do is use a different library to do approximate nearest neighbors, but this will most likely change the output of Scanorama, which I am inclined to try to avoid for now.
Have you noticed problems with annoy in the past? If the flapping test is too much of an issue, I'd also be fine with removing the Scanorama dependency within Scanpy as well and just keeping them separate repos.
What I can do is use a different library to do approximate nearest neighbors, but this will most likely change the output of Scanorama, which I am inclined to try to avoid for now.
This was a potential change I was gonna suggest, but it could change results. That said, IIRC, you're not using the specifics of annoy much, just that it does approximate graph construction and search? There is a sklearn transformer API for this (package with implementation, though I recall reading about this going into sklearn itself) which could make it easier to be flexible about the backend for this if this dependency gives you trouble.
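As a rough illustration of what "flexible about the backend" could look like, here is a small sketch of a pluggable nearest-neighbor interface. It is not Scanorama's code and not the sklearn transformer API mentioned above: the `BACKENDS` registry, `register`, and `nearest_neighbors` are hypothetical names, the Euclidean metric is an assumption, and the exact backend is a plain brute-force search. The annoy backend uses annoy's `AnnoyIndex`, `add_item`, `build`, and `get_nns_by_vector` calls and imports the library lazily, so annoy is only loaded when that backend is requested.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# A backend maps (reference points, query points, k) to, for each query,
# the indices of its k nearest reference points.
Backend = Callable[[Sequence[Vector], Sequence[Vector], int], List[List[int]]]

BACKENDS: Dict[str, Backend] = {}  # hypothetical registry, not part of Scanorama


def register(name: str) -> Callable[[Backend], Backend]:
    """Decorator that stores a backend function under `name`."""
    def decorator(fn: Backend) -> Backend:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("exact")
def exact_neighbors(points, queries, k):
    """Brute-force Euclidean search: slow, but no native dependency."""
    results = []
    for q in queries:
        nearest = heapq.nsmallest(
            k, range(len(points)), key=lambda i: math.dist(points[i], q)
        )
        results.append(nearest)
    return results


@register("annoy")
def annoy_neighbors(points, queries, k, n_trees=10):
    """Approximate search; annoy is imported only when this backend runs."""
    from annoy import AnnoyIndex

    index = AnnoyIndex(len(points[0]), "euclidean")
    for i, p in enumerate(points):
        index.add_item(i, list(p))
    index.build(n_trees)
    return [index.get_nns_by_vector(list(q), k) for q in queries]


def nearest_neighbors(points, queries, k, backend="exact"):
    """Dispatch to the named backend."""
    return BACKENDS[backend](points, queries, k)
```

With a layout like this, the default could stay on a dependency-free backend while annoy remains available for reproducing earlier results.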
If the flapping test is too much of an issue, I'd also be fine with removing the Scanorama dependency within Scanpy as well and just keeping them separate repos.
This just started a few days ago, and may go away soon. It's just annoying to have a PR build pass, then have the badge go red cause it fails on master 😞.
If it keeps happening, I'll try a few things (one is dropping python 3.6 support a la numpy). But that it's crashing the process is a bad problem, since we can't just xfail the test.
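One way to keep a hard crash from taking down the whole test run is to execute the risky code in a child interpreter and inspect its exit status. The sketch below is only an illustration under assumptions: `run_isolated` and `ANNOY_PROBE` are hypothetical names, and the probe assumes that building a tiny annoy index is enough to trigger the illegal instruction on an affected machine, which may not hold.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0):
    """Run `code` in a fresh interpreter and report how it ended.

    Returns (returncode, crashed). On POSIX a negative return code means
    the child was killed by a signal, e.g. -signal.SIGILL for an illegal
    instruction.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.returncode < 0


# Hypothetical probe: assumed (not verified) to exercise the failing code path.
ANNOY_PROBE = """
from annoy import AnnoyIndex
index = AnnoyIndex(2, "euclidean")
index.add_item(0, [0.0, 0.0])
index.add_item(1, [1.0, 1.0])
index.build(1)
index.get_nns_by_item(0, 1)
"""
```

A test could call `run_isolated(ANNOY_PROBE)` first and skip or xfail the annoy-dependent test when `crashed` is true, since the parent process survives either way.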
Great! So on my end, maybe I can just test a new approx NN backend, then just say in the README or something that annoy can be used to reproduce previous results but we recommend the less annoying backend that is more robust to different computer architectures.
It would be nice to have Python 3.6 support but do keep me posted on if the tests keep failing or if they get better, or worse!
@ivirshup one quick fix to make the test pass, in case you need a very quick patch, is to just pass in approx=False as a parameter to scanorama_integrate() in the test: https://github.com/theislab/scanpy/blob/master/scanpy/tests/external/test_scanorama_integrate.py#L16. Scanorama will then do nearest neighbors search with sklearn, which is still pretty fast.
Thanks for the suggestion @brianhie! I've added that.
Hi Brian
Thanks for your work. I am running into an 'Illegal instruction' error when running scanorama on a >100000 cell dataset on an HPC. I downgraded annoy to 1.11.5 and have tried playing with both sketch and batch_size to no avail. The same command runs fine on my local Mac using a 5% subset of the total dataset to be run on the HPC. Any info you might have would be really appreciated.
Thanks again
HPC info - CentOS release 6.10 (Final)
Python info - Python 3.6.7, annoy 1.11.5, scanorama 1.6