Lattice-Automation / seqfold

nucleic acid folding
MIT License

Numba compatibility? #8

Closed: laserson closed this issue 2 years ago

laserson commented 3 years ago

I tried to Numba-jit the dg function as a long shot, and it appears to have failed. Have y'all tried compiling any parts of this with Numba to improve performance?

jjti commented 3 years ago

I haven't tried Numba (yet). I tried messing around with Cython (and failed, IIRC), but I'll check out Numba; it would be interesting. I'd be all for improving its performance: I remember it starting to chug along once I got up to the hundreds of bp.

laserson commented 3 years ago

I love how your code makes heavy use of the latest typing annotations, but strangely, I think this may be something the Numba compiler is choking on (since that stuff is not yet implemented there). I'd guess that if the parameters were moved into more "transparent" objects, it would probably compile readily and see a huge speedup.
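A minimal sketch of what "transparent" parameters could mean in practice: a dict keyed by dinucleotide strings is flattened into a dense 4x4 NumPy array indexed by integer-encoded bases, which Numba's nopython mode handles well (plain Python dicts passed in as arguments are a common stumbling block). The layout of seqfold's real parameter tables is not shown in this thread, so `TOY_TABLE`, `encode`, `pair_table_to_array`, and `sum_adjacent` are all hypothetical, and the values are placeholders, not real nearest-neighbor energies. The sketch falls back to plain Python if Numba isn't installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # plain-Python fallback so the sketch runs without Numba
    def njit(func):
        return func

# Hypothetical base-to-index mapping so parameter tables can be plain arrays.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical placeholder values, NOT real nearest-neighbor parameters.
TOY_TABLE = {"AA": -1.0, "AT": -0.5, "TA": -0.25, "GC": -2.0}


def encode(seq: str) -> np.ndarray:
    """Integer-encode a DNA sequence (A/C/G/T only) as an int64 array."""
    return np.array([BASE_INDEX[b] for b in seq.upper()], dtype=np.int64)


def pair_table_to_array(table: dict) -> np.ndarray:
    """Convert a {"XY": value} dict into a dense 4x4 float64 array.

    Pairs missing from the dict are filled with NaN so bad lookups stand out.
    """
    arr = np.full((4, 4), np.nan)
    for key, value in table.items():
        arr[BASE_INDEX[key[0]], BASE_INDEX[key[1]]] = value
    return arr


@njit
def sum_adjacent(codes, arr):
    """Sum arr[x, y] over every adjacent (x, y) pair of encoded bases."""
    total = 0.0
    for i in range(codes.shape[0] - 1):
        total += arr[codes[i], codes[i + 1]]
    return total
```

For example, `sum_adjacent(encode("AAT"), pair_table_to_array(TOY_TABLE))` adds the placeholder "AA" and "AT" entries. The string-to-array conversion happens once in ordinary Python, so only the hot loop needs to be Numba-compatible.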

There are a few other things that would probably need to change for Numba, like replacing generators with list comprehensions.
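A minimal sketch of that kind of rewrite, using a hypothetical helper that lists candidate base pairs (it is not code from seqfold). Both versions return the same pairs; the list-comprehension form produces a concrete list up front, which is generally easier for compilers like Numba or Cython to type than a lazy generator. Exactly which constructs a given Numba version accepts in nopython mode is worth checking against its supported-features documentation.

```python
from typing import Iterator, List, Tuple

# Watson-Crick DNA pairs, used only to make the example concrete.
PAIRS = frozenset({("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")})


def candidate_pairs_gen(seq: str, min_loop: int) -> Iterator[Tuple[int, int]]:
    """Generator version: lazily yields (i, j) for complementary bases."""
    n = len(seq)
    for i in range(n):
        # j starts far enough from i to leave at least min_loop bases between.
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) in PAIRS:
                yield (i, j)


def candidate_pairs_list(seq: str, min_loop: int) -> List[Tuple[int, int]]:
    """Same pairs, built eagerly with a list comprehension."""
    n = len(seq)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_loop + 1, n)
        if (seq[i], seq[j]) in PAIRS
    ]
```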

jjti commented 2 years ago

I tried Numba and felt like I got further toward something working than I had with Cython, but no dice.

I did, however, try PyPy3 and noticed that it's significantly faster for larger sequences: ~2 s vs. ~15 s for a 200 bp sequence. I added some notes on this to the README: https://github.com/Lattice-Automation/seqfold

If you're not using it already, PyPy is probably the way to go for a quick performance gain.

https://realpython.com/pypy-faster-python/
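To reproduce a CPython-vs-PyPy comparison like the one above, one option is a small timing harness run under each interpreter. This is a sketch, not part of seqfold; `time_call` and `describe_interpreter` are hypothetical helpers. The untimed warm-up calls keep one-time costs (such as Numba's compile on first call, and some of PyPy's JIT warm-up) out of the measurement, and the median reduces noise from a single slow run.

```python
import platform
import statistics
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 3,
              warmup: int = 1, **kwargs: Any) -> float:
    """Median wall-clock seconds of fn(*args, **kwargs) over `repeats` runs.

    `warmup` untimed calls run first so one-time start-up costs are excluded.
    """
    for _ in range(warmup):
        fn(*args, **kwargs)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def describe_interpreter() -> str:
    """Implementation name plus the Python language version it implements."""
    return f"{platform.python_implementation()} {platform.python_version()}"
```

Assuming seqfold's dg(seq, temp) signature, running the same script with python3 and with pypy3, importing dg from seqfold and calling `time_call(dg, seq, temp=37.0)` on your own sequence, gives a side-by-side number for each interpreter; absolute timings will vary by machine.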

jjti commented 2 years ago

For what it's worth, I'm still interested in Numba/Cython, but given how both handle lists, I'd have to rewrite much of the code (probably with NumPy). I now know I should have gone with NumPy from the start, but at this point it feels like too large a rewrite.
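For a sense of what a NumPy-based rewrite could look like, here is a minimal sketch of a much simpler folding dynamic program, Nussinov base-pair maximization. It is a swapped-in illustration, not seqfold's nearest-neighbor energy model. The DP table is a preallocated 2-D NumPy array instead of nested lists, the sequence is passed as a uint8 array of ASCII codes, and the loops are compiled with Numba's `njit` when Numba is installed (plain Python otherwise). `max_base_pairs` and the helper names are hypothetical, and the default `min_loop=3` (minimum unpaired bases in a hairpin) is an assumed convention.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # plain-Python fallback so the sketch still runs
    def njit(func):
        return func

# ASCII codes, so the sequence can be handed to Numba as a uint8 array.
A, C, G, T = ord("A"), ord("C"), ord("G"), ord("T")


@njit
def _can_pair(a, b):
    """True for Watson-Crick DNA pairs (A-T, C-G) given ASCII codes."""
    return ((a == A and b == T) or (a == T and b == A)
            or (a == C and b == G) or (a == G and b == C))


@njit
def _nussinov_table(codes, min_loop):
    """Fill the DP table: dp[i, j] = max nested pairs within codes[i..j]."""
    n = codes.shape[0]
    dp = np.zeros((n, n), dtype=np.int64)
    # Spans of min_loop or less cannot hold a pair, so those cells stay 0.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1, j], dp[i, j - 1])  # leave i or j unpaired
            if _can_pair(codes[i], codes[j]):
                best = max(best, dp[i + 1, j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, dp[i, k] + dp[k + 1, j])
            dp[i, j] = best
    return dp


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (non-ACGT never pairs)."""
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8).copy()
    n = codes.shape[0]
    if n == 0:
        return 0
    return int(_nussinov_table(codes, min_loop)[0, n - 1])
```

For example, `max_base_pairs("GGGAAACCC")` returns 3 (the three G-C pairs closing an AAA loop). Keeping all state in fixed-type arrays is what lets Numba compile the triple loop; the same logic over nested Python lists is much harder for Numba or Cython to type.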