czbiohub-sf / xicor

xi correlation method adapted for python
MIT License
145 stars, 17 forks

[MRG] use scipy to do the rank math and add profiling notebook #4

Closed pranathivemuri closed 4 years ago

pranathivemuri commented 4 years ago
  1. Added travis.yml for testing.
  2. Linted the code a little.
  3. More importantly, refactored the rank computation from pandas to the scipy rank method. scipy calls numpy directly, whereas pandas does a lot of index and Series bookkeeping before it reaches the numpy rank calculation.
  4. We still get the same answers as before, which is pretty cool and shows that the refactor did not change the results.
  5. Edited environment.yml and requirements.txt.
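
To illustrate point 3, here is a minimal sketch of swapping a pandas rank for a scipy rank and checking that both give the same answer. It is not the code from this PR: the `values` array is made up, and the `"average"` tie-breaking method is an assumption chosen only because both libraries support it under the same name.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata

# Hypothetical example vector with one tie (not data from the repo).
values = np.array([3.0, 1.0, 2.0, 2.0, 5.0])

# pandas route: wrap the array in a Series (index included), then rank.
pandas_ranks = pd.Series(values).rank(method="average").to_numpy()

# scipy route: rank the plain array directly.
scipy_ranks = rankdata(values, method="average")

# Both routes should agree element for element.
ranks_match = np.array_equal(pandas_ranks, scipy_ranks)
print(ranks_match)
```

If the real code uses a different tie-breaking method, the same check applies as long as the `method` argument is set consistently on both sides.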

If you google a comparison between numpy and pandas you might find this article, which says that numpy is better in general and that performance depends on the number of rows: http://gouthamanbalaraman.com/blog/numpy-vs-pandas-comparison.html

For single-row vectors like our data, scipy seems better so far, going by the profiling stats in the notebook. The notebook was tested on a test dataset with 2 human cells and 950 genes, and on the test dataset from conftest.
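
For anyone who wants to reproduce this kind of comparison without the notebook, a rough timing sketch could look like the following. Everything here is an assumption for illustration: the helper names (`rank_with_pandas`, `rank_with_scipy`, `time_call`) are hypothetical, the input is synthetic random data whose length only mirrors the 950 genes mentioned above, and timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd
from scipy.stats import rankdata

# Synthetic single vector; the length 950 is only meant to echo the gene count.
rng = np.random.default_rng(0)
vector = rng.random(950)


def rank_with_pandas(x):
    """Rank via a pandas Series (hypothetical helper)."""
    return pd.Series(x).rank(method="average").to_numpy()


def rank_with_scipy(x):
    """Rank via scipy on the raw array (hypothetical helper)."""
    return rankdata(x, method="average")


def time_call(func, x, repeats=5, number=200):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(lambda: func(x), repeat=repeats, number=number)
    return min(timings) / number


pandas_seconds = time_call(rank_with_pandas, vector)
scipy_seconds = time_call(rank_with_scipy, vector)

print(f"pandas: {pandas_seconds * 1e6:.1f} microseconds per call")
print(f"scipy:  {scipy_seconds * 1e6:.1f} microseconds per call")
```

Taking the minimum over repeats reduces noise from other processes; it does not replace profiling on the real dataset.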


pranathivemuri commented 4 years ago

(two images attached)