elegant-scipy / elegant-scipy

1st Edition of Elegant SciPy (O'Reilly Publishers)

counts_rpkm in chapter 1 doesn't work with 64-bit NumPy builds, too #325

Open capissimo opened 6 years ago

capissimo commented 6 years ago

Figures 1-13 and 1-15 cannot be reproduced because of these warnings:

c:\python36\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log
  """Entry point for launching an IPython kernel.
c:\python36\lib\site-packages\numpy\lib\function_base.py:4274: RuntimeWarning: Invalid value encountered in percentile
  interpolation=interpolation)
c:\python36\lib\site-packages\matplotlib\cbook\__init__.py:1856: RuntimeWarning: invalid value encountered in less_equal
  wiskhi = np.compress(x <= hival, x)
c:\python36\lib\site-packages\matplotlib\cbook\__init__.py:1863: RuntimeWarning: invalid value encountered in greater_equal
  wisklo = np.compress(x >= loval, x)
c:\python36\lib\site-packages\matplotlib\cbook\__init__.py:1871: RuntimeWarning: invalid value encountered in less
  np.compress(x < stats['whislo'], x),
c:\python36\lib\site-packages\matplotlib\cbook\__init__.py:1872: RuntimeWarning: invalid value encountered in greater
  np.compress(x > stats['whishi'], x)

I have Win10 64/Python 3.6.4/NumPy 1.14.2 mkl

jni commented 6 years ago

@capissimo interesting! Thanks for the report, we'll check it out and report back. Maybe NumPy 1.14 changed something...

capissimo commented 6 years ago

With the previous version (1.13.1) the outcome was the same. I'm running the examples in a Jupyter notebook. Maybe that's the reason...

jni commented 6 years ago

No, we build the whole book with the notebook, so I doubt that's the reason. Remind me, do we set the random seed and did you set yours?

capissimo commented 6 years ago

Yes, np.random.seed(seed=7). It was in the subsection 'Normalizing library size between samples', just before building the plot in Figure 1-7.

jni commented 6 years ago

Hmmm. So the figure works for me but the genes are not the same as in the book! Grrr. Looks like we need to figure out what's changing between runs or between systems. Working on it! Thank you again!

jni commented 6 years ago

Hmmm, pd.Index.intersection, I wonder if that's changed between pandas versions. Highly suspect.

capissimo commented 6 years ago

Just to be sure, updated pandas. No go.

jni commented 6 years ago

Gah: https://pandas.pydata.org/pandas-docs/version/0.20/whatsnew.html#index-intersection-and-inner-join-now-preserve-the-order-of-the-left-index

jni commented 6 years ago

@capissimo that explains my error at least. We'll have to update the book for it. Yours seems different. Would you like to chat at https://gitter.im/elegant-scipy/bugs? I have 15 mins now, or we could make some other time.

capissimo commented 6 years ago

But how does this relate to log_counts later, for Figure 1-13? Oddly enough, after updating pandas, the plots started to look different. :(

capissimo commented 6 years ago

I checked the arrays inside the function that computes counts_rpkm for NaNs, using np.sum(np.isnan(array)). Every check returns 0:

print(np.sum(np.isnan(L)))  # 0
print(np.sum(np.isnan(C)))  # 0
print(np.sum(np.isnan(normed)))  # 0

jni commented 6 years ago

As mentioned on gitter: what about np.sum(count_rpkm < 0)? Or < (-1)?
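
A minimal sketch of why that check matters, assuming (as the "< (-1)" hint suggests) that the log transform is applied to counts_rpkm + 1. The values array is a made-up stand-in for counts_rpkm: it contains no NaNs, so the np.isnan test passes, yet the log still produces a NaN for the entry below -1.

```python
import numpy as np

# Hypothetical stand-in for counts_rpkm: no NaNs, but two negative entries.
values = np.array([0.0, 3.5, -0.5, -164.0])

n_nan = int(np.sum(np.isnan(values)))    # NaNs already present in the input
n_neg = int(np.sum(values < 0))          # negative entries
n_below = int(np.sum(values < -1))       # entries where log(x + 1) is undefined

with np.errstate(invalid="ignore"):      # hide the RuntimeWarning in this demo
    logged = np.log(values + 1)

n_nan_after = int(np.sum(np.isnan(logged)))
print(n_nan, n_neg, n_below, n_nan_after)  # expected: 0 2 1 1
```

So a clean np.isnan result on the input does not rule out "invalid value encountered in log"; counting negative entries is the more telling test.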

capissimo commented 6 years ago

I collected the 1st chapter's material in a Jupyter notebook for you to see what's going on on my side.

jni commented 6 years ago

@capissimo that's where your problem is. I think you must have made a mistake in creating counts_rpkm somewhere... Can you put your notebook on gist.github.com?

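[image: screen shot 2018-03-28 at 10 24 48 am] https://user-images.githubusercontent.com/492549/38035468-7dadb914-3272-11e8-9759-e7acfa3858b2.png
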
capissimo commented 6 years ago

Yes, I did as you said.

capissimo commented 6 years ago

Please notice how the plots look different after I upgraded pandas. :(

jni commented 6 years ago

@capissimo I don't know where your notebook is? I don't see a link

capissimo commented 6 years ago

Pardon, here it is https://gist.github.com/capissimo/16306b04fd4039d48ba6df67857fbf88

jni commented 6 years ago

@capissimo you must have run and re-run certain "play" cells and gotten the notebook into a bad state. I've run the notebook from scratch and things work! So try "Restart and run all" in the notebook and I hope that will fix your problem?

btw I can't tell you how happy it makes me to see it all in Russian. =D

jni commented 6 years ago

Also btw if there are any parts that you thought were unclear or "ugly", please let us know!

capissimo commented 6 years ago

Hi, Juan,

I found a couple of places that need updating.

p. 79, 153: Newer versions of NetworkX do not have edges_iter(): g.edges_iter() --> g.edges() (or g.edges)

p. 152: A typo. Using a generator expression causes an error with newer NetworkX, so change it to a list comprehension, () --> []. Before:

def threshold_graph(g, t):
    to_remove = ((u, v) for (u, v, d) in g.edges(data=True)
                 if d['weight'] > t)
    g.remove_edges_from(to_remove)

After:

def threshold_graph(g, t):
    to_remove = [(u, v) for (u, v, d) in g.edges(data=True)
                 if d['weight'] > t]
    g.remove_edges_from(to_remove)
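
A small usage sketch of the list-based threshold_graph, run on an invented three-node graph (the node names and weights are made up for illustration). Building the list first means every edge to drop is collected before the graph is modified, which is presumably why the lazy generator form fails once g.edges(data=True) no longer returns a plain list.

```python
import networkx as nx

def threshold_graph(g, t):
    # Collect all edges above the threshold first, then remove them,
    # so the graph is never modified while it is being iterated over.
    to_remove = [(u, v) for (u, v, d) in g.edges(data=True)
                 if d['weight'] > t]
    g.remove_edges_from(to_remove)

# Invented toy graph: three nodes, three weighted edges.
g = nx.Graph()
g.add_weighted_edges_from([("a", "b", 0.2), ("b", "c", 0.9), ("a", "c", 0.5)])

threshold_graph(g, 0.4)        # drops the edges with weight 0.9 and 0.5
print(g.number_of_edges())     # expected: 1
```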

capissimo commented 6 years ago

I still don't understand how I should use intersection to get things straight.
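
One possible way to make the result independent of the ordering change linked earlier in the thread, sketched here under the assumption that the gene labels are unique: sort the intersection explicitly instead of relying on the order that pd.Index.intersection happens to return. The labels below are invented, and this alone is not claimed to reproduce the book's exact gene selection.

```python
import pandas as pd

# Invented gene labels, deliberately in different orders.
left = pd.Index(["g3", "g1", "g2", "g5"])
right = pd.Index(["g2", "g3", "g4", "g1"])

common = left.intersection(right)      # element order may vary by pandas version
common_sorted = common.sort_values()   # explicit, version-independent order

print(list(common_sorted))  # expected: ['g1', 'g2', 'g3']
```

Indexing both tables with common_sorted then gives the same row order whichever pandas version is installed.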

capissimo commented 6 years ago

In general, I found the book very interesting. What makes it more valuable is the lack of practical material on SciPy out there. Some may argue that a vague genetic analysis procedure and FFT concepts sort of downgrade it. My point is that this is science. Some will be motivated to go and study up on RNA-seq, FFT, etc., but the majority, I'm sure, will take home the underlying principle and use it in their own field. Personally, I like the last part most, on combining scientific analysis with functional programming.

One last thing. There are two main ways to write science-oriented computer books. You either take a scientific problem and show how to solve it, or you take a library function and show how to use it in practice. SciPy now has a remarkable book written according to the former, but still lacks the latter. I hope we shall see a continuation. :)

jni commented 6 years ago

Hi again @capissimo! And thank you for your comments! I'm super happy that you liked Ch8, that's kind of my baby, and it was a risk to include it since it's not strictly SciPy (the library).

Regarding the NetworkX bugs, thanks for the report! Actually we already fixed them, but it was just as the book was going to print. =\

https://github.com/elegant-scipy/elegant-scipy/pull/306

Weirdly, it looks like some of the fixes in that PR made it into the printed book, but not all of them! I'll add these to the errata list.

Finally, about the kinds of book: in my opinion, the best book for the latter form will be an index mapping functions to pages in our book where they are used. ;) I find that, for the majority of functions in SciPy, it's really hard to come up with a real example of a use in practice, that is not embedded in a much larger analysis. They are foundational functions that are useful to build bigger programs.

We do hope to continue the book, specifically with additional chapters using parts of the library that we haven't covered (such as scipy.spatial).

capissimo commented 6 years ago

SOLVED! This has done the trick: N = np.sum(counts, axis=0).astype(float) in rpkm

jni commented 6 years ago

👏👏👏👏 🎉🎉🎉🎉

By the way, could you check whether the version in the PR works for you also?

C = counts.astype(float)
N = np.sum(C, axis=0)

?

capissimo commented 6 years ago

Yes, it does work all right.
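
The thread does not spell out why the cast to float helps. A likely explanation, offered here as an assumption, is integer overflow: on Windows, NumPy 1.x uses 32-bit integers by default even in 64-bit builds, so the product of library size and gene length inside rpkm can wrap around to a negative number. The sketch below forces int32 to reproduce this on any platform and assumes the RPKM formula 1e9 * C / (N * L) with a log(x + 1) transform afterwards. The counts, lengths, and lib_sizes values are invented, and lib_sizes stands in for N.

```python
import numpy as np

# Invented toy data, forced to int32 to mimic NumPy 1.x defaults on Windows.
counts = np.array([[100, 200],
                   [300, 400]], dtype=np.int32)                  # genes x samples
lengths = np.array([1_500, 2_500], dtype=np.int32)               # gene lengths (bp)
lib_sizes = np.array([30_000_000, 45_000_000], dtype=np.int32)   # stand-in for N

# Integer arithmetic: 45_000_000 * 1_500 exceeds 2**31 - 1 and wraps silently.
denom_int = lib_sizes[np.newaxis, :] * lengths[:, np.newaxis]
rpkm_int = 1e9 * counts / denom_int

# Float arithmetic, as in the fix: no wrap-around.
denom_float = lib_sizes.astype(float)[np.newaxis, :] * lengths[:, np.newaxis]
rpkm_float = 1e9 * counts / denom_float

with np.errstate(invalid="ignore"):   # hide the RuntimeWarning in this demo
    log_int = np.log(rpkm_int + 1)
log_float = np.log(rpkm_float + 1)

print((denom_int < 0).any(), np.isnan(log_int).any(), np.isnan(log_float).any())
# expected: True True False
```

Either variant from the thread (casting N, or casting counts before summing) moves the multiplication into floating point, which would avoid the wrap-around.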
