LMFDB / lmfdb

L-Functions and Modular Forms Database
Other
246 stars 200 forks source link

Upgrading to Sage 7.3 #1332

Closed JohnCremona closed 8 years ago

JohnCremona commented 8 years ago

Sage 7.2 will be released soon and I want to start testing to see what changes might be needed. This Issue can be used to make a list. So far:

jvoight commented 8 years ago

Sorry all--my guess is that it can be removed, but back in the day I was told to use it, so I did!

To make this quicker, note that only lmfdb/hilbert_modular_forms/hilbert_modular_form.py is actually part of the LMFDB code base (the rest are import scripts), and preparse isn't used in the file so we could just remove one line.

JohnCremona commented 8 years ago

OK let's do that. Preparsing is designed only for the Sage command line so that users can type a^2 instead of a**2 and should not be used in library code (in my opinion). One might think that (for example) it's a useful way to convert a polynomial as a strong from Magma output into Sage, but Sage itself provides conversion from Magma oblects to Sage objects, and can contruct polynomials from strings, e.g. sage: Qx = PolynomialRing(QQ,'x') sage: Qx('x^2+1') x^2 + 1 so I would huess that every use of preparse can be eliminated.

JohnCremona commented 8 years ago

PR #1396 (now fully merged) fixed the redundant import in hilbert_modular_forms.py.

I will record any deprecation warnings shown in the test scripts and also grep for "sage.rings.arith". Outcome: just 4 places, LCM in HeckeCharacter.py, primes in number_field.py, binomial in symL.py, lcm and gcd in hypergm/plot.pt. In all cases I could just switch the import to be from sage.all. I think there is a convention that one should not import from as low down the tree as possible precisely to avoid this sort of thing, and this seemed easier than trying to import from sage.arith and falling back to sage.rings.arith on ImportError (for backwards compatibility). Testing the edited code now.

JohnCremona commented 8 years ago

See PR #1399

JohnCremona commented 8 years ago

There are two tests in lmfdb/test_lfunction.py which fail with sage 7.2 (and pass with 7.1) when I run them on lmfdb.warwick.ac.uk using the current master: test_Lsym2EC and test_Lsym3EC though they run fine on lehner using 7.2. I am investigating how two identical Sage builds can act differently on the same code.

JohnCremona commented 8 years ago

Just for the record, as of commit f5b867a (master on 17 July) all tests pass with Sage-7.2 and also with 7.3.beta8, though the latter gives a warning (I think this is from an updated flask version rather than Sage itself): Test download function ... /home/jec/lmfdb/lmfdb/modular_forms/elliptic_modular_forms/backend/emf_download_utils.py:62: DeprecationWarning: In future flask releases etags will no longer be generated for file objects passed to the send_file function because this behavior was unreliable. Pass filenames instead if possible, otherwise attach an etag yourself based on another value as_attachment=True)

AndrewVSutherland commented 8 years ago

This is encouraging news! Hopefully the switch will be painless.

JohnCremona commented 8 years ago

Sage-7.3 has just been released (August 4) and I'll test that this also works.

JohnCremona commented 8 years ago

Sage 7.3 works fine (all tests pass expect one which fails for a non-sage reason).

Deprecation warnings seen (not all related to Sage as such):

(flask) /home/lmfdb/sage3/local/lib/python2.7/site-packages/flask_login.py:148: DeprecationWarning: Warning setup_app is deprecated. Please use init_app. DeprecationWarning)

(markdown) /home/lmfdb/sage3/local/lib/python2.7/site-packages/markdown/init.py:259: DeprecationWarning: Using short names for Markdown's builtin extensions i s deprecated. Use the full path to the extension with Python's dot notation (eg: "markdown.extensions.wikilinks" instead of "wikilinks"). The current b ehavior will raise an error in version 2.7. See the Release Notes for Python-Markdown version 2.6 for more info.

(flask) test_download (lmfdb.genus2_curves.test_genus2_curves.Genus2Test) ... /home/lmfdb/lmfdb/lmfdb/genus2_curves/main.py:481: DeprecationWarning: In future flask releases etags will no longer be generated for file objects passed to the send_file function because this behavior was unreliable. Pass filename s instead if possible, otherwise attach an etag yourself based on another value

(flask) /home/lmfdb/lmfdb/lmfdb/modular_forms/elliptic_modular_forms/backend/emf_download_utils.py:61: DeprecationWarning: In future flask releases etags will no longer be generated for file objects passed to the send_file function because this behavior was unreliable. Pass filenam es instead if possible, otherwise attach an etag yourself based on another value

That's it -- not sage related at all.

JohnCremona commented 8 years ago

I did the test with 7,2 also (both on lmfdb.warwick) and got none of the above deprecations, so I checked the versino numbers of Flask and Markdown. For Flask the only difference is in Flask-AutoIndex (version 0.6 i nthe 7.3 install), and there is no difference in Markdown.

As far as I can tell this means two things: (1) we would have no problems at all in upgrading Sage. (2) We should be more careful about the version numbers of the additional python modules we use, since at present no two users will have exactly the same, it will depend on when they installed the modules (expect for flask-login which is forced to version 0.2.11 in the requirements.txt file).

AndrewVSutherland commented 8 years ago

I just set add_etags=False in a bunch of places which should remove this flask deprecation warning; I have no idea why the default on send_file is add_etags=True, but certainly we should be setting it to false (it will only slow things down to set it to True).

edgarcosta commented 8 years ago

I'm now building sage-7.2 on one of the webservers.

JohnCremona commented 8 years ago

Thanks. On lmfdb.warwick we now have 7.1, 7.2 and 7.3 in ~/sage1, ~/sage2, ~/sage3 respectively (nice coincidence) though both sage-beta and sage-prod are linked to the first of these. For testing remember that you can seg the environment variable SAGE to point to a non-default version (something I added for this purpose a while back). BTW I am upgrading all my servers to 7.3 but have not yet done that for atkin and lehner. (It is built on lehner and can be run as /usr/local/sage/sage-2/sage).

AndrewVSutherland commented 8 years ago

For what it's worth

NumberField(QQ[x]([-1,-3,6,4,-5,-1,1]),'a').ideal([1,1])

Is still broken in Sage 7.3 (this is why we still occasionally get timeouts out some degree 6 ecnf pages (and number field pages, it has nothing to do with elliptic curves)).

JohnCremona commented 8 years ago

Is that Q as a number field? This works for me: sage: x = polygen(QQ) sage: K.=NumberField(x) sage: K.ideal([1,1]) Fractional ideal (1)

But certainly this fails: sage: QQx = QQ['x'] sage: NumberField(QQx,'a').ideal([1,1])

Has this been reported as a bug in Sage? It cannot be fixed otherwise. I will look at it anyway.

JohnCremona commented 8 years ago

Wait a minute, if your QQx is the polynomial ring then of course NumberField(QQx,'a') fails since the first argument should be a polynomial (as the error message says).

@AndrewVSutherland Can you point me to place in our code whare this happens?

AndrewVSutherland commented 8 years ago

@JohnCremona It doesn't happen anywhere in our code, it is a one-liner that reproduces the problem that is happening. The code I pasted above looks funny to you because markup is displaying QQ[x](stuff) as QQx (it thinks [x] followed by (stuff) is a hyperlink), I just fixed this (but cut-and-paste will still work in any case).

Just to clarify, if you execute

`NumberField(QQ[x]([-1,-3,6,4,1]),'a').ideal([1,1])`

it will instantly returns the unit ideal in the ring of integers of the number field Q[x]/(x^4+4_x^3+6_x^2-3*x-1)

If you then execute

NumberField(QQ[x]([-1,-3,6,4,-5,-1,1]),'a').ideal([1,1])

it will work fine (instantly returns the unit ideal in the ring of integers of a sextic field).

But now exit sage, restart it and execute just the line above for the sextic field, and you will see it gets stuck. If you press ctrl-c you will see it is stuck inside pari's bnfinit (which is why I did not initially report it to Sage). If you then execute the command again it will work fine (or more generally, if you do just about anything else with pari number fields before executing the problematic line it will work fine.

I cannot reproduce the problem in pari, which is why I have not reported it to them either. There is something very weird going on that appears to only happen the first time Sage calls bnfinit, and it is specific to this field. This is why we rarely see the problem on www.lmfdb.org, but occasionally do.

If you want to reproduce the problem in our code, run start-lmfdb.py locally and try to load the page http://127.0.0.1:37777/NumberField/6.6.371293.1. It will take a very long time (much more than 30 seconds, which is our timeout). But if you instead visit a random number field page first and then go to http://127.0.0.1:37777/NumberField/6.6.371293.1 it will load OK (takes a few seconds).

JohnCremona commented 8 years ago

On 5 August 2016 at 16:14, Andrew Sutherland notifications@github.com wrote:

@JohnCremona https://github.com/JohnCremona It doesn't happen anywhere in our code, it is a one-liner that reproduces the problem that is happening. The code looks funny to you because markup is displaying QQ[x] as QQx (it thinks [x] followed by (something-in-parens) is a hyperlink).

Just to clarify, if you execute

NumberField(QQx,'a').ideal([1,1])

it will instantly returns the unit ideal in the ring of integers of the number field Q[x]/(x^4+4_x^3+6_x^2-3*x-1)

If you then execute

NumberField(QQx,'a').ideal([1,1])

No! In your first line you create a polynomial from a list of integers, fine. But in the second line you give the polynomial ring itself as the first argument. That soes not work for me, even when I replce QQx by QQ[x].

it will work fine (and return the unit ideal in the ring of integers of a sextic field).

But now exit sage, restart it and execute just the line above, and you will see it gets stuck. If you press ctrl-c you will see it is stuck inside pari's bnfinit (which is why I did not initially report it to Sage). If you then execute the command again it will work fine (or more generally, if you do just about anything else with pari number fields before executing the problematic line it will work fine.

I cannot reproduce the problem in pari, which is why I have not reported it to them either. There is something very weird going on that appears to only happen the first time Sage calls bnfinit. This is why we rarely see the problem on www.lmfdb.org, but occasionally do.

I am prepared to believe that there is some bug somewhere, but cannot reproduce it from what you have written.

Some other comments: (1) be careful not to use in python code (in a .py file) short-cuts that are only available on the sage command line becaise of the preparser. (2) Sage predefines x to be an element of the Symbolic Ring. The fact that QQ[x] works is a bit of a hack. You should instead use PolynomialRing(QQ,'x).

John

If you want to reproduce the problem in our code, run start-lmfdb.py locally and try to load the page http://127.0.0.1:37777/ NumberField/6.6.371293.1. It will take a very long time. But if you instead visit a random number field page first and then go to http://127.0.0.1:37779/NumberField/6.6.371293.1 it will load OK (takes a few seconds).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/LMFDB/lmfdb/issues/1332#issuecomment-237877482, or mute the thread https://github.com/notifications/unsubscribe-auth/AC9N6SUvRncY8-itLz9QKsdVFoWJT49fks5qc1NHgaJpZM4IeGVj .

AndrewVSutherland commented 8 years ago

@JohnCremona Markup is still screwing up what you are seeing, I did not type what you quote, and I do not see QQx when I view these pages on git hub. I will respond to you directly via e-mail to clear up the confusion.

JohnCremona commented 8 years ago

Thanks -- I knew there was a communication breakdown somewhere.

edgarcosta commented 8 years ago

sage-7.1 and sage-7.2 on both webservers.

Let me know when you would like to switch from using sage-7.1 to sage-7.2.

AndrewVSutherland commented 8 years ago

@edgarcosta We should surely switch on beta first and let it run for a few days there before switching on the cloud.

This is actually tagged for v1.2, but I have no problem with going ahead now if people are keen to do so.

JohnCremona commented 8 years ago

beta.lmfdb.org is now running 7.2 . Can be switched back if necessary. Next we can put lmfdb.warwick.ac.uk onto 7.2 perhaps. If we keep beta on 7.2 (as I expect we will) we should ask developers to upgrade their Sage versions to 7.2 (at least), and all our testing before merging should be done with 7.2.

JohnCremona commented 8 years ago

I notice that on lmfdb.warwick the 10 sage processes running the beta server are all using 80-100% CPU. I cannot think of any reason for it -- the processes running the local prod server are using almost none. Ias anyone doing some stress testing on beta?

AndrewVSutherland commented 8 years ago

@JohnCremona Not I.

JohnCremona commented 8 years ago

Maybe I'll restart it and see what difference that makes. I'm finding it hard to get anything from http://beta.warwick.ac.uk/ at all. Restarting...

JohnCremona commented 8 years ago

...no better. Reverting to Sage-7.1 to see if that helps.

JohnCremona commented 8 years ago

Well the Sage processes are now tamed, but the beta website does not actually work. Sorry, I have no idea what is not working.

JohnCremona commented 8 years ago

beta is working again now -- but I did nothing and have no idea what was stopping it loading pages before.

edgarcosta commented 8 years ago

@JohnCremona it wasn't me...

JohnCremona commented 8 years ago

Before I forget what happened here:

1. beta was running happily, using Sage 7.1
2. I stopped it and restarted with 7.2.  Seemed OK but...
3. ...a few hours later I noticed that all 4 slave python processes were running at 100% CPU.
4. I restarted and again 100% CPU.
5. I switched back to 7.1 and restarted.  Now the beta server would not reposnd at all, for a while...
6. ...but after several minutes it was running normally.

I have no idea why either the server was using all that CPU (with 7.2), or why it took such a long time to get back up (with 7.1). It is not good to have so little an idea of what was going on. But I do not believe that the change of Sage version as such had anything to do with it since running the tests works fine with 7.2. It is more likely to be due to a lot of python modules having been updated -- the the 7,2 build they were installed from the list in requirements.txt and picked up the latest versions. For example: gunicorn 19.4.5 -> 19.5.0, Markdown 2.6.5 -> 2.6.6, Werkzeug 0.11.4 -> 0.11.5. The other differences are not things we add explicitly.

Any ideas on what to do next other than trying again? I could try reverting those 3 modules in 7.2 to the versions in the working 7.1?

AndrewVSutherland commented 8 years ago

@JohnCremona It might be worth trying again without changing anything and monitoring the system so we can see when/if this recurs, which page request triggered it. Without a way to reliably reproduce the problem it will be hard to diagnose. We might then want to set up a separate webserver that we can experiment with to try and narrow down the problem (e.g. by reverting various modules as you suggest).

The non-responsiveness on restarting with 7.1 could be related to the sage/pari init problem we have seen -- there is definitely something weird that happens the very first time sage makes certain calls to pari, but after that things are fine (this is why I think we can't reliably reproduce https://github.com/LMFDB/lmfdb/issues/1358, I believe the problem is still there but it only occurs if you access the page right after restarting sage -- once it gets past this problem everything is fine (slowish but not horribly slow), and this also applies to new worker threads that get forked).

But the moral here is not take Sage upgrades likely -- even if the Sage code changes are trivial or unrelated to the LMFDB, version changes in the various packages it depends on could still bite us.

JohnCremona commented 8 years ago

OK, very sensible suggestions. And yes, many Sage constituents so get upgraded. Yet we should be able to detect most consequences with our test scripts. I'll restart beta with 7,.2 now and see what happens. I have an hour to watch, and we can put it back if necessary.

JohnCremona commented 8 years ago

Done. Running OK, pages respond reasonably, but the processes are all using around 90% CPU. I'll leave it for a while at least.

JohnCremona commented 8 years ago

See http://lmfdb.warwick.ac.uk/ganglia where lmfdb is node 00-8c-fa-56-d7-04 (I have not been able to work out why ganglia uses correct names for some nodes but not all -- something to do with reverse DNS lookup).

AndrewVSutherland commented 8 years ago

@JohnCremona So is it correct that the high cpu utilization is present from the moment you start the server? If so it might be worth trying what you suggest and reverting to earlier versions of some of the packages that are likely suspects and restarting.

At least beta.lmfdb.org is still functional, but obviously we should figure out what is causing this.

JohnCremona commented 8 years ago

Yes, and the processes using CPU are all

python /home/lmfdb/sage2/local/bin/gunicorn -c gunicorn-config-beta lmfdb.website:app

while the similar ones without -beta are using almost none.

OK, I will try one by one reverting those modules and restarting.

JohnCremona commented 8 years ago

Rolled back gunicorn from 19.5.0 to 19.4.5, makes no difference.

edgarcosta commented 8 years ago

Given that you have root access, I would look into this: https://chezsoi.org/lucas/blog/2014/11/07/en-gdb-python-macros/

that might give us a clue where the code is spending much of its cycles.

JohnCremona commented 8 years ago

Same with Markdown: no difference (though the processes are now only using 75% CPU). @edgarcosta I will look.

edgarcosta commented 8 years ago

Other idea: https://wiki.python.org/moin/DebuggingWithGdb

All this with the end goal is to look at the trace of what is going on with that process.

If you try it a couple of times, you should be seeing the same thing over and over, and that probably will be the source of those extra cycles....

This would be my approach, but might not work.

JohnCremona commented 8 years ago

Iwas not able to attach to the runnin python process with gdb (I have done this many times with C++ programs), but using htop I can see that each of the rogue python processes is runnin gap, so that might be the culprit. Our sage1 (=7.1) build has version 4.7.9 of gap and libgap while sage2=7.2 has version 4.8.3, as does sage3=7.3.

JohnCremona commented 8 years ago

Actually gap might be a red herring here. Each python process does have a subsidiary gap process, but it is the parent using the CPU not the child. A different issue I just noticed is that mongod is using 75% of available RAM (around 48G) -- surely more than normal?

JohnCremona commented 8 years ago

I am about to restart beta with Sage 7.3 just to see if it makes any difference. If it does, I will suggest that we go straight from 7.1 to 7.3 (but no hurry). Otherwise I'll set it back to 7.1.

AndrewVSutherland commented 8 years ago

Sounds reasonable.

On 2016-08-11 04:30, John Cremona wrote:

I am about to restart beta with Sage 7.3 just to see if it makes any difference. If it does, I will suggest that we go straight from 7.1 to 7.3 (but no hurry). Otherwise I'll set it back to 7.1.

JohnCremona commented 8 years ago

Success! beta is now running Sage-7.3 and non of the 10 processes is using more than a tiny amount of CPU. I do not reall know whether it is the Sage version as such which is making a difference here, as several python packages are on different versions (see the files pip1, pip2, pip3 for which versions of everyhing are in the sage1 (=7.1), sage2 (=7.2) and sage3 (=7.4) builds.

AndrewVSutherland commented 8 years ago

Given that 7.3 is now the current Sage release, if it appears to work I see no reason to spend time and energy trying to figure out why Sage 7.2 is causing us problems. But we should probably let beta run on Sage 7.3 for a a few days to be sure.

JohnCremona commented 8 years ago

I agree.

edgarcosta commented 8 years ago

I agree.

A different issue I just noticed is that mongod is using 75% of available RAM (around 48G) -- surely more than normal? I'm not sure what is going on. Perhaps restart it?

On 11 August 2016 at 05:32, John Cremona notifications@github.com wrote:

I agree.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/LMFDB/lmfdb/issues/1332#issuecomment-239113918, or mute the thread https://github.com/notifications/unsubscribe-auth/AATtBjIee8S5qzT30KwZTLdSxBjFtNr_ks5qeuwcgaJpZM4IeGVj .

JohnCremona commented 8 years ago

OK, I will restart it. I didn't want to without you knowing though.