Closed AndrewVSutherland closed 8 years ago
I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick what happens with replication? I don't know how to connect to the cloud replicas.
I can take care of the cloud. If you delete it on Warwick I believe this should get propagated automatically to the replicas (but @edgarcosta can confirm).
On 2016-05-21 10:14, John Cremona wrote:
I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick what happens with replication? I don't know how to connect to the cloud replicas.
@JohnCremona, @edgarcosta The five empty databases listed above have been removed from the cloud mongo db (running on ms.lmfdb.xyz) and no longer appear on www.lmfdb.org/api/
You beat me to it, I have removed the first one....and now the others. There is no such thing as "the mongo server at atkin". The warwick mongoserver is on lmfdb.warwick.ac.uk and looking at beta.lmfdb.org/api you can verify that they are gone from there.
@JohnCremona, @davidfarmer Do we need the databases knowledge_5, knowledge_6, knowledge_7, knowledge_8, knowledge_9, and knowledge_tmp? It looks to me like the knowledge database contains everything that is in these.
@sehlen Do we still need the databases modularforms and modularforms_raw? It looks like the only place where modularforms is referenced is in test_root.py (which should presumably be changed to modularforms2).
@AndrewVSutherland The knowledge_* stuff is my fault. Cleaning it now.
edit: Done
@JohnCremona I presume it makes sense to remove the "limbo" database from the cloud server (and possibly Warwick as well?)
Yes. It was the original Artin representation database, but has been superseded by artin. So yes, it can be deleted.
OK then I'll do the honours at the Warwick end...done (limbo dropped).
What about "MaassWaveForm" (without the "s")?
There are 2 files that mention it:
limbo also dropped at ms.lmfdb.xyz
mongo makes it much too easy to create a new database by mistake after a typo. At least, that was true before we added authentication.
The collection ‘MaassWaveForm’ can be dropped, similarly the collections ‘modularforms_raw’ (this is not really used and was something I was experimenting with) and ‘modularforms’ (this was the predecessor to ‘modularforms2’ and hasn’t been used since 2012) can both be dropped.
Fredrik
On 21 May 2016, at 16:58, edgarcosta notifications@github.com wrote:
What about "MaassWaveForm" (without the "s")?
There are 2 files that mention it:
never ends up using it: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/modular_forms/maass_forms/maass_waveforms/backend/mwf_utils.py
Wants to check for "MaassWaveForms" instead?: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/test_root.py
Catching!
Done at warwick
lmfdb0:PRIMARY> use MaassWaveForm
switched to db MaassWaveForm
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms_raw
switched to db modularforms_raw
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms
switched to db modularforms
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }
and ms:
> use limbo
switched to db limbo
> db.dropDatabase()
{ "dropped" : "limbo", "ok" : 1 }
> use MaassWaveForm
switched to db MaassWaveForm
> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
> use modularforms_raw
switched to db modularforms_raw
> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
> use modularforms
switched to db modularforms
> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }
I also took a snapshot before going on this dropDatabase spree...
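The shell sessions above repeat the same `use` / `db.dropDatabase()` pair for each name. Purely as an illustration, here is a minimal Python sketch of the same cleanup. The helper `drop_databases` and the `FakeClient` stand-in are hypothetical names introduced here. The sketch assumes a client object exposing `list_database_names()` and `drop_database(name)`, as recent versions of pymongo's `MongoClient` do. It only drops names that actually exist, so a typo is reported rather than acted on.

```python
def drop_databases(client, names, dry_run=True):
    """Drop each database in `names` that exists on `client`.

    Returns (dropped, missing). With dry_run=True nothing is dropped and
    the first list shows what would have been dropped.
    """
    present = set(client.list_database_names())
    dropped, missing = [], []
    for name in names:
        if name not in present:
            missing.append(name)  # probably a typo, or already removed
            continue
        if not dry_run:
            client.drop_database(name)
        dropped.append(name)
    return dropped, missing


class FakeClient:
    """Hypothetical in-memory stand-in so the sketch runs without a mongo server."""

    def __init__(self, names):
        self._names = set(names)

    def list_database_names(self):
        return sorted(self._names)

    def drop_database(self, name):
        self._names.discard(name)


client = FakeClient(["limbo", "MaassWaveForm", "modularforms", "modularforms_raw", "knowledge"])
# The last entry is a deliberate typo to show how it is reported.
obsolete = ["limbo", "MaassWaveForm", "modularforms_raw", "modularforms", "MaasWaveForm"]
dropped, missing = drop_databases(client, obsolete, dry_run=False)
print("dropped:", dropped)
print("not found:", missing)
print("remaining:", client.list_database_names())
```

Defaulting to `dry_run=True` is a design choice for this sketch: it makes the destructive call opt-in, in the same spirit as taking a snapshot first.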
@edgarcosta Good idea. I was thinking that we also have the weekly dumps, but they are not kept for that long (40 days according to the backup script). Though I do have some other copies.
The db just got much slimmer:
@JohnCremona we should discuss how frequently, and for how long we should keep backups in the cloud
@edgarcosta Your previous comment reminded me of a question you asked me (no idea which thread or issue or what) which I did not understand. If it's about filesystems used on the Warwick server please include Bober and Schilly (perhaps you did).
Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb.
It contains data about a(p)’s for newforms of weight 2, trivial character and where the degree of the coefficient field is 2.
It is data which David F. and I were (and still are) planning to use to analyse a variant of Maeda’s conjecture…
(The collection ‘madea’ in ‘modularforms2’ contains a subset of the collection ‘ap_data’ in the database ‘ap_statistics’.)
I just mongodumped the database so you can safely delete it if you like.
At some point in the future we would probably like to display statistics for modular forms but we have to think more about exactly what should be (pre-)computed and what should be stored where.
The reason why there are so many collections in modularforms2 is that the mongo server in Warwick (previously Washington, I think) was basically the only database we all had access to, so it was used for a lot of testing, experimenting, development and debugging…
Basically every collection in modularforms2 which is not referenced somewhere in the lmfdb code can probably be safely deleted (@sehlen?).
On 21 May 2016, at 19:35, Andrew Sutherland notifications@github.com wrote:
Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb.
@fredstro, @sehlen, So when I updated the modularforms2 data on www.lmfdb.org I copied over the following 11 collections:
dimension_table dimension_table.chunks dimension_table.files
webmodformspace webmodformspace.chunks webmodformspace.files
webnewforms webnewforms.chunks webnewforms.files
webeigenvalues.chunks webeigenvalues.files
I see the code also references "webmodformspace_dimension" in emf_utils.py
Are there any other collections that are needed?
@fredstro OK, it sounds like I can definitely remove ap_statistics from the cloud, I'll let @JohnCremona decide what to do on the Warwick machines.
@davidfarmer is it safe to remove the database Lfunction from the cloud (and possibly Warwick also)? It looks like the code uses the Lfunctions database (both for new and old formats)
As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need.
I saw that, and I was going to edit that, but then I noticed that in the master branch it is an empty file: https://github.com/LMFDB/lmfdb/blob/master/test_root.py
Wrong file: lmfdb/test_root.py is the one. That empty file can probably be deleted!
@jwj61 the test of the link to http://hobbes.la.asu.edu/lmfdb-14/ now fails. Is that just temporary? There's a test for it in test_acknowledgements.py
Temporary. Due to work in the building housing hobbes, all computers have been shut down for a week (and maybe more, but hopefully just a week).
As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need.
new issue? Perhaps we should first take care of: https://github.com/LMFDB/lmfdb-inventory maybe assign the issues to the owners?
@edgarcosta fixing test_root.py is something we should probably take care of. Getting lmfdb-inventory up to date for each database is something we can assign to the owners, but first we need to decide which databases should really be there and actually give them owners. Currently lmfdb-inventory has a .md file for every database that was in the Warwick mongo db at the time it was created (several of which we just deleted), many of which are just stubs with no owner. Part of this issue should be fixing this.
@AndrewVSutherland I totally agree with you. However, I don't know a nice and simple way to figure out which DBs we really want to have there.
@edgarcosta That is one of the goals of this issue; as noted above one way to check is to search the code for references to the database names. Here is a list of databases that I know should be on the production cloud mongo db serving data to www.lmfdb.org (ms.lmfdb.xyz)
sato_tate_groups localfields elliptic_curves numberfields knowledge siegel_modular_forms Lattices hmfs Lfunctions modularforms2 MaasWaveForms transitivegroups artin
Here is a list of databases that are used in the code and potentially accessible from www.lmfdb.org. It would be good to confirm their exact status:
HTPicard siegel_modular_forms_experimental SL2Zsubgroups
Here is a list of additional databases that I know need to be on the Warwick dev/beta database (lmfdb.warwick.ac.uk) but are not needed in the cloud (yet) because they are only used in beta mode (in some cases not in the sidebar yet, but there is code in lmfdb/lmfdb that uses them):
mod_l_eigenenvalues halfintegralmf hgm curve_automorphisms
Here is a list of databases that look useful but are not referenced in code. They may have future use and we might want to keep them on the Warwick machine. Definitely not needed in the cloud:
abvar (only on Warwick, not in cloud)
ap_statistics (see https://github.com/LMFDB/lmfdb/issues/1431#issuecomment-220794717, not in cloud)
bmfs (Bianchi modular forms data? in cloud now, could/should be removed)
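The "search the code for references to the database names" check behind the lists above can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: `find_references` is a hypothetical helper, only `.py` files are scanned, and the throwaway file with its `C[...]` lookups exists only to make the example runnable. Names are matched as whole identifiers, which matters here because several names are prefixes of others (`Lfunction` / `Lfunctions`, `modularforms` / `modularforms2`).

```python
import os
import re
import tempfile


def find_references(root, names, extensions=(".py",)):
    """Map each name to the sorted list of files under `root` mentioning it.

    A name only counts if it is not embedded in a longer identifier.
    """
    patterns = {
        n: re.compile(r"(?<![A-Za-z0-9_])" + re.escape(n) + r"(?![A-Za-z0-9_])")
        for n in names
    }
    hits = {n: [] for n in names}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(extensions):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            for n, pat in patterns.items():
                if pat.search(text):
                    hits[n].append(os.path.relpath(path, root))
    return {n: sorted(v) for n, v in hits.items()}


# Demonstration on a throwaway tree; `C` is a made-up connection object.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.py"), "w", encoding="utf-8") as fh:
        fh.write('db = C["Lfunctions"]\nforms = C["modularforms2"]\n')
    refs = find_references(
        root, ["Lfunction", "Lfunctions", "modularforms", "modularforms2"]
    )

for name, files in sorted(refs.items()):
    print(name, files or "not referenced")
```

A name that comes back unreferenced is only a candidate for removal; as the thread shows, the owner still needs to confirm.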
Just to follow up on this, according to "https://github.com/LMFDB/lmfdb-inventory/blob/master/db-Lfunction.md" we should be able to delete the Lfunction database (everything is in Lfunctions).
Any objections?
Not from me. Two of the collections are *Test anyway, so if we are sure about the 438 items in LemurellMaassHighDegree then fine.
@JohnCremona I am not sure, and in fact I notice that this collection is referenced in the code, see https://github.com/LMFDB/lmfdb/search?utf8=%E2%9C%93&q=LemurellMaassHighDegree.
Maybe it would be a good idea to copy this collection into the Lfunctions database? Or at least ask @davidfarmer about it?
And in fact the two "test" collections are also referenced in https://github.com/LMFDB/lmfdb/blob/master/lmfdb/lfunctions/LfunctionDatabase.py
OK let's leave them. L-functions are in a transitional stage so we cannot expect the set of collections to be 100% tidy right now.
Should I merge #1373 into this issue? Most of the databases have those two collections that I believe to be relics of mongodb 2.4.
From Drew's list a couple days ago:
curve_automorphisms is "mine". I'm still a bit fuzzy about the cloud vs beta/development, but pages like http://www.lmfdb.org/HigherGenus/C/aut/3.96-64.0.2-3-8 use that database. There is no link on the sidebar yet, even in beta (I want to make sure a couple of other features are in place before then.)
Making a page for it on lmfdb-inventory is on my to do list.
@jenpaulhus Thanks for the confirmation. Given that the pages are accessible (but hidden) on www.lmfdb.org, I think it makes sense to leave these on the "cloud server". By that I mean the mongo db hosted by ms.lmfdb.xyz, which serves data to the web servers hosting www.lmfdb.org (and running the LMFDB application code); all of these are instances in Google's Compute Engine (aka "the cloud"). This is separate from the mongo db hosted by lmfdb.warwick.ac.uk, which serves data to beta.lmfdb.org as well as to the hosts atkin and lehner at Warwick, which are used for development.
@edgarcosta Yes I think it makes sense to merge #1373 into this one. So far we have been focusing on the database level, but the next step is to drill down to collections.
It is safe to delete the Lfunction database. Everything of value is now in Lfunctions. The code that references the collections is obsolete
/Stefan
From: Andrew Sutherland [mailto:notifications@github.com] Sent: 24 May 2016 17:52 To: LMFDB/lmfdb lmfdb@noreply.github.com Subject: Re: [LMFDB/lmfdb] Remove obsolete/empty databases from mongo db (#1431)
@JohnCremona I am not sure, and in fact I notice that this collection is referenced in the code, see https://github.com/LMFDB/lmfdb/search?utf8=%E2%9C%93&q=LemurellMaassHighDegree.
Maybe it would be a good idea to copy this collection into the Lfunctions database? Or at least ask @davidfarmer about it?
Thanks Stefan. It would then be helpful to remove the obsolete code, could you do that? Nothing is ever really lost in git anyway.
With help from @sehlen and @fredstro I was able to remove what I believe are all unnecessary collections from the modularforms2 database in the cloud; only the following collections remain:
dimension_table webmodformspace webmodformspace.chunks webmodformspace.files webnewforms webnewforms.chunks webnewforms.files webeigenvalues.chunks webeigenvalues.files webchar webchar.chunks webchar.files
Mongo DB now shows 540GB of free space available in this database. Only about 85GB is currently used. This will double when we copy over new files (and want to save backups of the old ones temporarily), and it will increase again once we add in the missing data, but my guess is that it will stabilize well below the 622GB currently allocated. @edgarcosta should we think about dumping and restoring this database at some point to recover the unused space?
I am just about ready to close this issue. The only outstanding things in my mind are (1) the obsolete Lfunction database that is still referenced in the code (see https://github.com/LMFDB/lmfdb/issues/1456), and (2) the fact that there are still a lot of unnecessary collections in the modularforms2 database in Warwick taking up a lot of space (my guess is that about half the space in the mongo db on lmfdb.warwick.ac.uk is taken up by collections in modularforms2 that do not need to be there). Getting rid of them would speed up backup/restore operations (among other things).
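For the remaining Warwick cleanup, the list of collections kept in the cloud can serve as a keep-list. Below is a minimal sketch of that comparison, for illustration only. `KEEP` copies the twelve collection names listed above, `unneeded_collections` is a hypothetical helper, and the `present` list is made up apart from `madea`, which is mentioned earlier in this thread. Skipping `system.*` names is an assumption about mongo's internal namespaces.

```python
# The twelve modularforms2 collections kept in the cloud, per the list above.
KEEP = {
    "dimension_table",
    "webmodformspace", "webmodformspace.chunks", "webmodformspace.files",
    "webnewforms", "webnewforms.chunks", "webnewforms.files",
    "webeigenvalues.chunks", "webeigenvalues.files",
    "webchar", "webchar.chunks", "webchar.files",
}


def unneeded_collections(present, keep=KEEP):
    """Return sorted collection names that are neither kept nor internal."""
    return sorted(
        name for name in present
        if name not in keep and not name.startswith("system.")
    )


# Made-up snapshot of a collection listing; "scratch_test" is hypothetical.
present = [
    "dimension_table", "webnewforms", "webnewforms.files", "webnewforms.chunks",
    "madea", "scratch_test", "system.indexes",
]
candidates = unneeded_collections(present)
print(candidates)
```

The result is a list of candidates to review with the owners, not a list to drop automatically.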
@AndrewVSutherland I don't follow what you mean by this:
Mongo DB now shows 540GB of free space available in this database.
The disk? and what server are you talking about?
Regarding
should we think about dumping and restoring this database at some point to recover the unused space?
I believe that Sarunas is done testing the different storage engines, see the *png files in: https://github.com/edgarcosta/lmfdb-gce/tree/master/ab/ms-bench_n1hcpu4_storage Thus we are close to just pushing a new snapshot of the DB to ms.lmfdb.xyz.
In mongo on ms.lmfdb.xyz, run db.stats() and look at the free list size.
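As a rough illustration of that check, here is a small Python sketch that summarises a stats document. It assumes the document has the `fileSize` and `extentFreeList.totalSize` fields (in bytes) that MMAPv1-era servers report; other storage engines may not provide them, in which case the sketch returns zeros. `free_space_report` is a hypothetical helper, and the sample values are invented to roughly match the figures quoted in this thread, not real output from ms.lmfdb.xyz.

```python
GB = 1024 ** 3


def free_space_report(stats):
    """Summarise allocated and free-list space from a dbStats-style dict."""
    file_size = stats.get("fileSize", 0)
    free = stats.get("extentFreeList", {}).get("totalSize", 0)
    return {
        "allocated_gb": file_size / GB,
        "free_gb": free / GB,
        "free_fraction": (free / file_size) if file_size else 0.0,
    }


# Invented sample: about 622GB allocated, about 540GB on the free list.
sample = {"fileSize": 622 * GB, "extentFreeList": {"num": 1, "totalSize": 540 * GB}}
report = free_space_report(sample)
print(report)
```

With pymongo, a dict of this kind would typically come from `db.command("dbstats")`; that call is not made here so the sketch stays self-contained.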
On May 29, 2016 6:07:35 PM EDT, edgarcosta notifications@github.com wrote:
@AndrewVSutherland I don't follow what you mean by this:
Mongo DB now shows 540GB of free space available in this database. The disk? and what server are you talking about?
Regarding
should we think about dumping and restoring this database at some point to recover the unused space? I believe that Sarunas is done testing the different storage engines, see the *png files in: https://github.com/edgarcosta/lmfdb-gce/tree/master/ab/ms-bench_n1hcpu4_storage Thus we are close to just push a new snapshot of the DB to ms.lmfdb.xyz.
I see. When we switch the storage engine in an automated way, mongo will take care of all that. We will do this soon.
@edgarcosta Sounds good. I just copied over a new set of modular forms data, so there are currently two copies of every collection in modularforms2 and there is still 460GB free.
The following databases are either empty or have only empty collections in them:
mwf_dbname modularforms_2010 quadratic_twists modforms WebNewForms
Several others are never referenced by any of the code in LMFDB/lmfdb and have no inventory information in LMFDB/inventory (I will post a list later).
These should definitely be removed from the cloud server (so that they do not appear on http://www.lmfdb.org/api/, for example). Presumably the empty ones (and possibly others) can/should also be removed from the mongo db on Atkin.