LMFDB / lmfdb

L-Functions and Modular Forms Database

Remove obsolete/empty databases from mongo db #1431

Closed AndrewVSutherland closed 8 years ago

AndrewVSutherland commented 8 years ago

The following databases are either empty or have only empty collections in them:

mwf_dbname modularforms_2010 quadratic_twists modforms WebNewForms

Several others are never referenced by any of the code in LMFDB/lmfdb and have no inventory information in LMFDB/inventory (I will post a list later).

These should definitely be removed from the cloud server (so that they do not appear on http://www.lmfdb.org/api/, for example). Presumably the empty ones (and possibly others) can/should also be removed from the mongo db on Atkin.
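A check like the one described above could be scripted roughly as follows. This is only a sketch: the function name `find_empty_databases` and the sample counts are invented for illustration, and the input is assumed to be a plain mapping from database name to per-collection document counts (which might be gathered with pymongo, not shown here).

```python
def find_empty_databases(counts):
    """Return names of databases that have no collections or only empty ones.

    `counts` maps database name -> {collection name: document count}.
    """
    return sorted(
        name
        for name, collections in counts.items()
        if all(n == 0 for n in collections.values())
    )


# Hypothetical sample data, not taken from the real server.
sample = {
    "mwf_dbname": {},
    "quadratic_twists": {"twists": 0},
    "elliptic_curves": {"curves": 1000},
}
print(find_empty_databases(sample))
```

A database with no collections at all counts as empty here, because `all()` over an empty sequence is true.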

JohnCremona commented 8 years ago

I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick what happens with replication? I don't know how to connect to the cloud replicas.

AndrewVSutherland commented 8 years ago

I can take care of the cloud. If you delete it on Warwick I believe this should get propagated automatically to the replicas (but @edgarcosta can confirm).

On 2016-05-21 10:14, John Cremona wrote:

I can think of no reason for one of us not to remove these 5 right now. If they are removed from Warwick what happens with replication? I don't know how to connect to the cloud replicas.


You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/LMFDB/lmfdb/issues/1431#issuecomment-220780021

AndrewVSutherland commented 8 years ago

@JohnCremona, @edgarcosta The five empty databases listed above have been removed from the cloud mongo db (running on ms.lmfdb.xyz) and no longer appear on www.lmfdb.org/api/

JohnCremona commented 8 years ago

You beat me to it. I have removed the first one... and now the others. There is no such thing as "the mongo server at atkin". The Warwick mongo server is on lmfdb.warwick.ac.uk, and looking at beta.lmfdb.org/api you can verify that they are gone from there.

AndrewVSutherland commented 8 years ago

@JohnCremona, @davidfarmer Do we need the databases knowledge_5, knowledge_6, knowledge_7, knowledge_8, knowledge_9, and knowledge_tmp? It looks to me like the knowledge database contains everything that is in these.

AndrewVSutherland commented 8 years ago

@sehlen Do we still need the databases modularforms and modularforms_raw? It looks like the only place where modularforms is referenced is in test_root.py (which should presumably be changed to modularforms2).

edgarcosta commented 8 years ago

@AndrewVSutherland The knowledge_* stuff is my fault. Cleaning it now.

edit: Done

AndrewVSutherland commented 8 years ago

@JohnCremona I presume it makes sense to remove the "limbo" database from the cloud server (and possibly Warwick as well?)

jwj61 commented 8 years ago

Yes. It was the original Artin representation database, but has been superseded by artin. So yes, it can be deleted.

JohnCremona commented 8 years ago

OK then I'll do the honours at the Warwick end...done (limbo dropped).

edgarcosta commented 8 years ago

What about "MaassWaveForm" (without the "s")?

There are 2 files that mention it:

edgarcosta commented 8 years ago

limbo also dropped at ms.lmfdb.xyz

JohnCremona commented 8 years ago

mongo makes it much too easy to create a new database by mistake after a typo. At least, that was true before we added authentication.
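One way to guard against this kind of typo in scripts is to check a name against the list of known databases before using it. The sketch below is an illustration only; `check_database_name` and the `known` list are hypothetical and not part of the LMFDB code.

```python
import difflib


def check_database_name(name, existing):
    """Return `name` if it is a known database, otherwise raise ValueError."""
    if name in existing:
        return name
    # Suggest the closest known name, if any is similar enough.
    hint = difflib.get_close_matches(name, existing, n=1)
    suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
    raise ValueError(f"unknown database {name!r}{suggestion}")


# Hypothetical list of known database names.
known = ["MaassWaveForms", "modularforms2", "Lfunctions"]
print(check_database_name("modularforms2", known))
try:
    check_database_name("MaassWaveForm", known)
except ValueError as err:
    print(err)
```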

fredstro commented 8 years ago

The collection ‘MaassWaveForm’ can be dropped, similarly the collections ‘modularforms_raw’ (this is not really used and was something I was experimenting with) and ‘modularforms’ (this was the predecessor to ‘modularforms2’ and hasn’t been used since 2012) can both be dropped.

Fredrik

On 21 May 2016, at 16:58, edgarcosta notifications@github.com wrote:

What about "MaassWaveForm" (without the "s")?

There are 2 files that mention it:

never ends up using it: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/modular_forms/maass_forms/maass_waveforms/backend/mwf_utils.py

Wants to check for "MaassWaveForms" instead?: https://github.com/LMFDB/lmfdb/blob/master/lmfdb/test_root.py

edgarcosta commented 8 years ago

Catching!

Done at warwick

lmfdb0:PRIMARY> use MaassWaveForm
switched to db MaassWaveForm
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms_raw
switched to db modularforms_raw
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
lmfdb0:PRIMARY> use modularforms
switched to db modularforms
lmfdb0:PRIMARY> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }

and ms:

> use limbo
switched to db limbo
> db.dropDatabase()
{ "dropped" : "limbo", "ok" : 1 }
> use MaassWaveForm
switched to db MaassWaveForm
> db.dropDatabase()
{ "dropped" : "MaassWaveForm", "ok" : 1 }
> use modularforms_raw
switched to db modularforms_raw
> db.dropDatabase()
{ "dropped" : "modularforms_raw", "ok" : 1 }
> use modularforms
switched to db modularforms
> db.dropDatabase()
{ "dropped" : "modularforms", "ok" : 1 }
edgarcosta commented 8 years ago

I also took a snapshot before going on this dropDatabase spree...
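The shell sessions above could also be scripted with a dry-run switch, so that the list of databases to drop can be reviewed first. This is a minimal sketch, assuming `client` behaves like a pymongo `MongoClient` (it only needs `list_database_names()` and `drop_database()`); the helper name `drop_databases` is made up for the example.

```python
def drop_databases(client, names, dry_run=True):
    """Drop the named databases and return those that were (or would be) dropped.

    Names that do not exist on the server are skipped, so a typo is never
    acted on. With dry_run=True nothing is dropped.
    """
    existing = set(client.list_database_names())
    selected = []
    for name in names:
        if name not in existing:
            continue  # not on this server: ignore rather than guess
        if not dry_run:
            client.drop_database(name)
        selected.append(name)
    return selected
```

Calling it first with the default `dry_run=True`, and taking a snapshot before the real run, mirrors the order of operations described in this thread.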

JohnCremona commented 8 years ago

@edgarcosta Good idea. I was thinking that we also have the weekly dumps, but they are not kept for that long (40 days according to the backup script). Though I do have some other copies.
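To make the retention window concrete, here is a small sketch of how a 40-day cutoff selects dumps for deletion. It is not the actual backup script; the function `expired_dumps` and the dump dates are hypothetical.

```python
from datetime import date, timedelta


def expired_dumps(dump_dates, today, keep_days=40):
    """Return the dump dates that are more than `keep_days` days before `today`."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in dump_dates if d < cutoff)


# Hypothetical weekly dump dates, for illustration only.
today = date(2016, 5, 21)
dumps = [today - timedelta(weeks=k) for k in range(8)]
print(expired_dumps(dumps, today))
```

With weekly dumps and a 40-day window, only the five or six most recent dumps survive at any time.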

edgarcosta commented 8 years ago

The db just got much slimmer:

edgarcosta commented 8 years ago

@JohnCremona we should discuss how frequently we should take backups in the cloud, and for how long we should keep them

JohnCremona commented 8 years ago

@edgarcosta Your previous comment reminded me of a question you asked me (no idea which thread or issue or what) which I did not understand. If it's about filesystems used on the Warwick server please include Bober and Schilly (perhaps you did).

AndrewVSutherland commented 8 years ago

Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb.

fredstro commented 8 years ago

It contains data about a(p)’s for newforms of weight 2 with trivial character, where the degree of the coefficient field is 2. It is data which David F. and I were (and still are) planning to use to analyse a variant of Maeda’s conjecture…
(The collection ‘madea’ in ‘modularforms2’ contains a subset of the collection ‘ap_data’ in the database ‘ap_statistics’.)

I just mongodumped the database so you can safely delete it if you like.

At some point in the future we would probably like to display statistics for modular forms but we have to think more about exactly what should be (pre-)computed and what should be stored where.

The reason why there are so many collections in modularforms2 is that the mongo server in Warwick (previously Washington, I think) was basically the only database we all had access to, so it was used for a lot of testing, experimenting, development and debugging…

Basically every collection in modularforms2 which is not referenced somewhere in the lmfdb can probably be safely deleted (@sehlen?).


On 21 May 2016, at 19:35, Andrew Sutherland notifications@github.com wrote:

Does anyone know anything about ap_statistics? It is not referenced by any code in LMFDB/lmfdb.


AndrewVSutherland commented 8 years ago

@fredstro, @sehlen, So when I updated the modularforms2 data on www.lmfdb.org I copied over the following 11 collections:

dimension_table dimension_table.chunks dimension_table.files

webmodformspace webmodformspace.chunks webmodformspace.files

webnewforms webnewforms.chunks webnewforms.files

webeigenvalues.chunks webeigenvalues.files

I see the code also references "webmodformspace_dimension" in emf_utils.py

Are there any other collections that are needed?

AndrewVSutherland commented 8 years ago

@fredstro OK, it sounds like I can definitely remove ap_statistics from the cloud, I'll let @JohnCremona decide what to do on the Warwick machines.

AndrewVSutherland commented 8 years ago

@davidfarmer is it safe to remove the database Lfunction from the cloud (and possibly Warwick also)? It looks like the code uses the Lfunctions database (both for new and old formats)

JohnCremona commented 8 years ago

As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need.
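A check against a complete list might look like the sketch below. It is not the existing test in lmfdb/test_root.py; `REQUIRED`, `missing_databases` and `check_required_databases` are hypothetical names, and the list of available databases is assumed to be supplied by the caller (for example from the server's database listing).

```python
def missing_databases(required, available):
    """Return required database names that the server does not report."""
    return sorted(set(required) - set(available))


# Hypothetical subset of required names; the real list would need to be agreed on.
REQUIRED = ["elliptic_curves", "numberfields", "MaassWaveForms"]


def check_required_databases(available):
    """Fail with a readable message if any required database is absent."""
    missing = missing_databases(REQUIRED, available)
    assert not missing, f"missing databases: {missing}"


# Passes silently: every required name is present, extras are allowed.
check_required_databases(["elliptic_curves", "numberfields", "MaassWaveForms", "knowledge"])
```

Keeping the required names in one list means that dropping an obsolete database only breaks the test if that database was actually declared as needed.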

edgarcosta commented 8 years ago

I saw that, and I was going to edit that, but then I noticed that in the master branch it is an empty file: https://github.com/LMFDB/lmfdb/blob/master/test_root.py

JohnCremona commented 8 years ago

Wrong file: lmfdb/test_root.py is the one. That empty file can probably be deleted!

JohnCremona commented 8 years ago

@jwj61 the test of the link to http://hobbes.la.asu.edu/lmfdb-14/ now fails. Is that just temporary? There's a test for it in test_acknowledgements.py

jwj61 commented 8 years ago

Temporary. Due to work in the building housing hobbes, all computers have been shut down for a week (and maybe more, but hopefully just a week).

edgarcosta commented 8 years ago

As luck would have it one of the tests in test_root checks for MaassWaveForm, so now fails. Rather than fix this one, we should adapt the test_db function there to check for a complete list of the databases which we actually need.

new issue? Perhaps we should first take care of: https://github.com/LMFDB/lmfdb-inventory maybe assign the issues to the owners?

AndrewVSutherland commented 8 years ago

@edgarcosta fixing test_root.py is something we should probably take care of. Getting lmfdb-inventory up to date for each database is something we can assign to the owners, but first we need to decide which databases should really be there and actually give them owners. Currently lmfdb-inventory has a .md file for every database that was in the Warwick mongo db at the time it was created (several of which we just deleted), many of which are just stubs with no owner. Part of this issue should be fixing this.

edgarcosta commented 8 years ago

@AndrewVSutherland I totally agree with you. However, I don't know a nice and simple way to figure out which DBs we really want to have there.

AndrewVSutherland commented 8 years ago

@edgarcosta That is one of the goals of this issue; as noted above one way to check is to search the code for references to the database names. Here is a list of databases that I know should be on the production cloud mongo db serving data to www.lmfdb.org (ms.lmfdb.xyz)

sato_tate_groups localfields elliptic_curves numberfields knowledge siegel_modular_forms Lattices hmfs Lfunctions modularforms2 MaassWaveForms transitivegroups artin

Here is a list of databases that are used in the code and potentially accessible from www.lmfdb.org. It would be good to confirm their exact status:

HTPicard siegel_modular_forms_experimental SL2Zsubgroups

Here is a list of additional databases that I know need to be on the Warwick dev/beta database (lmfdb.warwick.ac.uk) but are not needed in the cloud (yet) because they are only used in beta mode (in some cases not in the sidebar yet, but there is code in lmfdb/lmfdb that uses them):

mod_l_eigenenvalues halfintegralmf hgm curve_automorphisms

Here is a list of databases that look useful but are not referenced in code. They may have future use and we might want to keep them on the Warwick machine. Definitely not needed in the cloud:

abvar (only on Warwick, not in cloud)

ap_statistics (see https://github.com/LMFDB/lmfdb/issues/1431#issuecomment-220794717, not in cloud)

bmfs (Bianchi modular forms data? in cloud now, could/should be removed)
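The code search mentioned above can be approximated with a short script. This is a sketch under assumptions: the helper names are invented, only `.py` files are scanned, and names are matched as whole words so that, for example, `Lfunction` is not counted as referenced just because `Lfunctions` appears.

```python
import os
import re


def referenced_names(root, names):
    """Return the subset of `names` that occur as whole words in .py files under `root`."""
    patterns = {n: re.compile(r"\b" + re.escape(n) + r"\b") for n in names}
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(".py"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                text = fh.read()
            found.update(n for n, pat in patterns.items() if pat.search(text))
    return found


def unreferenced_names(root, names):
    """Return the names that no .py file under `root` mentions."""
    return sorted(set(names) - referenced_names(root, names))
```

A call such as `unreferenced_names("lmfdb", database_names)` would then list candidates for removal; a name appearing only in comments or obsolete code still counts as referenced, so the result needs a manual look.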

AndrewVSutherland commented 8 years ago

Just to follow up on this, according to "https://github.com/LMFDB/lmfdb-inventory/blob/master/db-Lfunction.md" we should be able to delete the Lfunction database (everything is in Lfunctions).

Any objections?

JohnCremona commented 8 years ago

Not from me. Two of the collections are *Test anyway, so if we are sure about the 438 items in LemurellMaassHighDegree then fine.

AndrewVSutherland commented 8 years ago

@JohnCremona I am not sure, and in fact I notice that this collection is referenced in the code, see https://github.com/LMFDB/lmfdb/search?utf8=%E2%9C%93&q=LemurellMaassHighDegree.

Maybe it would be a good idea to copy this collection into the Lfunctions database? Or at least ask @davidfarmer about it?

AndrewVSutherland commented 8 years ago

And in fact the two "test" collections are also referenced in https://github.com/LMFDB/lmfdb/blob/master/lmfdb/lfunctions/LfunctionDatabase.py

JohnCremona commented 8 years ago

OK let's leave them. L-functions are in a transitional stage so we cannot expect the set of collections to be 100% tidy right now.

edgarcosta commented 8 years ago

Should I merge #1373 into this issue? Most of the databases have those two collections that I believe to be relics of mongodb 2.4.

jenpaulhus commented 8 years ago

From Drew's list a couple days ago:

curve_automorphisms is "mine". I'm still a bit fuzzy about the cloud vs beta/development, but pages like http://www.lmfdb.org/HigherGenus/C/aut/3.96-64.0.2-3-8 use that database. There is no link on the sidebar yet, even in beta (I want to make sure a couple of other features are in place before then.)

Making a page for it on lmfdb-inventory is on my to do list.

AndrewVSutherland commented 8 years ago

@jenpaulhus Thanks for the confirmation. Given that the pages are accessible (but hidden) on www.lmfdb.org, I think it makes sense to leave these on the "cloud server", by which I mean the mongo db hosted by ms.lmfdb.xyz, which serves data to the web servers hosting www.lmfdb.org (and running the LMFDB application code), all of which are instances in Google's Compute Engine (aka "the cloud"); this is separate from the mongo db hosted by lmfdb.warwick.ac.uk, which serves data to beta.lmfdb.org as well as the hosts atkin and lehner at Warwick which are used for development.

AndrewVSutherland commented 8 years ago

@edgarcosta Yes I think it makes sense to merge #1373 into this one. So far we have been focusing on the database level, but the next step is to drill down to collections.

lemurell commented 8 years ago

It is safe to delete the Lfunction database. Everything of value is now in Lfunctions. The code that references the collections is obsolete

/Stefan

From: Andrew Sutherland [mailto:notifications@github.com] Sent: 24 May 2016 17:52 To: LMFDB/lmfdb lmfdb@noreply.github.com Subject: Re: [LMFDB/lmfdb] Remove obsolete/empty databases from mongo db (#1431)

@JohnCremona I am not sure, and in fact I notice that this collection is referenced in the code, see https://github.com/LMFDB/lmfdb/search?utf8=%E2%9C%93&q=LemurellMaassHighDegree.

Maybe it would be a good idea to copy this collection into the Lfunctions database? Or at least ask @davidfarmer about it?


JohnCremona commented 8 years ago

Thanks Stefan. It would then be helpful to remove the obsolete code, could you do that? Nothing is ever really lost in git anyway.

AndrewVSutherland commented 8 years ago

With help from @sehlen and @fredstro I was able to remove what I believe are all unnecessary collections from the modularforms2 database in the cloud; only the following collections remain:

dimension_table webmodformspace webmodformspace.chunks webmodformspace.files webnewforms webnewforms.chunks webnewforms.files webeigenvalues.chunks webeigenvalues.files webchar webchar.chunks webchar.files

Mongo DB now shows 540GB of free space available in this database. Only about 85GB is currently used. This will double when we copy over new files (and want to save backups of the old ones temporarily), and it will increase again once we add in the missing data, but my guess is that it will stabilize well below the 622GB currently allocated. @edgarcosta should we think about dumping and restoring this database at some point to recover the unused space?

I am just about ready to close this issue, the only outstanding things in my mind are (1) the obsolete Lfunction database that is still referenced in the code (see https://github.com/LMFDB/lmfdb/issues/1456), and (2) the fact that there are still a lot of unnecessary collections in the modularforms2 database in Warwick taking up a lot of space (my guess is that about half the space in the mongo db on lmfdb.warwick.ac.uk is taken up by collections in modularforms2 that do not need to be there). Getting rid of it would speed up backup/restore operations (among other things).
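As a rough check of the cloud figures quoted above (622GB allocated, about 85GB used, with usage expected to double while old and new copies coexist), the remaining headroom can be estimated as follows. The function name is hypothetical and the inputs are the approximate numbers from this comment, so the result is only indicative.

```python
def free_after_growth(allocated_gb, used_gb, growth_factor):
    """Estimate free space in GB once used space grows by `growth_factor`."""
    return allocated_gb - used_gb * growth_factor


# Approximate figures from the comment above.
print(free_after_growth(622, 85, 2))  # prints 452
```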

edgarcosta commented 8 years ago

@AndrewVSutherland I don't follow what you mean by this:

Mongo DB now shows 540GB of free space available in this database.

The disk? and what server are you talking about?

Regarding

should we think about dumping and restoring this database at some point to recover the unused space?

I believe that Sarunas is done testing the different storage engines, see the *png files in: https://github.com/edgarcosta/lmfdb-gce/tree/master/ab/ms-bench_n1hcpu4_storage Thus we are close to just pushing a new snapshot of the DB to ms.lmfdb.xyz.

AndrewVSutherland commented 8 years ago

In mongo on ms.lmfdb.xyz, run db.stats() and look at the free list size.

On May 29, 2016 6:07:35 PM EDT, edgarcosta notifications@github.com wrote:

@AndrewVSutherland I don't follow what you mean by this:

Mongo DB now shows 540GB of free space available in this database.

The disk? and what server are you talking about?

Regarding

should we think about dumping and restoring this database at some point to recover the unused space?

I believe that Sarunas is done testing the different storage engines, see the *png files in: https://github.com/edgarcosta/lmfdb-gce/tree/master/ab/ms-bench_n1hcpu4_storage Thus we are close to just push a new snapshot of the DB to ms.lmfdb.xyz.



edgarcosta commented 8 years ago

I see. When we switch the storage engine, mongo will take care of all that without any manual effort. We will do this soon.

AndrewVSutherland commented 8 years ago

@edgarcosta Sounds good. I just copied over a new set of modular forms data, so there are currently two copies of every collection in modularforms2 and there is still 460GB free.