Esri / geoportal-server

Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
https://gptogc.esri.com/geoportal
Apache License 2.0

How to purge/remove all records from the database #225

Closed (slead closed this issue 8 years ago)

slead commented 8 years ago

I accidentally imported thousands of records from http://gptogc.esri.com/geoportal while testing the CSW harvesting.

How can I quickly delete all records from the database so I can start afresh?

Will it cause any problems if I use pgAdmin to delete the records from gpt_resource and gpt_resource_data? If that's a valid approach, do I need to create new indexes (etc.) after doing this?

Thanks

mhogeweg commented 8 years ago

You'll want to do this from the geoportal admin page. If you have enabled the 'apply to all' option, filter the records by the site they were harvested from. Then select the records on the first page and choose the 'apply to entire selection' option before choosing 'delete' from the drop-down.

The 'apply to all' option is set in gpt.xml.

slead commented 8 years ago

After doing that, pgAdmin shows that those tables now contain 0 records. But searching within This Site still lists thousands of records:

[Screenshot: screen shot 2016-06-21 at 12 53 52 pm]

Trying to view the Details of a record shows that it has been deleted:

[Screenshot: screen shot 2016-06-21 at 12 54 03 pm]

So it appears that there's an index out of sync? Also, C:\Lucene still contains ~50 MB of data, which could be the old records, perhaps?

I tried restarting Tomcat and restarting the whole PC, but neither changes the fact that thousands of records are listed when running a search on This Site.

Any further clues? Thanks

mhogeweg commented 8 years ago

Yes, the index will need to be updated. You can wait for the scheduled task to kick in. You should also be able to restart Tomcat, as that typically also starts the sync. Alternatively, you could remove the index files and then reapprove all remaining records; that will also typically result in rebuilding the index.
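
For the "remove the index files" route, a minimal sketch might look like the following. It assumes Tomcat is stopped first and that the index lives in the folder mentioned earlier in this thread (C:\Lucene). The function name `move_index_aside` and the backup folder are hypothetical, not part of Geoportal Server. It moves the files aside instead of deleting them, so they can be put back if needed.

```python
import shutil
import time
from pathlib import Path


def move_index_aside(index_dir, backup_root):
    """Move every entry in index_dir into a new timestamped backup folder.

    Returns the path of the backup folder. Intended to be run only while
    Tomcat is stopped, since a running geoportal may hold the files open.
    """
    index_dir = Path(index_dir)
    # Hypothetical naming scheme for the backup folder.
    backup_dir = Path(backup_root) / time.strftime("lucene-backup-%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=False)
    # Snapshot the listing first, because entries are moved while looping.
    for entry in list(index_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
    return backup_dir
```

A call such as `move_index_aside(r"C:\Lucene", r"C:\LuceneBackups")` would leave the index folder empty; both paths are assumptions to adjust for your installation. The records would then still need to be reapproved (or the sync task run) before the index is rebuilt.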

slead commented 8 years ago

Restarting Tomcat again didn't work, so I guess I'll wait for the scheduled task to kick in. Is there a way to determine when this will occur?

mhogeweg commented 8 years ago

See the scheduler settings in gpt.xml. They let you set an interval for re-indexing, but you could also force it to happen at a certain time (say, 5 minutes from now).
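
To see what is currently configured without reading through the whole file, something like this rough sketch could list the scheduler entries. It assumes the settings sit under an element named `scheduler` in gpt.xml, which is an assumption based on the wording above and should be checked against your own file. The function names are hypothetical. It only reads the file and prints whatever attributes it finds; it does not assume any particular attribute names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name', if present."""
    return tag.rsplit("}", 1)[-1]


def scheduler_entries(gpt_xml_path):
    """Return (tag, attributes) for each child of any <scheduler> element.

    The element name 'scheduler' is an assumption; adjust it if your
    gpt.xml uses a different name.
    """
    root = ET.parse(gpt_xml_path).getroot()
    entries = []
    for elem in root.iter():
        if local_name(elem.tag).lower() == "scheduler":
            for child in elem:
                entries.append((local_name(child.tag), dict(child.attrib)))
    return entries
```

Printing each item from `scheduler_entries("gpt.xml")` (path assumed) shows the configured tasks and their timing attributes, which is where the interval or fixed time would be edited.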