GenericMappingTools / gmtserver-admin

Cache data and script for managing the GMT data server
GNU Lesser General Public License v3.0

Old files from SRTM15_v2.1 in test server? #159

Closed Esteban82 closed 1 year ago

Esteban82 commented 2 years ago

I was looking at the test server and found these two folders. I think the first is an old version (2.1) of SRTM15 and could therefore be deleted.

http://test.generic-mapping-tools.org/server/earth/earth_relief2.1/
http://test.generic-mapping-tools.org/server/earth/earth_relief/

PaulWessel commented 2 years ago

I think we still need to have a conversation about what to do with tests. Having 60 tests fail each time we update earth_relief means lots of extra work and growth of the dvc system. I think @maxrjones and I were discussing maybe using a test server when running the tests so that we would get a stable set of remote grids and avoid this problem. These files would ideally not be mirrored across to all servers.

Esteban82 commented 1 year ago

I think that this issue can be closed.

PaulWessel commented 1 year ago

OK, working on this, but copying from Hawaii to Oslo is, well, a bit slow!

PaulWessel commented 1 year ago

I may wish to rename the test.generic-mapping-tools.org entry to next or candidate or some other word than test. The data in test are not for testing but are candidate data sets for the next release. The word "test" should retain how we use it in all of GMT (test dir, test scripts, etc.), and hence setting GMT_DATA_SERVER = test would be what we do to run all the tests using the reference data set. GMT_DATA_SERVER = next would (now) give access to the unreleased files like venus and mars. I may also need to come up with another dir structure on the user's computer: .gmt is what GMT uses. However, if you or I want to try venus, we don't want GMT to place venus under the .gmt dir but probably use a separate dir, like ~/.gmt-test and ~/.gmt-next. I have accidentally overwritten stuff in ~/.gmt many times because I forgot to move .gmt out of the way.

Any preference for next vs candidate? I don't want to use release-candidate since it is long.

Would this be OK with you? I would need IT to change the test->gmtserver setting.

seisman commented 1 year ago

I may wish to rename the test.generic-mapping-tools.org entry to next or candidate or some other word than test. The data in test are not for testing but are candidate data sets for the next release. The word "test" should retain how we use it in all of GMT (test dir, test scripts, etc.), and hence setting GMT_DATA_SERVER = test would be what we do to run all the tests using the reference data set. GMT_DATA_SERVER = next would (now) give access to the unreleased files like venus and mars.

I prefer candidate, which warns users not to use it.

I may also need to come up with another dir structure on the user's computer: .gmt is what GMT uses. However, if you or I want to try venus, we don't want GMT to place venus under the .gmt dir but probably use a separate dir, like ~/.gmt-test and ~/.gmt-next. I have accidentally overwritten stuff in ~/.gmt many times because I forgot to move .gmt out of the way.

Is it possible to change it via environment variables like GMT_DATADIR or GMT_USERDIR?

PaulWessel commented 1 year ago

Yes, GMT_USERDIR=~/.gmt-candidate will use that dir for all ".gmt" work (looking for data, creating sessions). So we can let our remote-data test scripts set that, for instance.

OK, candidate is a good name.
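
A minimal sketch of what a remote-data test script might set, assuming the ~/.gmt-candidate directory name floated above and assuming the candidate server answers under the host name candidate.generic-mapping-tools.org. GMT_USERDIR and GMT_DATA_SERVER are the GMT environment variables discussed in this thread; everything else here is illustrative.

```shell
# Keep candidate data out of the regular ~/.gmt directory.
# Assumption: the directory name ~/.gmt-candidate, as proposed in this thread.
candidate_userdir="${HOME}/.gmt-candidate"
mkdir -p "${candidate_userdir}"
export GMT_USERDIR="${candidate_userdir}"

# Assumption: the candidate server is reachable under this host name.
export GMT_DATA_SERVER="http://candidate.generic-mapping-tools.org"

echo "GMT_USERDIR=${GMT_USERDIR}"
echo "GMT_DATA_SERVER=${GMT_DATA_SERVER}"
```

Any gmt command started from the same shell afterwards would then look for and store its user files under that directory instead of ~/.gmt.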

PaulWessel commented 1 year ago

Since our hope is that Geoscope can be persuaded to take over the hosting I want to have a clear structure on oceania first. Right now the top gmt directory looks like this:

-bash-4.2$ pwd
/export/gmtserver/gmt
-bash-4.2$ ls -l
total 28
drwxrwxr-x 2 pwessel gmt 4096 May 21  2020 BlackMarble
drwxr-xr-x 2 seisman gmt  286 May 11  2020 BlackMarble2016
drwxrwxr-x 2 pwessel gmt 4096 May 21  2020 BlueMarble
drwxrwxr-x 2 pwessel gmt   33 Aug 11  2021 LOGS
lrwxrwxrwx 1 pwessel gmt    8 Jan 28  2022 data -> data_6.2
drwxrwxr-x 4 pwessel gmt 4096 May 29  2020 data_6.0
drwxrwxr-x 5 pwessel gmt 4096 Jan 28  2022 data_6.1
drwxrwxr-x 5 pwessel gmt 4096 Aug 15 03:00 data_6.2
drwxrwxr-x 9 pwessel gmt  198 Aug  3 08:00 gmtserver-admin
drwxrwxr-x 2 pwessel gmt 4096 Oct 17  2019 old-earth-reliefs-v1
drwxrwxr-x 2 pwessel gmt 4096 Mar 15  2020 old-earth-reliefs-v2
drwxrwxr-x 4 pwessel gmt   87 Apr 30  2022 static
drwxrwxr-x 4 pwessel gmt  261 Jul 29 01:51 test

Only the directory data (i.e., data_6.2) is mirrored or used for reading data by users. Currently, test is where new candidate data should be placed until we release them. SOEST IT helped us set things up so that oceania.generic-mapping-tools.org points to the data dir contents while test.generic-mapping-tools.org points to the test dir contents. Here are the proposed steps:

  1. Ask IT to change test to candidate in the URL redirection file so that it points to the directory candidate instead. I have duplicated test to candidate, so it should work as soon as they see my request, in 4-6 hours I hope.
  2. Once that is working, I will wipe the test dir since only we ever used test.
  3. I see no point in keeping the data_6.0 and data_6.1 directories as gmtserver just fills up.
  4. Remove old-earth-reliefs-*.
  5. static holds an old earth_relief tree; I will see what version it is, probably a duplicate of 2.1.
  6. There are a few Marble dirs I need to look at and possibly delete. They are public data, so there is no need to waste space here.
  7. A directory (maybe called reference or read-only or testdata) will be placed in candidate so that these are not mirrored anywhere.
  8. I will make some simple changes in gmt_remote.c so that when the server is test it inserts a /reference or similar string in the URL, so we fish from the reference directory when running our tests in CI or locally.

Comments allowed!

seisman commented 1 year ago

Since our hope is that Geoscope can be persuaded to take over the hosting

Do you mean "EarthScope" (https://www.earthscope.org/)?

  3. I see no point in keeping the data_6.0 and data_6.1 directories as gmtserver just fills up.

Yes, actually the name "data_6.2" also makes no sense to me. We should simply name it "data", without any version string.

The problematic new synbath dataset in https://github.com/GenericMappingTools/gmtserver-admin/issues/213 warns us to always back up the old dataset before updating. I think we should have four directories:

When we update an existing dataset or add a new one, we should first copy it to the candidate directory. If the dataset looks good, we should move the old dataset from "data" to "olddata" before copying the new one from "candidate" to "data". It's a little complicated, but it makes sure that we can quickly revert to the correct "old" dataset if the updated dataset has issues (like #213).
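
The promotion order described above could be sketched as follows. This is only an illustration under stated assumptions: promote_dataset is a hypothetical helper, not an existing gmtserver-admin script, and it assumes the "candidate", "data" and "olddata" directories sit side by side under one root.

```shell
# promote_dataset ROOT NAME
# Hypothetical helper: back up the released copy, then promote the candidate.
promote_dataset() {
  root="$1"
  name="$2"
  # Refuse to do anything if there is no candidate to promote.
  if [ ! -d "${root}/candidate/${name}" ]; then
    echo "no candidate dataset named ${name}" >&2
    return 1
  fi
  mkdir -p "${root}/data" "${root}/olddata"
  # Step 1: move the currently released dataset (if any) to olddata,
  # replacing any older backup of the same name.
  if [ -e "${root}/data/${name}" ]; then
    rm -rf "${root}/olddata/${name}"
    mv "${root}/data/${name}" "${root}/olddata/${name}"
  fi
  # Step 2: copy the candidate into data; the candidate copy stays in place.
  cp -R "${root}/candidate/${name}" "${root}/data/${name}"
}
```

Reverting would then be the reverse move, from "olddata" back to "data". Trying the helper on a scratch directory first is advisable, since step 1 deletes the previous backup.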

  7. A directory (maybe called reference or read-only or testdata) will be placed in candidate so that these are not mirrored anywhere.

I don't understand this point.

  8. I will make some simple changes in gmt_remote.c so that when the server is test it inserts a /reference or similar string in the URL, so we fish from the reference directory when running our tests in CI or locally.

I'm a little confused about this point, too. From your points 7 and 8, are you suggesting adding a reference directory in both candidate and test server? What are the files in the reference directory?

PaulWessel commented 1 year ago

Yes EarthScope sorry. We do them lots of favours with the GMT for Geodesy course so they better help us out.

Yes, data_6.2 serves no function either so we can eliminate that as well and just have data.

The complication re 7-8 has to do with some limits on what SOEST can do. Right now we have two redirects (oceania and test) to different subdirectories on the same server (gmtserver.soest.hawaii.edu). It would of course be cleaner if we could simply add two more: candidate and old data. Then we would not have to hunt for directories inside others, etc. Let me ask IT if that is possible (cleaner for us) or what the issue is.

PaulWessel commented 1 year ago

I notice that sometime earlier we had IT set up static.generic-mapping-tools.org for the purpose of using static reference data that won't change when we do tests. Since it is already set up, we can have oceania, static (may take over for test) and add candidate? I don't think we need old data.generic... since a makefile will do stuff like moving things around on the server.

PaulWessel commented 1 year ago

Didn't this use to work?

curl -L data.generic-mapping-tools.org:/gmt_data_server.txt

PaulWessel commented 1 year ago

Sorry, these work, but you have to do it right:

curl -L http://data.generic-mapping-tools.org/gmt_data_server.txt
curl -L http://static.generic-mapping-tools.org/gmt_data_server.txt
curl -L http://test.generic-mapping-tools.org/gmt_data_server.txt

PaulWessel commented 1 year ago

So if we agree that the old data or similar does not need to be accessible (we will use cp -f etc on the server via makefiles) we have the three we need?
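
The server checks done by hand with curl in this thread could be wrapped in a small helper. The sketch below assumes only what the thread shows: each server exposes gmt_data_server.txt at its root and needs the full http:// URL. The function names server_index_url and check_server are hypothetical.

```shell
# Build the full URL of the index file for a given server short name.
server_index_url() {
  printf 'http://%s.generic-mapping-tools.org/gmt_data_server.txt\n' "$1"
}

# Report whether the index file can be fetched (follows redirects, 20 s limit).
check_server() {
  if curl -fsSL --max-time 20 -o /dev/null "$(server_index_url "$1")"; then
    echo "$1: ok"
  else
    echo "$1: unreachable"
  fi
}
```

A loop such as `for s in data static test; do check_server "$s"; done` would then cover the servers in one go.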

seisman commented 1 year ago

Since it is already set up, we can have oceania, static (may take over for test) and add candidate?

Sounds good to me.

seisman commented 1 year ago

It seems that oceania, candidate and static are all online now. Does this mean we can remove the old earth_relief_2.1 directory from the main server?

PaulWessel commented 1 year ago

Let's hear from @Esteban82 who has done a lot of the work. @federico, what remains, if anything, on the server(s)? My thought was that we placed all the new and updated stuff on candidate, then when 6.5 is released we update oceania with everything on candidate.

PaulWessel commented 1 year ago

@seisman, you may have answered this before, but cannot recall:

When the CI runs all the tests, from where does it pull @earth_relief_XXXX etc.? Current oceania or some cached files from before any updates? I am hoping we can put what is compatible on the static server and then pass static.generic-mapping-tools.org when running tests. This will eliminate any data differences so we can focus on actual failures. It is too hard to get 70-80 failures, many of which are data driven.

seisman commented 1 year ago

We have a workflow that downloads the GMT remote data from the oceania server and stores them as GitHub Action cache files. The cache files will then be used by the tests.

PaulWessel commented 1 year ago

OK, so this happens each time the full test is run? I.e., the cache is updated at that time so if I test with oceania we are using the same version?

seisman commented 1 year ago

No, the workflow is scheduled to run once every week (https://github.com/GenericMappingTools/gmt/actions/workflows/ci-caches.yml). If you make any changes to oceania, then we have to manually trigger the workflow to update the caches. After that, you and the CI will use the same version.

PaulWessel commented 1 year ago

Wrote a script that determines which remote files are used in our doc and test scripts. Got these 35 (some are the same, since without a registration we default to _p):

@earth_age_02m
@earth_age_02m_p
@earth_age_06m
@earth_age_06m_p
@earth_age_10m
@earth_age_10m_p
@earth_day_01d
@earth_day_01m
@earth_day_01m_p
@earth_day_15m
@earth_relief_01d
@earth_relief_01d_g
@earth_relief_01m
@earth_relief_01m_p
@earth_relief_02m
@earth_relief_02m_p
@earth_relief_03s
@earth_relief_04m
@earth_relief_04m_p
@earth_relief_05m
@earth_relief_05m_g
@earth_relief_05m_p
@earth_relief_06m
@earth_relief_06m_p
@earth_relief_10m
@earth_relief_10m_g
@earth_relief_10m_p
@earth_relief_15m
@earth_relief_15m_p
@earth_relief_20m
@earth_relief_20m_g
@earth_relief_30m
@earth_relief_30m_p
@earth_relief_30s
@earth_relief_30s_p

Perhaps we should just place these on the static server (copy from oceania for now) and see how that goes with testing that server? Also, @joa-quim has a point: why use 01m tiles unless the bug is specific to tiling or high-res tiling? If it works for 01m, 05m and 06m then we should simplify the tests and use 06m instead. This means updating some PS files in DVC. The doc scripts and examples may use what they use since we want nice images and not blurry ones.
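
The script that produced the list above is not shown in the thread. A minimal sketch of how such a list could be gathered, assuming the remote grids appear in scripts as @earth_<name>_<resolution> with an optional _g or _p suffix, might look like this. The function name list_remote_grids is hypothetical.

```shell
# Print the unique @earth_* remote grid names found under the given paths.
# Assumption: names follow @earth_<name>_<NN><d|m|s>[_g|_p], as in the list above.
list_remote_grids() {
  grep -rhoE '@earth_[a-z]+_[0-9]{2}[dms](_[gp])?' "$@" | sort -u
}
```

Called as `list_remote_grids doc test` from the top of a source tree (directory names assumed here), it would print one grid name per line.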

seisman commented 1 year ago

Perhaps we should just place these on the static server (copy from oceania for now) and see how that goes with testing that server?

Sounds good to me.

Esteban82 commented 1 year ago

Let's hear from @Esteban82 who has done a lot of the work. @federico, what remains, if anything, on the server(s)? My thought was that we placed all the new and updated stuff on candidate, then when 6.5 is released we update oceania with everything on candidate.

Yes, I think that we can delete from:

PaulWessel commented 1 year ago

Great, can you take care of removing those? Then candidate is fully loaded (venus, moon etc)?

Esteban82 commented 1 year ago

Great, can you take care of removing those?

From both servers, right?

Then candidate is fully loaded (venus, moon etc)?

Yes

PaulWessel commented 1 year ago

Yes, from oceania and candidate since we don't reference 2.1 anywhere

Esteban82 commented 1 year ago

@PaulWessel you will have to delete earth_relief2.5

-bash-4.2$ rm -r earth_relief2.5/
rm: cannot remove ‘earth_relief2.5/earth_relief_03s_g/N44E004.earth_relief_03s_g.nc’: Permission denied
rm: cannot remove ‘earth_relief2.5/earth_relief_03s_g/N44E003.earth_relief_03s_g.nc’: Permission denied
rm: cannot remove ‘earth_relief2.5/earth_relief_03s_g’: Directory not empty

I deleted earth_relief2.1 in both sites.

PaulWessel commented 1 year ago

Thanks, and sorry, I see the 3s had incomplete permissions for the group...

Esteban82 commented 1 year ago

I deleted earth_relief2.1 in both sites.

We deleted about 51 GB from the server. Joaquim must be happy.

Esteban82 commented 1 year ago

I think we should also remove earth_relief2.1 and 2.5 from test as well. It is better to leave it tidy.

PaulWessel commented 1 year ago

Yes, I think test is just for crazy experiments with new things until that dataset is stable and can go on candidate. So please clean!

Esteban82 commented 1 year ago

So please clean!

I deleted earth_relief2.1. For the 2.5 I don't have group permissions.

PaulWessel commented 1 year ago

OK everything under test should now have group permissions rw, so you should be able to delete
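
One way to grant and verify group access on a directory tree is sketched below; it is not necessarily what was run on the server. Note that removing a file needs write permission on the directory that contains it, so the directories matter as much as the files. Both function names are hypothetical.

```shell
# List entries under a tree that the group cannot write to.
list_not_group_writable() {
  find "$1" ! -perm -020 -print
}

# Give the group read/write on everything, plus search (x) on directories.
grant_group_access() {
  chmod -R g+rwX "$1"
}
```

Running list_not_group_writable before and after grant_group_access shows whether anything was missed, for example entries owned by another user that chmod could not change.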

Esteban82 commented 1 year ago

I have a question: should I delete just that directory or everything?

PaulWessel commented 1 year ago

You mean "test" itself? No, let that one sit empty but server etc goes

Esteban82 commented 1 year ago

Ok, so I will delete everything inside the server directory.

PaulWessel commented 1 year ago

/export/gmtserver/gmt/test/server/**

Esteban82 commented 1 year ago

Done. But I still can't delete these files.

-bash-4.2$ pwd
/export/gmtserver/gmt/test/server
-bash-4.2$ rm -r *
rm: cannot remove ‘earth/earth_relief2.5/earth_relief_03s_g/N44E004.earth_relief_03s_g.nc’: Permission denied
rm: cannot remove ‘earth/earth_relief2.5/earth_relief_03s_g/N44E003.earth_relief_03s_g.nc’: Permission denied
rm: cannot remove ‘earth/earth_relief2.5/earth_relief_03s_g’: Directory not empty

Esteban82 commented 1 year ago

Should I also delete the files within /export/gmtserver/gmt/test/cache?

PaulWessel commented 1 year ago

Yep

On 9 September 2023 at 16:44:47, Federico Esteban wrote:

Should I also delete the files within /export/gmtserver/gmt/test/cache?


Esteban82 commented 1 year ago

Just deleted the files in cache.

seisman commented 1 year ago

I think we can close the issue.