SeaBee-no / documentation

Repo for all SeaBee documentation
https://seabee-no.github.io/documentation/

Move data from sharepoint to sigma2 #6

Closed: knl88 closed this issue 1 year ago

knl88 commented 2 years ago

There is currently some data in SharePoint that should be added to the seabee-missions subpath of the sigma2 pvc. rclone could be a nice option for this, since it supports both MinIO S3 and SharePoint. But that would be a three-step move; another option would be to copy from SharePoint straight into the pvc subpath.

knl88 commented 2 years ago

On WP4/1_DATA-SeaBee we have

Total objects: 251.271k (251271)
Total size: 1.408 TiB (1548437200187 Byte)

While the pvc on sigma2 is

NAME                                   STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
71aea088-9e00-4f3c-adf7-0579fe1c38b8   Bound    seabee-ns9879k   42Gi       RWX                           419d

Perhaps we should get some more storage? @JamesSample @KristofferKa
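To make the gap concrete, here is a rough sketch of the arithmetic using only the two numbers quoted above. The helper name `bytes_to_gib` is made up for illustration, and it rounds down to whole GiB.

```shell
# Convert a byte count to whole GiB (integer division, rounds down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

data_gib=$(bytes_to_gib 1548437200187)   # total size reported for WP4/1_DATA-SeaBee
pvc_gib=42                               # CAPACITY column of the pvc listing (42Gi)

echo "data: ${data_gib} GiB, pvc: ${pvc_gib} GiB, shortfall: $(( data_gib - pvc_gib )) GiB"
```

So the current volume is short by roughly 1.4 TiB, before counting any new data.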

Also tested

rclone copy seabee-drive:WP4/1_DATA-SeaBee/2017/2017-06-29_OlbergS/1_drone seabee-minio:missions/

That seems to work fairly well, although it is probably better to run it straight on the cluster to avoid one copy step.
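For running this per folder, a small sketch along these lines might help. It only prints the rclone command so it can be reviewed before anything is copied. The helper name `copy_cmd` is hypothetical, and the remote names are the ones from the test above; `--dry-run` and `--progress` are standard rclone flags.

```shell
# Remotes as used in the test above; adjust to whatever is in your rclone.conf.
SRC="seabee-drive:WP4/1_DATA-SeaBee"
DST="seabee-minio:missions"

# Print (not execute) the rclone command for one sub-path under SRC.
# --dry-run makes rclone list what it would copy without transferring anything.
copy_cmd() {
  printf 'rclone copy --dry-run --progress "%s/%s" "%s/"\n' "$SRC" "$1" "$DST"
}

copy_cmd "2017/2017-06-29_OlbergS/1_drone"
```

Once the dry run looks right, dropping `--dry-run` performs the actual copy. `rclone size` on the source path gives object and size totals like the ones quoted earlier.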

JamesSample commented 2 years ago

Yep, we should definitely ask for some more storage space on Sigma. Is Lorand the person to contact about this, @KristofferKa?

Is it easy to expand these volumes as necessary, @knl88, or do we have to create an empty, larger volume and then copy all the data across each time? If increasing the storage is easy, I guess asking for 2-3 TB right now would be a good start. However, if we have to copy everything every time we expand the drive, maybe asking for 5 TB straightaway would be more sensible (so we can focus on processing all the data from this summer, without worrying about disk space).

knl88 commented 2 years ago

Yes, it should be fairly easy to expand; in principle we should just need to edit the pvc. However, I am not sure how the pvc was created initially, or whether we have permission to edit it. We will probably also need to restart the deployments that use the pvc.
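As a rough sketch of what "editing the pvc" could look like: the helper below (name `pvc_resize_patch` is made up) builds the JSON patch that raises the requested storage. This assumes the volume supports expansion, which normally requires a storage class with `allowVolumeExpansion: true`. The listing above shows an empty STORAGECLASS, so this may not apply here and the resize may have to happen on the Sigma2 side. The namespace, deployment name, and the `5Ti` value are placeholders.

```shell
# Build the JSON patch that changes a PVC's requested storage size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Review the patch first, then apply it and restart the consumers, e.g.:
#   kubectl -n <namespace> patch pvc 71aea088-9e00-4f3c-adf7-0579fe1c38b8 \
#     -p "$(pvc_resize_patch 5Ti)"
#   kubectl -n <namespace> rollout restart deployment <deployment-name>
pvc_resize_patch 5Ti
```

Printing the patch before applying it keeps the step reviewable when edit permissions on the pvc are uncertain.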

KristofferKa commented 2 years ago

Yes, I think raising this with Lorand is a good starting point, @JamesSample... (potentially adding Francesca Iozzi and Hans Eide, if necessary)

According to our application (see: allocation letter) we should be "eligible" for up to 10 TiB on NIRD, so, subject to Sigma's capacity constraints (issues/plans), we should have access to more than we need at the moment (hopefully also including new data coming in from the field work at Runde after summer?!).

JamesSample commented 1 year ago

@knl88 Please close this if the transfer completes successfully over the weekend. Thanks!

knl88 commented 1 year ago

Transferred ~3.6 TiB and ~715,704 objects to the niva bucket, with 7 errors reported. I think I'll stop it for now, and perhaps we can do a sync at a later stage to pick up new changes :)

If we were to do this once more, I think we could speed things up by using our own client ID for rclone :)