AMI-system / gbif-species-trainer-AMI-fork

Code for training a fine-grained species classification model using data from GBIF
MIT License

Increase Baskerville storage for the AMBER project #10

Closed by LevanBokeria 11 months ago

LevanBokeria commented 1 year ago

We currently have 1TB of storage for the project vjgo8416-amber on Baskerville. We are using 987GB already: https://admin.baskerville.ac.uk/project/vjgo8416-amber

Is it possible to ask for more space? @ots22

I've looked at how large our folders are, and this is the breakdown:

./conda_envs - 9.4GB
./venv - 1GB
./kg_conda_env - 1GB
./kg_conda_env2 - 7.9GB
./projects - 18GB
./data - 935GB+

Within the data folder:

The DwCa files and the 1000 images per species I tried downloading are taking up a lot of space. Eventually, I think we'll need only one of our image databases, either data/gbif-species-trainer-AMI-fork/gbif_images/ or @KatrionaGoldmann's version at data/gbif_macro_data/. Plus, some of the large DwCa files I am currently using will become redundant. But for now, it would be good to have more space if that's possible.
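For anyone wanting to reproduce a per-folder breakdown like the one above, here is a minimal Python sketch. The function names `dir_size_bytes` and `size_breakdown` are made up for illustration and are not part of this repository. It sums apparent file sizes, so the numbers may differ somewhat from what `du` or the Baskerville admin page report.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; leave it out of the total.
                pass
    return total


def size_breakdown(parent):
    """Map each immediate subdirectory of `parent` to its size in GB (10^9 bytes)."""
    sizes = {}
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path) / 1e9
    return sizes
```

Calling `size_breakdown(".")` from the project root would give one entry per top-level folder (for example `data`, `projects`), and calling it again on `./data` would show what is taking up space inside it. Walking hundreds of gigabytes of small image files this way can be slow.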

KatrionaGoldmann commented 1 year ago

Thanks for flagging @LevanBokeria. There are also a lot of duplicated images in binary_moth_training and gbif_macro_data so I can look at reorganizing this.

LevanBokeria commented 1 year ago

Sounds good. But I think we'll still need a lot more space anyway. So far, we've only downloaded images for the macro moths list from David. There are many more UK species in the bigger list, to say nothing of lists for other regions! So I think we'll need a lot of space.

ots22 commented 1 year ago

Seems like they are open to requests with justification https://docs.baskerville.ac.uk/storage/#project-space

Do you have a rough idea of how much we would need, maybe broken down by region?

LevanBokeria commented 1 year ago

Hey! I think 300-400GB per region is not unreasonable, but the following make it a bit hard to estimate:

I'll have a closer look again and can probably come up with a better estimate soon.

LevanBokeria commented 1 year ago

Currently, the folder with gbif images for the UK macro moths is 462GB. This was for roughly 1000 species. So, on average, that works out to ~470MB per species.

The Singapore list has 1345 species, so I'd estimate we need about 600GB more space... However, again this is hard to say for sure because:

But I think it will be safe to assume each region will need a lot of space, on the order of hundreds of gigabytes. Is this unreasonable? I don't have a good sense of what to expect for such computer vision problems.
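The back-of-the-envelope estimate in this comment can be written out as a small Python sketch. The figures are the ones quoted in the thread; treating "roughly 1000" as exactly 1000 species and assuming Singapore species average the same image volume as UK macro moths are simplifications, and the helper names are hypothetical.

```python
def per_species_gb(total_gb, n_species):
    """Average storage per species, in GB."""
    return total_gb / n_species


def estimate_region_gb(n_species, gb_per_species):
    """Projected storage for a region, assuming the same per-species average."""
    return n_species * gb_per_species


uk_total_gb = 462          # UK macro moth image folder, from the thread
uk_species = 1000          # "roughly 1000" in the thread; assumed exact here
singapore_species = 1345   # size of the Singapore list, from the thread

avg = per_species_gb(uk_total_gb, uk_species)
sg = estimate_region_gb(singapore_species, avg)

print(f"{avg * 1000:.0f} MB per species")
print(f"{sg:.0f} GB estimated for Singapore")
```

With exactly 1000 species this gives 462MB per species and about 621GB for Singapore, in line with the rounded ~470MB and ~600GB figures above. The real number depends on how many images each species actually has on GBIF.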

ots22 commented 1 year ago

So perhaps we could ask to go up to 2 or 3 TB? That sounds like it would let us host data from UK and Singapore right now, do some initial work on another couple of regions and have enough for storing model results and artifacts.

LevanBokeria commented 12 months ago

@ots22 Have they sent any reply to this?

ots22 commented 11 months ago

No, I will send a nudge

LevanBokeria commented 11 months ago

Wow, I see we have 3TB now! Thanks @ots22! Closing this issue.