LevanBokeria closed this issue 11 months ago.
Thanks for flagging, @LevanBokeria. There are also a lot of duplicated images in binary_moth_training and gbif_macro_data, so I can look at reorganizing this.
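One way to size up the duplication before reorganizing is to group files by content hash. The sketch below is an illustration only: `find_duplicates` and `file_digest` are hypothetical helpers, and it only catches byte-identical copies. Resized or re-encoded versions of the same photo will not be flagged.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(*roots):
    """Map content digest -> list of paths, for files that appear more than once.

    Files are first grouped by size, so only same-size files get hashed.
    """
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a byte-identical copy
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

For example, `find_duplicates("data/binary_moth_training", "data/gbif_macro_data")`. The `data/binary_moth_training` location is an assumption; only `data/gbif_macro_data/` is spelled out in this thread.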
Sounds good, but I think we'll still need a lot more space anyway. So far, we've only downloaded images for David's macro moth list. There are many more UK species in the bigger list, to say nothing of lists for other regions!
It seems they accept requests for more project space if we give a justification: https://docs.baskerville.ac.uk/storage/#project-space
Do you have a rough idea of how much we would need, maybe broken down by region?
Hey! I think 300-400GB per region is not unreasonable, but the following factors make it a bit hard to estimate:
I'll take a closer look and can probably come up with a better estimate soon.
Currently, the folder with GBIF images for the UK macro moths is 462GB. This was for roughly 1000 species, so on average that works out to ~470MB per species.
The Singapore list has 1345 species, so I'd estimate roughly 630GB more space is needed (1345 × ~470MB). However, again this is hard to say for sure because:
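The back-of-the-envelope projection above can be written out as a tiny calculator. This is a sketch under the same assumptions as the estimate: storage scales linearly with species count, every region averages the ~470MB per species seen for the UK macro moths, and units are decimal (1GB = 1000MB). The function names are hypothetical.

```python
def estimate_region_gb(n_species, mb_per_species=470.0):
    """Estimated storage in GB for one region's species list.

    Assumes linear scaling at ~470MB per species (UK macro moths: 462GB
    for roughly 1000 species).
    """
    return n_species * mb_per_species / 1000.0


def estimate_regions_gb(species_per_region, mb_per_species=470.0):
    """Per-region estimates, e.g. {"UK": 1000, "Singapore": 1345}."""
    return {
        region: estimate_region_gb(n, mb_per_species)
        for region, n in species_per_region.items()
    }


if __name__ == "__main__":
    totals = estimate_regions_gb({"UK": 1000, "Singapore": 1345})
    for region, gb in totals.items():
        print(f"{region}: ~{gb:.0f}GB")
    print(f"Total: ~{sum(totals.values()):.0f}GB")
```

Under these assumptions, UK plus Singapore images alone come to roughly 1.1TB, before environments, model outputs, or DwCA files.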
But I think it's safe to assume each region will need a lot of space, on the order of hundreds of gigabytes. Is this unreasonable? I don't have a good sense of what to expect for computer vision problems like this.
So perhaps we could ask for 2 or 3TB? That should let us host the UK and Singapore data now, do some initial work on a couple of other regions, and still have room for model results and artifacts.
@ots22 Have they sent any reply to this?
No, I'll send a nudge.
Wow, I see we have 3TB now! Thanks @ots22! Closing this issue.
We currently have 1TB of storage for the project vjgo8416-amber on Baskerville. We are using 987GB already: https://admin.baskerville.ac.uk/project/vjgo8416-amber
@ots22, is it possible to ask for more space?
I've looked at how large our folders are, and this is the breakdown:
- ./conda_envs: 9.4GB
- ./venv: 1GB
- ./kg_conda_env: 1GB
- ./kg_conda_env2: 7.9GB
- ./projects: 18GB
- ./data: 935GB+
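A breakdown like the one above can be regenerated with a short script, for example to track usage as the data folders are consolidated. This is a sketch, not how the numbers above were produced. It sums apparent file sizes and skips symlinks, so totals may differ somewhat from `du` (which counts allocated blocks) and from the Baskerville admin page. `dir_size_bytes` and `folder_breakdown` are hypothetical helpers, and the project root path is a placeholder.

```python
import os


def dir_size_bytes(root):
    """Total apparent size in bytes of regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def folder_breakdown(parent):
    """Map each top-level subfolder of parent to its size in decimal GB, largest first."""
    sizes = {}
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path) / 1e9
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Placeholder path: point this at the project's storage root.
    for name, gb in folder_breakdown("/path/to/project").items():
        print(f"./{name} - {gb:.1f}GB")
```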
Within the data folder:
The DwCA files and the 1000 images per species I tried downloading are taking up a lot of space. Eventually, I think we'll only need one of our image databases: either data/gbif-species-trainer-AMI-fork/gbif_images/ or @KatrionaGoldmann's version at data/gbif_macro_data/. Some of the large DwCA files I'm currently using will also become redundant. But for now, it would be good to have more space if that's possible.