BagIt Download - Githubissues

ThomasThelen commented 2 years ago

Hi,

I was browsing through the source code and ran across a BagIt download implementation. I'm not totally sure, but is there an extra write of the data to the filesystem? I saw this comment. If there is, I solved this problem in DataONE's data server by creating SpeedBagIt, which avoids duplicating data when serving datasets. I'd be more than happy to optimize the BagIt routine here.

BobSimons commented 2 years ago

Thanks for the offer, but... My understanding is that your system requires that the files to be included in the BagIt file already exist. You thus make the BagIt file and serve it to the user on-the-fly. The situation in ERDDAP is different. The source files for the BagIt file don't already exist. Yes, you or I could make it so that the needed files are created and added to the BagIt file directly, but the time and disk space needed for ERDDAP's approach haven't been a problem. That seems like a significant project for minimal payoff, and it would add complexity to ERDDAP (never a good thing). BagIt creation is also a secondary, infrequently used feature of ERDDAP. So I don't think it is worth the time and effort.

There are other things with higher priority on the To Do list. I welcome your assistance with these projects. Please see https://github.com/BobSimons/erddap/labels/enhancement

Best wishes.

ThomasThelen commented 2 years ago

"your system requires that the files to be included in the BagIt file already exist"

That's correct: it operates on InputStreams to the bytes.
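To illustrate the general idea of building a bag from InputStreams without first copying the payload to disk, here is a minimal sketch. It is not SpeedBagIt's or ERDDAP's actual code: the class `StreamingBagSketch`, the method `writeBag`, and the choice of a zipped bag with a SHA-256 manifest are all assumptions made for illustration. It relies only on the standard `java.util.zip` and `java.security` classes and needs Java 9 or later for `InputStream.transferTo`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for illustration; not the SpeedBagIt or ERDDAP API. */
public class StreamingBagSketch {

    /** Writes a zipped bag to out, reading each payload stream exactly once. */
    public static void writeBag(String bagName, Map<String, InputStream> payload, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        StringBuilder manifest = new StringBuilder();
        try (ZipOutputStream zip = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
            // Bag declaration tag file.
            putText(zip, bagName + "/bagit.txt",
                    "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n");

            for (Map.Entry<String, InputStream> e : payload.entrySet()) {
                String path = "data/" + e.getKey();
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                zip.putNextEntry(new ZipEntry(bagName + "/" + path));
                // The digest is updated as bytes flow into the zip entry,
                // so no temporary copy of the payload is written.
                try (InputStream in = new DigestInputStream(e.getValue(), md)) {
                    in.transferTo(zip);
                }
                zip.closeEntry();
                manifest.append(toHex(md.digest())).append("  ").append(path).append('\n');
            }

            // The manifest goes last because checksums are only known after streaming.
            putText(zip, bagName + "/manifest-sha256.txt", manifest.toString());
        }
    }

    private static void putText(ZipOutputStream zip, String name, String text) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(text.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-ins for whatever streams a server would really provide.
        Map<String, InputStream> payload = new LinkedHashMap<>();
        payload.put("a.txt", new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBag("bag", payload, out);
        System.out.println("bag size in bytes: " + out.size());
    }
}
```

In a server, `out` would typically be the HTTP response stream. Writing the manifest after the payload is what lets the checksums be computed in the same pass as the copy.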

Totally understand that the ERDDAP architecture differs in that aspect. I'll be sure to browse around the other issues :)

Cheers,

Thomas

ERDDAP / erddap

BagIt Download #64