databio / bedboss

Python pipeline for processing BED files for BEDbase
https://docs.bedbase.org
BSD 2-Clause "Simplified" License

Create package that will project (store) information about s3 storage #26

Closed: khoroshevskyi closed this issue 6 months ago

khoroshevskyi commented 9 months ago

Bedbase not only stores data on S3 but also facilitates data transfer using standard bash commands (e.g., aws s3 cp ...). However, during the copying process, it's challenging to determine whether the destination on S3 already exists and what other files are present.

The proposed solution is to create an object that provides a projection of the files on S3. This object would assist users in uploading, deleting, and managing objects within the database.

nsheff commented 7 months ago

I propose no new package/object, but do this:

Use boto3 to upload from within bedboss. Then, once the upload completes successfully, report metadata about it using pipestat, so the database knows which files were uploaded and where they are (or maybe it's just True if the upload was successful, or something). I could see this being a JSON blob with information about all files that were transferred to s3/b2.
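A minimal sketch of what that could look like inside bedboss. Everything here is illustrative: `build_s3_metadata`, `upload_and_report`, and the `s3_files` result name are hypothetical, and the `report(record_identifier=..., values=...)` call assumes pipestat's reporting interface; only the boto3 `upload_file` call is the standard API.

```python
import json
from pathlib import Path


def build_s3_metadata(local_files, bucket, prefix):
    """Build the JSON blob describing files transferred to S3/B2.

    Hypothetical structure: one entry per file with its bucket, key,
    and size, plus an overall success flag.
    """
    files = []
    for f in local_files:
        p = Path(f)
        files.append({
            "name": p.name,
            "bucket": bucket,
            "key": f"{prefix}/{p.name}",
            "size": p.stat().st_size if p.exists() else None,
        })
    return {"uploaded": True, "files": files}


def upload_and_report(local_files, bucket, prefix, pipestat_manager, record_id):
    """Upload files with boto3, then record the result via pipestat.

    `pipestat_manager.report` is assumed to follow pipestat's
    report(record_identifier=..., values=...) interface.
    """
    import boto3  # imported lazily so the module loads without AWS deps

    s3 = boto3.client("s3")
    for f in local_files:
        key = f"{prefix}/{Path(f).name}"
        s3.upload_file(str(f), bucket, key)

    metadata = build_s3_metadata(local_files, bucket, prefix)
    pipestat_manager.report(
        record_identifier=record_id,
        values={"s3_files": metadata},  # "s3_files" result name is an assumption
    )
    return metadata
```

With the blob stored per record, later queries (or a removal routine) can recover exactly which objects a BED file owns without listing the bucket.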

Then, write a function or class that can remove an entry from the database. It would query the info from the main database, then
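The removal side could mirror the upload: read the stored metadata for a record, delete the corresponding S3 objects, then drop the database entry. A sketch under the assumption that the stored JSON blob lists each transferred file with its bucket and key; `keys_for_record`, `remove_entry`, and the `delete_record` callable are all hypothetical names, while `delete_object` is the standard boto3 call.

```python
def keys_for_record(metadata):
    """Extract (bucket, key) pairs from a record's stored upload metadata.

    Assumes the JSON blob holds a "files" list whose entries carry
    the bucket and key of each transferred object.
    """
    return [(f["bucket"], f["key"]) for f in metadata.get("files", [])]


def remove_entry(metadata, delete_record):
    """Delete the S3 objects for one record, then remove the DB entry.

    `delete_record` stands in for whatever drops the row from the main
    bedbase database once the objects are gone; deleting objects first
    avoids orphaned DB rows pointing at missing files.
    """
    import boto3  # lazy import: only needed when actually deleting

    s3 = boto3.client("s3")
    for bucket, key in keys_for_record(metadata):
        s3.delete_object(Bucket=bucket, Key=key)
    delete_record()
```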

nsheff commented 7 months ago

This is actually the same as databio/bbconf#37

nsheff commented 6 months ago

I will close this in favor of the bbconf issue.