FredHutch / s3tagcrawler

search s3 based on tags

adding a tag to existing entities in S3 #3

Closed vortexing closed 6 years ago

vortexing commented 6 years ago

Upon further review I have realized that when Globus writes files back to S3 it can add a tag called "workflowID". But now all the processed files in S3 that WEREN'T from Globus don't have any tag or tag value.

I have a list of keys that are already in a bucket and just need to either append a key-value pair as a tag, OR overwrite all the tags. I gave it a shot just now and appended the new tag as an additional column on the off chance it might work, but it returns "Number of ops: 0".
[screenshot: screen shot 2018-07-21 at 3 24 10 pm]

I wonder about a couple of things for updating tags in bulk:

Perhaps referring to specific columns by name would be better for modifying and correcting tags? It would also be useful to figure out whether, to update any one tag, you realistically have to know ALL the tags and reset the entire set at the same time.
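For reference, at the S3 API level there is no per-tag update: adding one tag means reading the object's existing tag set and writing back the merged set, which replaces every tag on the object. A minimal boto3 sketch of that round trip (bucket, key, and value below are placeholders, not anything from this tool):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "path/to/object"  # placeholders

# Read the current tag set for the object.
existing = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]

# Merge in the new tag, keeping whatever was already there.
merged = [t for t in existing if t["Key"] != "workflowID"]
merged.append({"Key": "workflowID", "Value": "some-workflow-id"})

# Write the whole set back; this REPLACES all tags on the object.
s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": merged})
```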

dtenenba commented 6 years ago

OK, I will look into this. Might ping you if I have questions.

dtenenba commented 6 years ago

So, there are a couple of things about the way this works that might be helpful to explain.

This feature seems more complicated, and I am not sure it should be done, but I'll think about it some more and could maybe be persuaded to change my mind.

If you need to update just certain tags, you could always run the get-s3-tags tool and create a new csv from its output, then just modify the cells that need changing.
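A rough sketch of that workaround, assuming get-s3-tags writes one row per object with one column per tag (the file and column names below are illustrative, not the tool's actual schema):

```python
import pandas as pd

# Load the CSV produced by get-s3-tags (filename is just an example).
df = pd.read_csv("current-tags.csv")

# Edit only the cells that need changing, e.g. backfill a missing tag value.
df.loc[df["workflowID"].isna(), "workflowID"] = "manual-backfill"

# Write out the manifest to feed back to the tagging step.
df.to_csv("tags-to-apply.csv", index=False)
```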

So if I implement the first change (allowing arbitrary column/tag names) but not the second (reading existing tags and merging with what's in the CSV) would that give you what you need?

vortexing commented 6 years ago

I wholeheartedly agree with the first change (esp re: making all the essential columns first, then however many tags you want to show up after that).

And the overwrite-all-tags thing was how I'd assumed S3 tagging worked, but I didn't know if that was just naive. Currently I DO have the output from the get-s3-tags tool, so as long as whatever I put up there sets the tags to only what I include, that is 100% fine, and the rest of the work to get editing of specific tags working is totally not worth your time.

COOL!!!

dtenenba commented 6 years ago

Fixed. Please (re-)read the README as the usage instructions have changed.

vortexing commented 6 years ago

Looks great! Question for the README: when only re-tagging and not uploading, you use the flag, but what do you put in the first column (seq_dir, i.e. the directory/file in fast)? Just leave it blank? Can you leave it out? I did this a couple of times but forget what I did.

vortexing commented 6 years ago

Tried it with this csv (and tried it once with the seq_dir empty b/c these are already in S3 and no longer have a copy in fast), but it said I had missing columns. I triple-checked but can't see a problem. Am I misinterpreting the instructions?

[screenshot: screen shot 2018-07-27 at 6 57 24 am]

dtenenba commented 6 years ago

Change s3_transferbucket to s3transferbucket. No underscore.

vortexing commented 6 years ago

OMG, where's the eyeroll button?

However, I fixed it and tried it again, and it gave me the same error. The manifest I'm attempting to use is in my fast (/paguirigan_a/trgen/s3-archive-txfer/18-07-27-tags2edit.csv).

[screenshot: screen shot 2018-07-27 at 8 58 03 am]

dtenenba commented 6 years ago

Took me a while to figure this out, but there are some weird invisible Unicode characters at the very beginning of your CSV, which meant that your seq_dir did not match seq_dir in the check for required columns. I will email you a fixed version of the CSV. Not sure how that got in there.
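Invisible characters at the very start of a CSV are most often a UTF-8 byte-order mark (e.g. added when the file is saved from Excel). If that is what happened here, a quick way to check for and strip it (using the manifest name from above; adjust the path as needed):

```python
# Check the first bytes of the manifest for a UTF-8 BOM and strip it if present.
path = "18-07-27-tags2edit.csv"

with open(path, "rb") as f:
    raw = f.read()

if raw.startswith(b"\xef\xbb\xbf"):
    print("UTF-8 BOM found; writing a cleaned copy")
    with open("18-07-27-tags2edit-clean.csv", "wb") as f:
        f.write(raw[3:])
else:
    print("No UTF-8 BOM at the start of the file")
```

(Reading the file with encoding="utf-8-sig" in Python also ignores a leading BOM.)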

I notice that the values under seq_dir are relative paths. This will work as long as you run the code from the right directory. Or maybe you are going to run with -t, in which case the values in that column do not matter.