HTBox / allReady

This repo contains the code for allReady, an open-source solution focused on increasing awareness, efficiency and impact of preparedness campaigns as they are delivered by humanitarian and disaster response organizations in local communities.
http://www.htbox.org/projects/allready
MIT License

Delete Task Attachment From Azure Blob Storage #1972

Open dchristensen opened 7 years ago

dchristensen commented 7 years ago

When a Task Attachment is replaced or removed from a task, the reference to the attachment is removed from the DB, but the actual attachment data is not removed from the Storage Provider. While not critical, this will leave us with a bunch of orphaned attachments taking up space in Azure.

oneolddev commented 7 years ago

@dchristensen I think this would be a great candidate for an Azure Function 😄.

@tonysurma Who's the Azure go-to person? I wouldn't mind taking a crack at this.

dchristensen commented 7 years ago

@c0g1t8 I think if we handle the logic in the app correctly and delete the attachments from Blob Storage when they are removed or replaced, like we do with the Org/Campaign/Event images, we could avoid the additional complexity of an Azure function.

oneolddev commented 7 years ago

@dchristensen I agree that fixing the code would be the best approach. I was thinking of creating an Azure function as a clean up mechanism for production data. I've been on projects where data inconsistencies have been introduced by bugs.

In those cases, I've written applications that do the clean-up. Some have evolved into data consistency applications that run on a schedule, operating in report-only mode by default but with an option to report and clean up.

I thought this strategy could apply here, and given that the attachments are in Azure, an Azure Function seemed like a natural fit.

mgmccarthy commented 7 years ago

@c0g1t8 @dchristensen

IMHO, we should not be using Azure Functions like this. Handling cleanup in the application code (which is testable and compile-time safe in the IDE) is the way to go when the code can take care of it. Not that Azure Functions are NOT testable, but you don't need an Azure account as a developer in order to do this work in the IDE.

The transactional consistency of this operation can be split across two handlers. What I mean by this is that one transaction can handle removing the reference to the task attachment, then we can publish a MediatR event, and the handler for that event takes care of the blob cleanup.
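
For illustration, a minimal sketch of that two-handler shape. The notification and handler names here are made up, the exact MediatR handler interface depends on the version the project is on, and the ITaskAttachmentService call is an assumption rather than the actual signature:

using System.Threading;
using System.Threading.Tasks;
using MediatR;

// Hypothetical notification, published after the DB reference to the attachment is removed.
public class TaskAttachmentDeleted : INotification
{
    public int AttachmentId { get; set; }
}

// Hypothetical handler that reacts to the notification and cleans up the blob.
// ITaskAttachmentService is the project's existing service; the handler interface and the
// DeleteAsync parameter shown here may differ from what the repo actually uses.
public class DeleteTaskAttachmentBlobHandler : INotificationHandler<TaskAttachmentDeleted>
{
    private readonly ITaskAttachmentService _taskAttachmentService;

    public DeleteTaskAttachmentBlobHandler(ITaskAttachmentService taskAttachmentService)
    {
        _taskAttachmentService = taskAttachmentService;
    }

    public async Task Handle(TaskAttachmentDeleted notification, CancellationToken cancellationToken)
    {
        // Remove the now-orphaned blob from Azure Storage once the DB no longer references it.
        await _taskAttachmentService.DeleteAsync(notification.AttachmentId);
    }
}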

oneolddev commented 7 years ago

@mgmccarthy

First, I fully agree with you and @dchristensen that fixing the problem at origin is always the preferable approach.

The original issue reported was:

When a Task Attachment is replaced or removed from a task the reference to the attachment is removed from the DB but the actual attachment data is not removed from the Storage Provider.

I wanted to suggest the use of Azure functions as a clean up mechanism for production data. I prefer automated versus manual clean up.

you don't need an Azure account as a developer

I've been using Azure for a number of years 😄. I asked about the Azure go-to person because I suspected that there was a Plan and wanted to connect.

mgmccarthy commented 7 years ago

@c0g1t8,

I'm not too sure there is an "Azure go-to person" on the team currently.

That being said, I think this decision depends on what needs to be transactionally consistent from the project's point of view. If the project owners want the blob cleaned up at the point of being replaced or removed, then we can do that in code. Furthermore, we can handle the deletion of the blob asynchronously from the SQL Server work, all without introducing an Azure Function.

I'm not too sure how Azure blobs are billed... is it the frequency of accessing the blob, the number of calls to storage over a given timespan, or the amount of storage used?

If it's storage, then optimizing for the quickest cleanup, rather than a clean-up "job" via an Azure Function, would be preferable for cost reasons. I'll let the project owners weigh in on that.

@tonysurma @MisterJames cc @stevejgordon

tonysurma commented 7 years ago

The effective Azure go-to person is me.

Thoughts: I definitely prefer automated cleanup over manual.

Given this project, I would prefer that we "split the difference": have attachments deleted at the time of replacement or removal, and also have an Azure Function that can do a sweep for orphans. I have several other projects where things get out of sync due to bugs, evolution of code over time, etc., and on one of them we don't have automated cleanup, so I am the one who does it by hand to keep things from growing in storage.

Does that help?

oneolddev commented 7 years ago

@tonysurma Thanks, that helped me. There are two things:

  1. Fix the code so this doesn't happen again.
  2. Clean up any possible inconsistency.

My mind was overly focused on the latter, which is the result rather than the cause. Missed the forest for the trees ☚ī¸. I'll take a look at what is needed to fix this problem since it sounds similar to #1961.

@mgmccarthy Always good to hear from you 😄. Didn't mean to sound argumentative.

mgmccarthy commented 7 years ago

@c0g1t8

cc @tonysurma

In EditVolunteerTaskCommandHandler, at this point:

// Delete existing attachments
if (message.VolunteerTask.DeleteAttachments.Count > 0)
{
    var attachmentsToDelete = _context.Attachments
        .Where(a => a.Task.Id == volunteerTask.Id && message.VolunteerTask.DeleteAttachments.Contains(a.Id))
        .ToList();

    _context.RemoveRange(attachmentsToDelete);
}

is where the "clean up" takes place when deleting an attachment from a task. Obviously, there is no call out to Azure to also delete the blob associated with the attachment.

You already have the ITaskAttachmentService injected into this handler, so you have a few choices for how to take care of the blob cleanup:

  1. call ITaskAttachmentService.DeleteAsync directly in the handler (a rough sketch is included below).
  2. create a new MediatR INotification message called TaskAttachmentDeleted, and write a handler that uses the ITaskAttachmentService to delete the blob.
  3. use Hangfire.Enqueue, creating an interface/implementation that does the same thing as #2 (delete the blob).
  4. write a new function using an existing Azure queue (or a newly added one) that can tap into a blob storage trigger. The function would need access to the blob connection string, and if it can get hold of that, it should be a fairly straightforward approach. Send a message through a yet-to-be-defined interface (similar to ISmsSender/IEmailSender) that issues a "command" to delete the blob.

So, those are the choices I see and the trade-offs associated with each (my list is by no means exhaustive; it's just some ideas I've had).
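
For what it's worth, here's a rough sketch of what option 1 could look like in that handler. The _taskAttachmentService field name and the DeleteAsync parameter are assumptions (the real ITaskAttachmentService signature in the repo may take a blob name or URL instead), so treat this as illustration only:

// Delete existing attachments (option 1 sketch: also remove the blobs from Azure)
if (message.VolunteerTask.DeleteAttachments.Count > 0)
{
    var attachmentsToDelete = _context.Attachments
        .Where(a => a.Task.Id == volunteerTask.Id && message.VolunteerTask.DeleteAttachments.Contains(a.Id))
        .ToList();

    foreach (var attachment in attachmentsToDelete)
    {
        // Assumed field name and parameter; adjust to the actual ITaskAttachmentService.DeleteAsync signature.
        await _taskAttachmentService.DeleteAsync(attachment.Id);
    }

    _context.RemoveRange(attachmentsToDelete);
}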

Let me know if you have any questions/comments

oneolddev commented 7 years ago

@mgmccarthy

A great analysis once again. 😄 I'm inclined to go with Option #1. It is the simplest solution that addresses the reported problem.

Here to help and learn 😄

mgmccarthy commented 7 years ago

@c0g1t8,

I agree, Option 1 is the best approach for now.

For option 2, the only decoupling you're getting is isolating the attachment/Azure behavior into another class. MediatR is still a synchronous pipeline under the covers, so even though there would be two classes where there is now one, the whole thing (including the call to blob storage) is synchronous.

For now, we'd like to limit Hangfire usage as much as we can, at least until that framework supports async/await.

Since you have Azure experience (probably more than me)... is it possible to run a WebJob in its own AppService (separate from the AppService that's hosting the web server), or does a WebJob always have to be "tied" to its hosting AppService?

For example, I know a WebJob is "tied" to a web project in VS if there is a webjobs-list.json file under the Properties folder for that web project.

If you can host a WebJob in its own AppService and not interfere with the resources of the web server, then WebJobs aren't necessarily "bad"... they're just "older" than Azure Functions and don't embrace the whole idea of "serverless".

mgmccarthy commented 7 years ago

@c0g1t8 do you want to take this issue and implement option 1?

oneolddev commented 7 years ago

@mgmccarthy I'll take this issue 😄.