dchristensen opened this issue 7 years ago
@dchristensen I think this would be a great candidate for an Azure Function.
@tonysurma Who's the Azure go-to person? I wouldn't mind taking a crack at this.
@c0g1t8 I think if we handle the logic in the app correctly and delete the attachments from Blob Storage when they are removed or replaced, like we do with the Org/Campaign/Event images, we could avoid the additional complexity of an Azure function.
@dchristensen I agree that fixing the code would be the best approach. I was thinking of creating an Azure function as a clean up mechanism for production data. I've been on projects where data inconsistencies have been introduced by bugs.
In those cases, I've written applications that do the clean up. Some have evolved into data consistency applications that run on a schedule: report-only mode by default, but they can be set to report and clean up.
I thought this strategy could apply here, and given that the attachments live in Azure, an Azure Function seemed like a natural fit.
@c0g1t8 @dchristensen
IMHO, we should not be using Azure Functions like this. Taking care of the cleanup in the code (which is testable and compile-time safe in the IDE) is the way to go. Not that Azure Functions are NOT testable, but you don't need an Azure account as a developer in order to do this work in the IDE.
The transactional consistency of this operation can be broken into two handlers. What I mean is: one transaction handles removing the reference to the task attachment, then we publish a MediatR event; the handler for that event takes care of the blob cleanup.
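A rough sketch of that two-handler shape (the type and member names here are illustrative, not actual allReady code; the DeleteAsync method on ITaskAttachmentService is an assumption):

```csharp
using System.Threading;
using System.Threading.Tasks;
using MediatR;

// Published by the edit handler after the attachment reference
// has been removed from the DB in its own transaction.
public class TaskAttachmentDeleted : INotification
{
    public string BlobUrl { get; set; }
}

// Separate handler that owns the blob cleanup, decoupled from the SQL work.
public class TaskAttachmentDeletedHandler : INotificationHandler<TaskAttachmentDeleted>
{
    private readonly ITaskAttachmentService _taskAttachmentService;

    public TaskAttachmentDeletedHandler(ITaskAttachmentService taskAttachmentService)
    {
        _taskAttachmentService = taskAttachmentService;
    }

    public async Task Handle(TaskAttachmentDeleted notification, CancellationToken cancellationToken)
    {
        // DeleteAsync is an assumed method name; the real service API may differ.
        await _taskAttachmentService.DeleteAsync(notification.BlobUrl);
    }
}
```

The point of the split is that the SQL transaction can commit on its own; a blob storage failure then only affects the cleanup handler, not the task edit itself.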
@mgmccarthy
First, I fully agree with you and @dchristensen that fixing the problem at its origin is always the preferable approach.
The original issue reported was:
> When a Task Attachment is replaced or removed from a task, the reference to the attachment is removed from the DB, but the actual attachment data is not removed from the Storage Provider.
I wanted to suggest Azure Functions as a cleanup mechanism for production data. I prefer automated over manual cleanup.
> you don't need an Azure account as a developer
I've been using Azure for a number of years. I asked about the Azure go-to person because I suspected there was a plan and wanted to connect.
@c0g1t8,
I'm not too sure there is an "Azure go-to person" on the team currently.
That being said, I think this decision depends on what needs to be transactionally consistent from the project's point of view. If the project owners want the blob cleaned up at the point of being replaced or removed, then we can do that in code. Furthermore, we can handle the deletion of the blob asynchronously from the SQL Server work, all without introducing an Azure Function.
I'm not too sure how Azure blobs are billed... is it frequency of accessing the blob, calls to storage over a given timespan, or the amount of storage?
If it's storage, then optimizing for the quickest cleanup, instead of a clean-up "job" via an Azure Function, would be preferable for cost reasons. I'll let the project owners weigh in on that.
@tonysurma @MisterJames cc @stevejgordon
The effective Azure go-to person is me.
Thoughts: definitely prefer automated cleanup over manual.
Given this project, I would prefer that we "split the difference": have attachments deleted at time of replacement, and also have an Azure Function that can do a sweep for orphans. I have several other projects where things get out of sync due to bugs, evolution of code over time, etc., and on one of them we don't have cleanup automated, so I am the one who does it by hand to keep things from growing in storage.
Does that help?
@tonysurma Thanks, that helped me. There are two things here: the cause (the code not deleting the blob when the attachment is removed) and the result (orphaned attachments in storage).
My mind was overly focused on the latter, the result, instead of the cause. Missed the forest for the trees ☹️. I'll take a look to see what is needed to fix this problem, since it sounds similar to #1961.
@mgmccarthy Always good to hear from you. Didn't mean to sound argumentative.
@c0g1t8
cc @tonysurma
In EditVolunteerTaskCommandHandler, at these lines:
```csharp
// Delete existing attachments
if (message.VolunteerTask.DeleteAttachments.Count > 0)
{
    var attachmentsToDelete = _context.Attachments.Where(a => a.Task.Id == volunteerTask.Id && message.VolunteerTask.DeleteAttachments.Contains(a.Id)).ToList();
    _context.RemoveRange(attachmentsToDelete);
}
```
is where the "clean up" takes place when deleting an attachment from a task. Obviously, there is no call out to Azure to also delete the blob that is associated with this attachment.
You already have the ITaskAttachmentService injected into this handler, so you have some choices for how to take care of the blob cleanup:

1. Call the ITaskAttachmentService directly in the handler to delete the blob along with the attachment record.
2. Publish a MediatR message from the handler and delete the blob in a separate handler for that message.
3. Enqueue a Hangfire background job that calls the ITaskAttachmentService to delete the blob.
4. Send a message to an Azure storage queue and have a WebJob delete the blob.

The trade-off to each approach:
#1 and #2 are synchronous. So if Azure Blob Storage happens to be down, any code invoking the EditVolunteerTaskCommandHandler will fail, which would in turn make it impossible to create and/or edit a VolunteerTask in the system. Since blob storage (IMHO) is not something that needs to be transactionally consistent with the work of the EditVolunteerTaskCommandHandler, I see this as not the best solution, but it's a step in the right direction.

#3 is asynchronous. So if Azure Blob Storage is down, no big deal: any type of blob failure will not interfere with the create/edit processing of a VolunteerTask. And because Hangfire has built-in retries, even if the service is not available immediately (or there is a transient error), the enqueued job will eventually be processed and the blob will be deleted. Even if that job ends up in the Hangfire dashboard's failed jobs area, it can be retried later by hand via the dashboard. The bad part about Hangfire is that it still does not support an async interface, which means there is potential for deadlocks.

#4 is async as well and is the preferable approach. This involves potentially creating a new queue (if we can't use an existing one), a new command class (that would live in \Common\AllReady.Core\Notifications), and a new interface with an implementation similar to the methods in \WebJobs\AllReady.NotificationsWebJob\Function.cs. I'm not too sure what would be involved in creating a new queue, but that's something @tonysurma might be able to help with or give us some advice on.

So, those are the choices I see and the pros and cons associated with them (my list of choices is by no means exhaustive, it's just some ideas I've had).
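For #3, the enqueue could look something like this rough sketch. It assumes a synchronous Delete(string) method on ITaskAttachmentService and a Url property on the attachment entity, neither of which may match the actual code (a sync method is used because Hangfire can't await async methods):

```csharp
using Hangfire;

// Inside EditVolunteerTaskCommandHandler, after the attachment reference is removed:
// Hangfire serializes the call, and a background server executes it later with
// automatic retries, so a blob storage outage won't fail the VolunteerTask edit.
BackgroundJob.Enqueue<ITaskAttachmentService>(s => s.Delete(attachment.Url));
```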
Let me know if you have any questions/comments
@mgmccarthy
A great analysis once again. I'm inclined to go with Option #1. It is the simplest solution that addresses the reported problem.
With Option #2, a TaskAttachmentAdded message should presumably also be created for symmetry. That would be additional work, and I don't see the additional benefit of further decoupling. Or am I missing something? Here to help and learn.
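For reference, a minimal sketch of what Option #1 might look like inside EditVolunteerTaskCommandHandler (the DeleteAsync method name and the Url property are assumptions for illustration; the actual API may differ):

```csharp
// Delete existing attachments
if (message.VolunteerTask.DeleteAttachments.Count > 0)
{
    var attachmentsToDelete = _context.Attachments
        .Where(a => a.Task.Id == volunteerTask.Id && message.VolunteerTask.DeleteAttachments.Contains(a.Id))
        .ToList();

    // New: also delete the backing blobs so no orphaned data is left in Azure Storage.
    foreach (var attachment in attachmentsToDelete)
    {
        await _taskAttachmentService.DeleteAsync(attachment.Url);
    }

    _context.RemoveRange(attachmentsToDelete);
}
```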
@c0g1t8,
I agree, Option 1 is the best approach for now.
For option 2, the only decoupling you're getting is isolating the attachment/Azure code in another class. MediatR is still a synchronous pipeline under the covers, so even though there would be two classes where there is now one, the whole thing (including the call to blob storage) is still synchronous.
For now, we'd like to limit the Hangfire usage as much as we can, at least until that framework supports async/await.
Since you have Azure experience (probably more than me)... is it possible to run a WebJob in its own App Service (separate from the App Service that's hosting the web server), or does a WebJob always have to be "tied" to its hosting App Service?
For example, I know a WebJob is "tied" to a web project in VS if there is a webjobs-list.json file under the Properties for that web project.
If you can host a WebJob in its own App Service and not interfere with the resources of the web server, then WebJobs aren't necessarily "bad"... they're just "older" than Azure Functions and don't embrace the whole idea of "serverless".
@c0g1t8 do you want to take this issue and implement option 1?
@mgmccarthy I'll take this issue.
When a Task Attachment is replaced or removed from a task, the reference to the attachment is removed from the DB, but the actual attachment data is not removed from the Storage Provider. While not critical, this will leave us with a bunch of orphaned attachments taking up space in Azure.