OpenHumans / open-humans

Powering openhumans.org
https://www.openhumans.org/
MIT License

DataFile models should have a method making their own app_task_params [3] #92

Closed: madprime closed this issue 8 years ago

madprime commented 9 years ago

Instead of leaving this to the signal or the view, which creates redundant code that's prone to breaking later (e.g. by forgetting to update both locations when a new parameter is added), this should be done in one place that both views and signals can use.

This is relevant to #79, #80, and #81 and their probable use of views to start tasks (via data_import.DataRetrievalView). Assuming this is <3 hours of work, I think it should be prioritized above those issues, so we avoid knowingly creating code we don't like.

Labeling this as a question pending @beaugunderson's review. The underlying issue is having a single way of defining app_task_params. I think that means doing it in the model, but if that's not right feel free to change the issue's title.
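A minimal sketch of the idea in plain Python. All names and fields here are illustrative assumptions, not the actual open-humans signatures; the point is only that a single method becomes the one place task parameters are built.

```python
# Hypothetical sketch (names are assumptions, not the real codebase):
# both the view and the post-save signal handler call the same method
# instead of each assembling task parameters themselves.

class DataFile:
    def __init__(self, user_data):
        self.user_data = user_data

    def get_app_task_params(self):
        """Single source of truth for data-retrieval task parameters."""
        return {
            'member_id': self.user_data['member_id'],
            'access_token': self.user_data['access_token'],
        }


data_file = DataFile({'member_id': 'abc123', 'access_token': 'tok'})

# View and signal handler both delegate here, so adding a new
# parameter means updating exactly one place:
params = data_file.get_app_task_params()
```

With this shape, forgetting to update "both locations" is no longer possible, because there is only one location.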

beaugunderson commented 9 years ago

I agree re: putting this code in one place; and the DataFile model does seem a natural place for it. :)

madprime commented 9 years ago

For 23andme, I think the post method in DataRetrievalView in studies.twentythree_and_me.views should store the profile_id in a model which is related to UserData (as with pgp's huIDs, etc).

I'm not sure DataFile is the right place for this method. I envisioned the method as knowing, based on existing db data, what the parameters should be (not having them passed in, which would seem to defeat the purpose). But it's not clear DataFile can do that, because studies hold app-specific task data in many-to-one relationships to the UserData.

So, should such a method live in these other models, and not in DataFile?

(Or, we could have all tasks accept arrays of parameters. American Gut's data processing accepts an array of barcodes and creates a file for each one. But I don't think that'd work for a signal fired when saving a single Barcode; it would fire for all the barcodes it sees, not just the new one.)
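The array-of-parameters alternative and its problem can be sketched in plain Python (all names here are illustrative, not the actual American Gut or open-humans code):

```python
# Hypothetical sketch: one task accepts an array of barcodes and makes
# a file per barcode. This works for a view that means "refresh
# everything", but a signal triggered by saving one new barcode only
# sees the full current set, so old barcodes get reprocessed too.

def make_task_params(barcodes):
    # Copy the list so later mutations don't leak into built params.
    return {'barcodes': list(barcodes)}


all_barcodes = ['000001111', '000002222']

# From a view, "process everything" is exactly what we want:
view_params = make_task_params(all_barcodes)

# A post-save signal for one new barcode has no way to say "just this
# one" under this scheme; it would resubmit the whole set:
all_barcodes.append('000003333')
signal_params = make_task_params(all_barcodes)
```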

beaugunderson commented 9 years ago

On my branch I've changed things so that the DataRetrievalView is just responsible for answering the question 'what is all of the linked data for this user?' (for example, 'what are all of this user's barcodes?').

Then the BaseRetrievalView just uses that to call the DataFile's format_params method.

The signal handler only runs on the value that was just added, and similarly calls the DataFile's format_params method.

I'm kind of half-happy with it but it still feels a little hacky.
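The split described above can be sketched in plain Python. This is an approximation of the design as described in the comment, with assumed shapes for the data; the real branch's signatures may differ.

```python
# Hypothetical sketch of the described split: the retrieval view
# answers "what is all of the linked data for this user?", the base
# view feeds each item to DataFile.format_params, and the signal
# handler calls format_params on only the newly saved value.

class DataFile:
    @staticmethod
    def format_params(barcode):
        return {'barcode': barcode}


class DataRetrievalView:
    def get_linked_data(self, user):
        # "what are all of this user's barcodes?"
        return user['barcodes']


class BaseRetrievalView(DataRetrievalView):
    def start_tasks(self, user):
        # One set of task params per linked item.
        return [DataFile.format_params(b) for b in self.get_linked_data(user)]


def barcode_post_save(barcode):
    # Signal handler: runs only on the value that was just added.
    return DataFile.format_params(barcode)


user = {'barcodes': ['000001111', '000002222']}
task_params = BaseRetrievalView().start_tasks(user)
```

Both entry points converge on `format_params`, which is the "one place" the issue asks for, even if the routing still feels a little hacky.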

madprime commented 9 years ago

I've thought about this and concluded the issue should be moved to backburner. It's complicated, and there's a strong implication that American Gut's site would need to change.

I'm dumping my thoughts here for future reference. It's impossible for me to write this much without being wrong about something, but the big picture will probably still be "we shouldn't worry about this until other priorities are completed".

(1) I think it would make more sense for a "create parameters for data retrieval task" method to live in UserData (where it could be predictably found for a study/activity app). DataFiles come from tasks, so defining it there seems backwards (possible, but I think it would be a classmethod).

(2) I think of a "task" as being an event that says "go get all the data we can for this member, using all the current IDs/tokens/links we have for this activity/study". That is to say, we take a holistic bundle of data needed to make "data files", and send that as a task. Data processing translates this into files we think represent a user-intuitive bundling of data. (User = member or researcher.)

(3) I think of the study API as providing a user experience that mimics how it works for an activity. The user performs a single action that pushes a set of "permissions" to Open Humans.

(4) This isn't how it currently works, notably for American Gut. AIUI, each click the user makes on a barcode performs a separate, new request that updates our "permissions" by adding or removing a barcode. Open Humans isn't receiving a batch update; it gets a series of granular events that update the permissions. We can't run a holistic task in response to the API updates because we don't know when to trigger it.

(5) A UI comment on that: without having experienced it, I think this may feel weird for the user. The UI I'd expect updates links as a batch (e.g. an "Update barcode links" button). Like... I expect the UX to be more like sending an email, and less like a Google Doc both people are watching.

(6) But granularity in the API itself might be essentially unavoidable; the API won't bundle updates. And even if we changed the barcode model to be a list, other updates using the API may require multiple requests to update different models before they're done pushing a full update.

(7) A solution might be a flag on API updates that says "this request is part of a batch, there's more coming"? For example, American Gut's page has an "update barcode sharing" button, and then a batch of barcode requests is sent, with all but the final one carrying this flag. (Does that flag already exist? django-rest-framework describes "partial" updates, but I think that's likely referring to a different scenario.) Then we could override an appropriate method in the API view to trigger the data retrieval task only when the request isn't flagged as having more to come.
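The batch-flag idea above, reduced to a plain-Python sketch. Nothing here exists in the codebase or in django-rest-framework; the flag name, handler, and task function are all hypothetical.

```python
# Hypothetical sketch of the batch flag: each update request says
# whether it is the final one in its batch, and only the final request
# triggers the holistic data-retrieval task.

started_tasks = []


def start_retrieval_task(member_id):
    # Stand-in for submitting the real data-retrieval task.
    started_tasks.append(member_id)


def handle_barcode_update(member_id, barcode, final):
    # ... persist the barcode add/remove here ...
    if final:
        # Only the last request in the batch triggers the task, so a
        # multi-request update results in exactly one retrieval run.
        start_retrieval_task(member_id)


# American Gut sends a batch: all but the final request set final=False.
handle_barcode_update('abc123', '000001111', final=False)
handle_barcode_update('abc123', '000002222', final=False)
handle_barcode_update('abc123', '000003333', final=True)
```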