biocore / labadmin

Administration website for the Knight Lab
4 stars 16 forks source link

Pulldown temp view #191

Closed sjanssen2 closed 7 years ago

sjanssen2 commented 7 years ago

I know this is a very ugly, duct tape solution, but we need to be able to pull down information NOW. The problem is that the table ag.ag_kit_barcode does not allow to have multiple survey_ids for one barcode. Thus, when the user assigns a barcode to a source, it is actually assigned to the latest survey_id the user created and overwrites already existing survey_id(s). In the long run, we need to correct this erroneous behaviour, but due to upcoming deadlines, this PR should enable us to deliver to the collaborator.

Please review with these special circumstances in mind!

~My question: is it worth to remove the temp view after the actual SQL statement is executed? Or can we relay on the fact that it is updated every execution time?~ Stupid me, it's a temp view

wasade commented 7 years ago

For other reviewers, @sjanssen2 and I put these sets of queries together to both understand the situation as well as to provide a plausible means to get metadata out right now as noted in the PR.

Just to check, has the pulldown been executed against these temporary tables and does it produce metadata across the different surveys for previously impacted barcodes?

sjanssen2 commented 7 years ago

I was able to retrieve more information than before, i.e. for a few barcodes with multiple surveys, we got all rows in the according files of the pull down zip archive, where we previously only got one line. But I haven't had the time to check with @EmbrietteH testset.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.04%) to 93.475% when pulling 46dd4a89d1014cf4bf7f10655d7e22e6c9bf8958 on sjanssen2:pulldown_tmpview into 111625f2823d59f3babd90550102f958c11ed189 on biocore:master.

sjanssen2 commented 7 years ago

I cannot properly test: we currently relay on the tuple of ag_login_id and participant_name to identify the human sources for which more than one survey exists. Since Jose's scrubbing script scrambles those names independently, the test DB does not contain any ag_login_id, participant_name tuples with more than one survey :-(

sjanssen2 commented 7 years ago

ready for reviews @antgonza @wasade @josenavas

wasade commented 7 years ago

Can the scrubbing script be fixed? It shouldn't be changing the logical organization of the data

josenavas commented 7 years ago

Just to provide my input here - of course the scrubbing script can be modified - however I do not have time to do it this week.

sjanssen2 commented 7 years ago

Have you checked with @EmbrietteH if we can afford this delay?

EmbrietteH commented 7 years ago

No, not ok to delay another week. If Jose cannot do it this week due to work on the plate mapper, @sjanssen2 and @wasade please take a look at the scrubbing script and try to resolve.

wasade commented 7 years ago

@sjanssen2, can you do the pulldown now for the set of samples we already know are impacted and which we can verify the metadata for independently so that we can understand if this solves the immediate need? That is more important near term than unit tests within this PR.

@EmbrietteH, the new scrub is extremely complex and neither @sjanssen2 or I are sufficiently versed in SQL for that task. I'd guestimate a full day of effort to understand what needs to change. Unfortunately, I think that @josenavas is best suited here to resolve next week.

Is it feasible to just use the original scrubbing script, and reapply it?

sjanssen2 commented 7 years ago

We decided to not merge my PR #191 and instead only roll this out on our test server. I've done so and started labadmin. Results look promising, but @EmbrietteH needs to confirm the data I sent her via email.