Closed AmandaBirmingham closed 5 years ago
It seems like the intent behind this issue is duplicated in more specific ones; however correct prep sheets for shotgun definitely needs to happen by full launch.
If this issue is duplicated by more specific ones, go ahead and link them here. I am not aware of anyone having done the work to get detailed specifications for the shotgun prep sheet that Gail has signed off on.
Note that code changes for this issue CANNOT commence until we gather specifications from Gail and get them signed off by her and any other stakeholders--otherwise we are wasting our time.
@AmandaBirmingham For the subquery to collect prep information (below), it looks like I can straight replace the '16S' tables with their 'Shotgun' equivalents. However when I do so, I get zero results for sequencing_process_id = 1 (AGP), and even if I remove the 'WHERE' clause. Do you think I'm missing something?
SELECT distinct
libprepplate2.plate_id
FROM
labman.sequencing_process sp
LEFT JOIN
labman.sequencing_process_lanes spl USING ( sequencing_process_id)
LEFT JOIN
labman.pool_composition_components pcc1
ON ( spl.pool_composition_id = pcc1.output_pool_composition_id)
LEFT JOIN
labman.pool_composition pccon
ON ( pcc1.input_composition_id = pccon.composition_id)
LEFT JOIN
labman.pool_composition_components pcc2
ON ( pccon.pool_composition_id = pcc2.output_pool_composition_id)
LEFT JOIN
labman.library_prep_16s_composition libprepcp2
ON ( pcc2.input_composition_id = libprepcp2.composition_id)
LEFT JOIN
labman.composition libprepcpcp2
ON ( libprepcp2.composition_id = libprepcpcp2.composition_id)
LEFT JOIN
labman.library_prep_16s_process libpreppr2
ON ( libprepcpcp2.upstream_process_id = libpreppr2.process_id)
LEFT JOIN
labman.well libprepwell2
ON ( libprepcpcp2.container_id = libprepwell2.container_id)
LEFT JOIN
labman.plate libprepplate2
ON ( libprepwell2.plate_id = libprepplate2.plate_id)
WHERE
sequencing_process_id = 1
I can't quite tell if it's simply that there are no matching plates in the system or whether I have the subquery wrong.
Just an FYI: I believe this is the correct form of the subquery for shotgun:
SELECT distinct libprepplate2.plate_id
FROM labman.sequencing_process sp
LEFT JOIN labman.sequencing_process_lanes spl USING (sequencing_process_id)
LEFT JOIN labman.pool_composition_components pcc1 ON (spl.pool_composition_id = pcc1.output_pool_composition_id)
LEFT JOIN labman.library_prep_shotgun_composition libprepcp2 ON ( pcc1.input_composition_id = libprepcp2.composition_id)
LEFT JOIN labman.composition libprepcpcp2 ON ( libprepcp2.composition_id = libprepcpcp2.composition_id)
LEFT JOIN labman.library_prep_16s_process libpreppr2 ON ( libprepcpcp2.upstream_process_id = libpreppr2.process_id)
LEFT JOIN labman.well libprepwell2 ON ( libprepcpcp2.container_id = libprepwell2.container_id)
LEFT JOIN labman.plate libprepplate2 ON ( libprepwell2.plate_id = libprepplate2.plate_id)
WHERE sequencing_process_id = 4;
I was banging my head against a wall until I stepped through each join and examined the output, reviewed code, etc. The turning point was realizing all results from labman.sequencing_process where sequencing_process_id = 1 were Amplicon assays. :)
Edit: See Amanda's comment below.
FYI, like other 'composition' tables, 'labman.pool_composition' is a table that records compositions that in this case are composed of other pools (duh). Rather than being the table to keep and consult for pooling information, it actually only applies to compositions that are pools of other pools.
'labman.pool_composition' is a table that records compositions that in this case are composed of other pools (duh). Rather than being the table to keep and consult for pooling information, it actually only applies to compositions that are pools of other pools.
No, I'm afraid this is not the case.
Any given record in labman.pool_composition
is associated with a single pool. The "ingredients" that make up that pool are listed in labman.pool_composition_components
, with one record in that table for each ingredient (and with those component records being linked to the labman.pool_composition
record they are a part of by their output_pool_composition_id
).
A pool_composition_component
is basically a generic way to refer to anything that goes into a pool, where that thing is identified by its generic composition_id
in the pool_composition_component
's input_composition_id
column. The kinds of things that can go into a pool (via their generic composition_id
s) are library_prep_shotgun_compositions
, library_prep_16s_compositions
, and/or (in the case of pools made up either wholy or partly of other pools) pool_compositions
.
In the query above in https://github.com/jdereus/labman/issues/327#issuecomment-488171842 , the fact that all pools (including those not made up of other pools) are represented by pool_compositions
may not be clear because the pool_composition
table is not joined to directly. However, it is used implicitly: thelabman.sequencing_process_lanes
's pool_composition_id
is joined to the labman.pool_composition_components
's output_pool_composition_id
field--and both of these fields are foreign keys to the same pool_composition
record for the pool of interest. That pool is made up of pool_composition_components
representing library_prep_shotgun_compositions
(not other pools).
@AmandaBirmingham Thanks so much for the clarification! You are right; in labman.pool_composition_components for example, the input_composition_id is FKey'ed to composition(composition_id) while output_pool_composition_id is FKey'ed to pool_composition(pool_composition_id):
"fk_pool_composition_components" FOREIGN KEY (output_pool_composition_id) REFERENCES pool_composition(pool_composition_id)
"fk_sequencing_pool_components_component_pool" FOREIGN KEY (input_composition_id) REFERENCES composition(composition_id)
As you said, there's no need for a mapping table; the two tables being joined are being joined on a foreign key that they both possess. It seems to me that the above query is still correct in what it should do; does the above query appear correct to you? I can attach copies of the output in the morning; I recall that it does return 384 rows, which seems correct for shotgun?
Not to offend, just added Amanda and myself to the assignees list, since we've been working on this together and separately since Friday.
NB: Since the decisions of our recent meeting mean https://github.com/jdereus/labman/issues/445 is pulled into the 0.1.0 milestone, note that we need to make sure both amplicon and shotgun prep sheet creation performs the check to ensure there is not more than 1 study represented on any plate being included in the prep output (and hopefully implements this check in shared code :) )
@charles-cowart Re https://github.com/jdereus/labman/issues/327#issuecomment-488208480 , yes, I agree the query in that comment produces the correct output. My comment was only to clarify how it does so :)
I believe the only outstanding stakeholder questions outstanding on this one are about whether to include the gdna extraction/plating/etc detail columns that are included in the amplicon prep sheet but have not historically been included in the shotgun prep sheet (even though they are relevant to shotgun preps). Per discussion on 04/26/2019 with Gail after the group meeting, my understanding is that putting these columns into shotgun prep sheets will NOT cause any qiita processing problems. Therefore, unless someone overrules me, I am going to make an executive decision that we WILL include those columns (so that amplicon and shotgun prep sheets share all columns that are relevant to both).
In the long run, however, I think that the Qiita team (@antgonza et al) should consider whether to keep pulling this processing info into Qiita via prep sheets after LabControl goes into use. Pretty much the entire point of LabControl is to store all this processing info in a database, and so exporting it from there and importing it into a different database (or at least, a different location in the same database) duplicates will be duplicating data in a distressing way. Just something to consider for the future!
Hmm, that would be a nice thing to have...:)
On Thu, May 2, 2019 at 4:59 PM Amanda Birmingham notifications@github.com wrote:
I believe the only outstanding stakeholder questions outstanding on this one are about whether to include the gdna extraction/plating/etc detail columns that are included in the amplicon prep sheet but have not historically been included in the shotgun prep sheet (even though they are relevant to shotgun preps). Per discussion on 04/26/2019 with Gail after the group meeting, my understanding is that putting these columns into shotgun prep sheets will NOT cause any qiita processing problems. Therefore, unless someone overrules me, I am going to make an executive decision that we WILL include those columns (so that amplicon and shotgun prep sheets share all columns that are relevant to both).
In the long run, however, I think that the Qiita team (@antgonza https://github.com/antgonza et al) should consider whether to keep pulling this processing info into Qiita via prep sheets after LabControl goes into use. Pretty much the entire point of LabControl is to store all this processing info in a database, and so exporting it from there and importing it into a different database (or at least, a different location in the same database) duplicates will be duplicating data in a distressing way. Just something to consider for the future!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jdereus/labman/issues/327#issuecomment-488873371, or mute the thread https://github.com/notifications/unsubscribe-auth/AKFU7EYK6Y6XOVWTGCU5WFDPTN54TANCNFSM4FQC32QQ .
:D Thanks so much for doing so! It was a big help.
On Thu, May 2, 2019 at 4:51 PM Amanda Birmingham notifications@github.com wrote:
@charles-cowart https://github.com/charles-cowart Re #327 (comment) https://github.com/jdereus/labman/issues/327#issuecomment-488208480 , yes, I agree the query in that comment produces the correct output. My comment was only to clarify how it does so :)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jdereus/labman/issues/327#issuecomment-488872034, or mute the thread https://github.com/notifications/unsubscribe-auth/AKFU7E3ZAO7RAERWHQJEBVTPTN46JANCNFSM4FQC32QQ .
I believe this is largely correct. I don't believe there should be different primer_set_ids, and different primer_dates for i5 and i7, but I've included them here for confirmation as they're easy to remove; the data for each pair of columns are identical.
SELECT study.study_id, study_sample.sample_id, study.study_alias AS project_name, study_sample.sample_id AS orig_name, study.study_description AS experiment_design_description, samplewell.row_num AS row_num, samplewell.col_num AS col_num, samplecp.content, sampleplate.external_id AS sample_plate, platingprpr.run_personnel_id AS plating, gdnaextractpr.extraction_kit_id, gdnaextractpr.epmotion_robot_id AS gepmotion_robot_id, gdnaextractpr.epmotion_tool_id, gdnaextractpr.kingfisher_robot_id, libpreppr.kappa_hyper_plus_kit_id, libpreppr.stub_lot_id, libpreppr.normalization_process_id, primersetcp.barcode_seq AS barcode_i5, primersetcp2.barcode_seq AS barcode_i7, primersetcp.primer_set_id AS primer_set_id_i5, primersetcp2.primer_set_id AS primer_set_id_i7, primersetplate.external_id AS primer_plate_i5, primersetplate2.external_id AS primer_plate_i7, primerworkingplateprpr.run_date AS primer_date_i5, primerworkingplateprpr2.run_date AS primer_date_i7 FROM labman.plate libprepplate LEFT JOIN labman.well libprepwell ON (libprepplate.plate_id = libprepwell.plate_id) LEFT JOIN labman.composition libprepcpcp ON (libprepwell.container_id = libprepcpcp.container_id) LEFT JOIN labman.library_prep_shotgun_process libpreppr ON (libprepcpcp.upstream_process_id = libpreppr.process_id) LEFT JOIN labman.library_prep_shotgun_composition libprepcp ON ( libprepcpcp.composition_id = libprepcp.composition_id) LEFT JOIN labman.normalized_gdna_composition b ON ( libprepcp.normalized_gdna_composition_id = b.normalized_gdna_composition_id) LEFT JOIN labman.compressed_gdna_composition c ON ( b.compressed_gdna_composition_id = c.compressed_gdna_composition_id) LEFT JOIN labman.gdna_composition gdnacp USING (gdna_composition_id) LEFT JOIN labman.composition gdnacpcp ON (gdnacp.composition_id = gdnacpcp.composition_id) LEFT JOIN labman.gdna_extraction_process gdnaextractpr ON (gdnacpcp.upstream_process_id = gdnaextractpr.process_id) LEFT JOIN labman.sample_composition samplecp USING (sample_composition_id) LEFT JOIN labman.composition samplecpcp ON (samplecp.composition_id = samplecpcp.composition_id) LEFT JOIN labman.well samplewell ON (samplecpcp.container_id = samplewell.container_id) LEFT JOIN labman.plate sampleplate ON (samplewell.plate_id = sampleplate.plate_id) LEFT JOIN labman.process platingprpr ON (samplecpcp.upstream_process_id = platingprpr.process_id) LEFT JOIN labman.primer_composition primercp ON (libprepcp.i5_primer_composition_id = primercp.primer_composition_id) LEFT JOIN labman.primer_composition primercp2 ON (libprepcp.i7_primer_composition_id = primercp2.primer_composition_id) LEFT JOIN labman.composition primercpcp on (primercp.composition_id = primercpcp.composition_id) LEFT JOIN labman.composition primercpcp2 on (primercp2.composition_id = primercpcp2.composition_id) LEFT JOIN labman.process primerworkingplateprpr ON (primercpcp.upstream_process_id = primerworkingplateprpr.process_id) LEFT JOIN labman.process primerworkingplateprpr2 ON (primercpcp2.upstream_process_id = primerworkingplateprpr2.process_id) LEFT JOIN labman.primer_set_composition primersetcp ON (primercp.primer_set_composition_id = primersetcp.primer_set_composition_id) LEFT JOIN labman.primer_set_composition primersetcp2 ON (primercp2.primer_set_composition_id = primersetcp2.primer_set_composition_id) LEFT JOIN labman.composition primersetcpcp ON (primersetcp.composition_id = primersetcpcp.composition_id) LEFT JOIN labman.composition primersetcpcp2 ON (primersetcp2.composition_id = primersetcpcp2.composition_id) LEFT JOIN labman.well primersetwell ON (primersetcpcp.container_id = primersetwell.container_id) LEFT JOIN labman.well primersetwell2 ON (primersetcpcp2.container_id = primersetwell2.container_id) LEFT JOIN labman.plate primersetplate ON (primersetwell.plate_id = primersetplate.plate_id) LEFT JOIN labman.plate primersetplate2 ON (primersetwell2.plate_id = primersetplate2.plate_id) FULL JOIN qiita.study_sample USING (sample_id) LEFT JOIN qiita.study as study USING (study_id) WHERE libprepplate.plate_id IN ( SELECT distinct libprepplate2.plate_id FROM labman.sequencing_process sp LEFT JOIN labman.sequencing_process_lanes spl USING (sequencing_process_id) LEFT JOIN labman.pool_composition_components pcc1 ON (spl.pool_composition_id = pcc1.output_pool_composition_id) LEFT JOIN labman.library_prep_shotgun_composition libprepcp2 ON (pcc1.input_composition_id = libprepcp2.composition_id) LEFT JOIN labman.composition libprepcpcp2 ON (libprepcp2.composition_id = libprepcpcp2.composition_id) LEFT JOIN labman.library_prep_shotgun_process libpreppr2 ON (libprepcpcp2.upstream_process_id = libpreppr2.process_id) LEFT JOIN labman.well libprepwell2 ON (libprepcpcp2.container_id = libprepwell2.container_id) LEFT JOIN labman.plate libprepplate2 ON (libprepwell2.plate_id = libprepplate2.plate_id) WHERE sequencing_process_id = 4 );
Going to start fleshing out the shotgun prep file generation method. @AmandaBirmingham, if it's no trouble, would you mind reviewing the query and/or output? Thanks very much!
WIP PR for resolving this issue is now available: https://github.com/jdereus/labman/pull/502
Roping https://github.com/jdereus/labman/issues/431 into this issue, as we need to solve it for shotgun as well.
@charles-cowart, can this be closed now that #502 is merged?
While Labman currently includes code to generate a Preparation Sheet (aka prep info file) for shotgun runs, those files do not follow the specifications required by Gail, who currently prepares the manually with input from the sample sheet.
The scope of this issue includes working with Gail to get a known-good shotgun prep info file and descriptions of what goes in each column, then modifying the labman code to produce a file with this content/format and reinstating the ability to download this file from the labman interface (i.e., reversing the fix for #326 Disable Preparation Sheet download for shotgun sequencing runs ). Since numerous fields are shared between the amplicon prep info file and the shotgun prep info file, code duplication should be minimized.