biocore / LabControl

lab manager for plate maps and sequence flows
BSD 3-Clause "New" or "Revised" License
2 stars 15 forks source link

Generate correct Preparation Sheet for shotgun runs #327

Closed AmandaBirmingham closed 5 years ago

AmandaBirmingham commented 5 years ago

While Labman currently includes code to generate a Preparation Sheet (aka prep info file) for shotgun runs, those files do not follow the specifications required by Gail, who currently prepares the manually with input from the sample sheet.

The scope of this issue includes working with Gail to get a known-good shotgun prep info file and descriptions of what goes in each column, then modifying the labman code to produce a file with this content/format and reinstating the ability to download this file from the labman interface (i.e., reversing the fix for #326 Disable Preparation Sheet download for shotgun sequencing runs ). Since numerous fields are shared between the amplicon prep info file and the shotgun prep info file, code duplication should be minimized.

charles-cowart commented 5 years ago

It seems like the intent behind this issue is duplicated in more specific ones; however correct prep sheets for shotgun definitely needs to happen by full launch.

AmandaBirmingham commented 5 years ago

If this issue is duplicated by more specific ones, go ahead and link them here. I am not aware of anyone having done the work to get detailed specifications for the shotgun prep sheet that Gail has signed off on.

AmandaBirmingham commented 5 years ago

Note that code changes for this issue CANNOT commence until we gather specifications from Gail and get them signed off by her and any other stakeholders--otherwise we are wasting our time.

charles-cowart commented 5 years ago

@AmandaBirmingham For the subquery to collect prep information (below), it looks like I can straight replace the '16S' tables with their 'Shotgun' equivalents. However when I do so, I get zero results for sequencing_process_id = 1 (AGP), and even if I remove the 'WHERE' clause. Do you think I'm missing something?

SELECT distinct
         libprepplate2.plate_id 
      FROM
         labman.sequencing_process sp 
         LEFT JOIN
            labman.sequencing_process_lanes spl USING ( sequencing_process_id) 
         LEFT JOIN
            labman.pool_composition_components pcc1 
            ON ( spl.pool_composition_id = pcc1.output_pool_composition_id) 
         LEFT JOIN
            labman.pool_composition pccon 
            ON ( pcc1.input_composition_id = pccon.composition_id) 
         LEFT JOIN
            labman.pool_composition_components pcc2 
            ON ( pccon.pool_composition_id = pcc2.output_pool_composition_id) 
         LEFT JOIN
            labman.library_prep_16s_composition libprepcp2 
            ON ( pcc2.input_composition_id = libprepcp2.composition_id) 
         LEFT JOIN
            labman.composition libprepcpcp2 
            ON ( libprepcp2.composition_id = libprepcpcp2.composition_id) 
         LEFT JOIN
            labman.library_prep_16s_process libpreppr2 
            ON ( libprepcpcp2.upstream_process_id = libpreppr2.process_id) 
         LEFT JOIN
            labman.well libprepwell2 
            ON ( libprepcpcp2.container_id = libprepwell2.container_id) 
         LEFT JOIN
            labman.plate libprepplate2 
            ON ( libprepwell2.plate_id = libprepplate2.plate_id) 
      WHERE
         sequencing_process_id = 1

I can't quite tell if it's simply that there are no matching plates in the system or whether I have the subquery wrong.

charles-cowart commented 5 years ago

Just an FYI: I believe this is the correct form of the subquery for shotgun:

SELECT distinct libprepplate2.plate_id 
FROM labman.sequencing_process sp
LEFT JOIN labman.sequencing_process_lanes spl USING (sequencing_process_id)
LEFT JOIN labman.pool_composition_components pcc1 ON (spl.pool_composition_id = pcc1.output_pool_composition_id)
LEFT JOIN labman.library_prep_shotgun_composition libprepcp2 ON ( pcc1.input_composition_id = libprepcp2.composition_id) 
LEFT JOIN labman.composition libprepcpcp2 ON ( libprepcp2.composition_id = libprepcpcp2.composition_id) 
LEFT JOIN labman.library_prep_16s_process libpreppr2 ON ( libprepcpcp2.upstream_process_id = libpreppr2.process_id) 
LEFT JOIN labman.well libprepwell2 ON ( libprepcpcp2.container_id = libprepwell2.container_id) 
LEFT JOIN labman.plate libprepplate2 ON ( libprepwell2.plate_id = libprepplate2.plate_id) 
WHERE sequencing_process_id = 4;

I was banging my head against a wall until I stepped through each join and examined the output, reviewed code, etc. The turning point was realizing all results from labman.sequencing_process where sequencing_process_id = 1 were Amplicon assays. :)

charles-cowart commented 5 years ago

Edit: See Amanda's comment below.

FYI, like other 'composition' tables, 'labman.pool_composition' is a table that records compositions that in this case are composed of other pools (duh). Rather than being the table to keep and consult for pooling information, it actually only applies to compositions that are pools of other pools.

AmandaBirmingham commented 5 years ago

'labman.pool_composition' is a table that records compositions that in this case are composed of other pools (duh). Rather than being the table to keep and consult for pooling information, it actually only applies to compositions that are pools of other pools.

No, I'm afraid this is not the case.

Any given record in labman.pool_composition is associated with a single pool. The "ingredients" that make up that pool are listed in labman.pool_composition_components, with one record in that table for each ingredient (and with those component records being linked to the labman.pool_composition record they are a part of by their output_pool_composition_id).

A pool_composition_component is basically a generic way to refer to anything that goes into a pool, where that thing is identified by its generic composition_id in the pool_composition_component's input_composition_id column. The kinds of things that can go into a pool (via their generic composition_ids) are library_prep_shotgun_compositions, library_prep_16s_compositions, and/or (in the case of pools made up either wholy or partly of other pools) pool_compositions.

In the query above in https://github.com/jdereus/labman/issues/327#issuecomment-488171842 , the fact that all pools (including those not made up of other pools) are represented by pool_compositions may not be clear because the pool_composition table is not joined to directly. However, it is used implicitly: thelabman.sequencing_process_lanes's pool_composition_id is joined to the labman.pool_composition_components's output_pool_composition_id field--and both of these fields are foreign keys to the same pool_composition record for the pool of interest. That pool is made up of pool_composition_components representing library_prep_shotgun_compositions (not other pools).

charles-cowart commented 5 years ago

@AmandaBirmingham Thanks so much for the clarification! You are right; in labman.pool_composition_components for example, the input_composition_id is FKey'ed to composition(composition_id) while output_pool_composition_id is FKey'ed to pool_composition(pool_composition_id):

    "fk_pool_composition_components" FOREIGN KEY (output_pool_composition_id) REFERENCES pool_composition(pool_composition_id)
    "fk_sequencing_pool_components_component_pool" FOREIGN KEY (input_composition_id) REFERENCES composition(composition_id)

As you said, there's no need for a mapping table; the two tables being joined are being joined on a foreign key that they both possess. It seems to me that the above query is still correct in what it should do; does the above query appear correct to you? I can attach copies of the output in the morning; I recall that it does return 384 rows, which seems correct for shotgun?

charles-cowart commented 5 years ago

Not to offend, just added Amanda and myself to the assignees list, since we've been working on this together and separately since Friday.

AmandaBirmingham commented 5 years ago

NB: Since the decisions of our recent meeting mean https://github.com/jdereus/labman/issues/445 is pulled into the 0.1.0 milestone, note that we need to make sure both amplicon and shotgun prep sheet creation performs the check to ensure there is not more than 1 study represented on any plate being included in the prep output (and hopefully implements this check in shared code :) )

AmandaBirmingham commented 5 years ago

@charles-cowart Re https://github.com/jdereus/labman/issues/327#issuecomment-488208480 , yes, I agree the query in that comment produces the correct output. My comment was only to clarify how it does so :)

AmandaBirmingham commented 5 years ago

I believe the only outstanding stakeholder questions outstanding on this one are about whether to include the gdna extraction/plating/etc detail columns that are included in the amplicon prep sheet but have not historically been included in the shotgun prep sheet (even though they are relevant to shotgun preps). Per discussion on 04/26/2019 with Gail after the group meeting, my understanding is that putting these columns into shotgun prep sheets will NOT cause any qiita processing problems. Therefore, unless someone overrules me, I am going to make an executive decision that we WILL include those columns (so that amplicon and shotgun prep sheets share all columns that are relevant to both).

In the long run, however, I think that the Qiita team (@antgonza et al) should consider whether to keep pulling this processing info into Qiita via prep sheets after LabControl goes into use. Pretty much the entire point of LabControl is to store all this processing info in a database, and so exporting it from there and importing it into a different database (or at least, a different location in the same database) duplicates will be duplicating data in a distressing way. Just something to consider for the future!

charles-cowart commented 5 years ago

Hmm, that would be a nice thing to have...:)

On Thu, May 2, 2019 at 4:59 PM Amanda Birmingham notifications@github.com wrote:

I believe the only outstanding stakeholder questions outstanding on this one are about whether to include the gdna extraction/plating/etc detail columns that are included in the amplicon prep sheet but have not historically been included in the shotgun prep sheet (even though they are relevant to shotgun preps). Per discussion on 04/26/2019 with Gail after the group meeting, my understanding is that putting these columns into shotgun prep sheets will NOT cause any qiita processing problems. Therefore, unless someone overrules me, I am going to make an executive decision that we WILL include those columns (so that amplicon and shotgun prep sheets share all columns that are relevant to both).

In the long run, however, I think that the Qiita team (@antgonza https://github.com/antgonza et al) should consider whether to keep pulling this processing info into Qiita via prep sheets after LabControl goes into use. Pretty much the entire point of LabControl is to store all this processing info in a database, and so exporting it from there and importing it into a different database (or at least, a different location in the same database) duplicates will be duplicating data in a distressing way. Just something to consider for the future!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jdereus/labman/issues/327#issuecomment-488873371, or mute the thread https://github.com/notifications/unsubscribe-auth/AKFU7EYK6Y6XOVWTGCU5WFDPTN54TANCNFSM4FQC32QQ .

charles-cowart commented 5 years ago

:D Thanks so much for doing so! It was a big help.

On Thu, May 2, 2019 at 4:51 PM Amanda Birmingham notifications@github.com wrote:

@charles-cowart https://github.com/charles-cowart Re #327 (comment) https://github.com/jdereus/labman/issues/327#issuecomment-488208480 , yes, I agree the query in that comment produces the correct output. My comment was only to clarify how it does so :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jdereus/labman/issues/327#issuecomment-488872034, or mute the thread https://github.com/notifications/unsubscribe-auth/AKFU7E3ZAO7RAERWHQJEBVTPTN46JANCNFSM4FQC32QQ .

charles-cowart commented 5 years ago

I believe this is largely correct. I don't believe there should be different primer_set_ids, and different primer_dates for i5 and i7, but I've included them here for confirmation as they're easy to remove; the data for each pair of columns are identical.

SELECT study.study_id, study_sample.sample_id, study.study_alias AS project_name, study_sample.sample_id AS orig_name, study.study_description AS experiment_design_description, samplewell.row_num AS row_num, samplewell.col_num AS col_num, samplecp.content, sampleplate.external_id AS sample_plate, platingprpr.run_personnel_id AS plating, gdnaextractpr.extraction_kit_id, gdnaextractpr.epmotion_robot_id AS gepmotion_robot_id, gdnaextractpr.epmotion_tool_id, gdnaextractpr.kingfisher_robot_id, libpreppr.kappa_hyper_plus_kit_id, libpreppr.stub_lot_id, libpreppr.normalization_process_id, primersetcp.barcode_seq AS barcode_i5, primersetcp2.barcode_seq AS barcode_i7, primersetcp.primer_set_id AS primer_set_id_i5, primersetcp2.primer_set_id AS primer_set_id_i7, primersetplate.external_id AS primer_plate_i5, primersetplate2.external_id AS primer_plate_i7, primerworkingplateprpr.run_date AS primer_date_i5, primerworkingplateprpr2.run_date AS primer_date_i7 FROM labman.plate libprepplate LEFT JOIN labman.well libprepwell ON (libprepplate.plate_id = libprepwell.plate_id) LEFT JOIN labman.composition libprepcpcp ON (libprepwell.container_id = libprepcpcp.container_id) LEFT JOIN labman.library_prep_shotgun_process libpreppr ON (libprepcpcp.upstream_process_id = libpreppr.process_id) LEFT JOIN labman.library_prep_shotgun_composition libprepcp ON ( libprepcpcp.composition_id = libprepcp.composition_id) LEFT JOIN labman.normalized_gdna_composition b ON ( libprepcp.normalized_gdna_composition_id = b.normalized_gdna_composition_id) LEFT JOIN labman.compressed_gdna_composition c ON ( b.compressed_gdna_composition_id = c.compressed_gdna_composition_id) LEFT JOIN labman.gdna_composition gdnacp USING (gdna_composition_id) LEFT JOIN labman.composition gdnacpcp ON (gdnacp.composition_id = gdnacpcp.composition_id) LEFT JOIN labman.gdna_extraction_process gdnaextractpr ON (gdnacpcp.upstream_process_id = gdnaextractpr.process_id) LEFT JOIN labman.sample_composition samplecp USING (sample_composition_id) LEFT JOIN labman.composition samplecpcp ON (samplecp.composition_id = samplecpcp.composition_id) LEFT JOIN labman.well samplewell ON (samplecpcp.container_id = samplewell.container_id) LEFT JOIN labman.plate sampleplate ON (samplewell.plate_id = sampleplate.plate_id) LEFT JOIN labman.process platingprpr ON (samplecpcp.upstream_process_id = platingprpr.process_id) LEFT JOIN labman.primer_composition primercp ON (libprepcp.i5_primer_composition_id = primercp.primer_composition_id) LEFT JOIN labman.primer_composition primercp2 ON (libprepcp.i7_primer_composition_id = primercp2.primer_composition_id) LEFT JOIN labman.composition primercpcp on (primercp.composition_id = primercpcp.composition_id) LEFT JOIN labman.composition primercpcp2 on (primercp2.composition_id = primercpcp2.composition_id) LEFT JOIN labman.process primerworkingplateprpr ON (primercpcp.upstream_process_id = primerworkingplateprpr.process_id) LEFT JOIN labman.process primerworkingplateprpr2 ON (primercpcp2.upstream_process_id = primerworkingplateprpr2.process_id) LEFT JOIN labman.primer_set_composition primersetcp ON (primercp.primer_set_composition_id = primersetcp.primer_set_composition_id) LEFT JOIN labman.primer_set_composition primersetcp2 ON (primercp2.primer_set_composition_id = primersetcp2.primer_set_composition_id) LEFT JOIN labman.composition primersetcpcp ON (primersetcp.composition_id = primersetcpcp.composition_id) LEFT JOIN labman.composition primersetcpcp2 ON (primersetcp2.composition_id = primersetcpcp2.composition_id) LEFT JOIN labman.well primersetwell ON (primersetcpcp.container_id = primersetwell.container_id) LEFT JOIN labman.well primersetwell2 ON (primersetcpcp2.container_id = primersetwell2.container_id) LEFT JOIN labman.plate primersetplate ON (primersetwell.plate_id = primersetplate.plate_id) LEFT JOIN labman.plate primersetplate2 ON (primersetwell2.plate_id = primersetplate2.plate_id) FULL JOIN qiita.study_sample USING (sample_id) LEFT JOIN qiita.study as study USING (study_id) WHERE libprepplate.plate_id IN ( SELECT distinct libprepplate2.plate_id FROM labman.sequencing_process sp LEFT JOIN labman.sequencing_process_lanes spl USING (sequencing_process_id) LEFT JOIN labman.pool_composition_components pcc1 ON (spl.pool_composition_id = pcc1.output_pool_composition_id) LEFT JOIN labman.library_prep_shotgun_composition libprepcp2 ON (pcc1.input_composition_id = libprepcp2.composition_id) LEFT JOIN labman.composition libprepcpcp2 ON (libprepcp2.composition_id = libprepcpcp2.composition_id) LEFT JOIN labman.library_prep_shotgun_process libpreppr2 ON (libprepcpcp2.upstream_process_id = libpreppr2.process_id) LEFT JOIN labman.well libprepwell2 ON (libprepcpcp2.container_id = libprepwell2.container_id) LEFT JOIN labman.plate libprepplate2 ON (libprepwell2.plate_id = libprepplate2.plate_id) WHERE sequencing_process_id = 4 );

sample_output.txt

Going to start fleshing out the shotgun prep file generation method. @AmandaBirmingham, if it's no trouble, would you mind reviewing the query and/or output? Thanks very much!

charles-cowart commented 5 years ago

WIP PR for resolving this issue is now available: https://github.com/jdereus/labman/pull/502

Roping https://github.com/jdereus/labman/issues/431 into this issue, as we need to solve it for shotgun as well.

wasade commented 5 years ago

@charles-cowart, can this be closed now that #502 is merged?