cnobles / iGUIDE

Bioinformatic pipeline for identifying dsDNA breaks by marker-based incorporation, such as breaks induced by designer nucleases like Cas9.
https://iguide.readthedocs.io/en/latest/
GNU General Public License v3.0

Alternative UMI prep and analysis #64

Closed: JoeEmendo closed this issue 4 years ago

JoeEmendo commented 4 years ago

There is an alternative UMI protocol for the original GUIDEseq protocol (GitHub here). Might you be willing to create a fork of your analysis pipeline that accepts these files as input? Thanks!

cnobles commented 4 years ago

Hi Joe, I briefly read over the protocol and I think it wouldn’t be an issue at all to modify the pipeline. Do you have a specific name you like to refer to this protocol by? Could you send me an example of the R1 and R2 sequences and outline the structures?

I’m now working on a different project, but would be happy to create this branch in my spare time.

Best, Chris

JoeEmendo commented 4 years ago

Great, thank you!

Do you have a specific name you like to refer to this protocol by?

We can call it "iGUIDE-4-UMN-simplified-UMI"

Could you send me an example of the R1 and R2 sequences and outline the structures?

I've uploaded sample files for EMX1 that follow this protocol (links below), and I'll keep these links active for one week. These are biological replicates, and the output more or less aligns with the original GUIDEseq output. As I'm sure you're aware, a major difference between the "UMN-simplified-UMI" protocol and iGUIDE is the ODN sequence and the ability to do QA on the ODN. The UMI should be the first part of the R2 read.
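As a rough illustration of "the UMI should be the first part of the R2 read", the Python sketch below splits a fixed-length UMI off the start of each R2 record and trims the quality string to match. The 8-nt UMI length is an assumption taken from the NNNNNNNN in TA_Adaptor_Top; the helper names are hypothetical, and iGUIDE's own handling of this method may differ.

```python
from typing import Iterable, Iterator, Tuple

UMI_LENGTH = 8  # assumption: matches the NNNNNNNN in TA_Adaptor_Top


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, not needed here
        qual = next(it, "").rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def split_umi(
    seq: str, qual: str, umi_length: int = UMI_LENGTH
) -> Tuple[str, str, str]:
    """Return (umi, trimmed_seq, trimmed_qual) for an R2 read starting with the UMI."""
    if len(seq) < umi_length or len(qual) != len(seq):
        raise ValueError("read too short, or sequence/quality length mismatch")
    return seq[:umi_length], seq[umi_length:], qual[umi_length:]
```

For a gzipped R2 file such as gEMX1_S1_L001_R2_001.fastq.gz, the text-mode handle from `gzip.open(path, "rt")` can be passed straight to `read_fastq`, since it iterates line by line.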

The NGS primers involved are shown on their materials page and below:

Oligonucleotides required:
TA_Adaptor_Top - /5Phos/CTCACCGCTCTTGTAGS NNNNNNNN CTGTCTCTTATACACATCTCCGAGC
TA_Adaptor_Bottom - CTACAAGAGCGGTGAGT
dsODN_Enrich_Plus - TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG NNN GTTTAATTGAGTTGTCATATGTTAATAACGG
dsODN_Enrich_Minus - TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG NNN CCGTTATTAACATATGACAACTCAATTAAAC
(Replace the italicized sequences, i.e. the dsODN-specific 3' ends of dsODN_Enrich_Plus and dsODN_Enrich_Minus, with sequences targeting the dsODN that you used in your experiments.)
dsODN_Enrich_Adaptor - GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

Nextera Indexing Oligo design:
Nextera_R1 - AATGATACGGCGACCACCGAGATCTACAC [i5] TCGTCGGCAGCGTC
Nextera_R2 - CAAGCAGAAGACGGCATACGAGAT [i7] GTCTCGTGGGCTCGG

You will need to replace [i5] and [i7] with unique index barcodes. You can use the Hamming barcodes from https://doi.org/10.1371/journal.pone.0036852. Order the appropriate number of indexing oligos for your samples depending on whether you want combinatorial indexing (cheaper) or unique dual indexing (better with patterned flow cells).

Note: When selecting index barcodes, make sure that 1) they do not collide with other barcodes that will be run on the same flow cell as your samples, and 2) they don't start with G or contain consecutive G nucleotides if the samples will be run on instruments that use two-color (NextSeq, NovaSeq) or one-color (iSeq) chemistry.
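The oligo design rules above are easy to check mechanically before ordering. Below is a minimal Python sketch, not part of iGUIDE or the UMN protocol (all helper names are hypothetical). It confirms that the dsODN-specific 3' ends of dsODN_Enrich_Plus and dsODN_Enrich_Minus are reverse complements of each other, fills an [i5]/[i7] placeholder in a Nextera template exactly as written (no orientation conversion), and screens a barcode set against the two rules in the note. The minimum Hamming distance of 3 is an assumption, chosen so that reads with one index mismatch still demultiplex unambiguously; the note's "collision" may simply mean identical barcodes.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Sketch only: helper names are hypothetical, not part of iGUIDE or the UMN protocol.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# dsODN-specific 3' ends of the enrichment primers, copied from the oligo list.
ENRICH_PLUS_3P = "GTTTAATTGAGTTGTCATATGTTAATAACGG"
ENRICH_MINUS_3P = "CCGTTATTAACATATGACAACTCAATTAAAC"

# Nextera index oligo templates with their index placeholders.
NEXTERA_R1 = "AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC"
NEXTERA_R2 = "CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG"


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_index_oligo(template: str, placeholder: str, index: str) -> str:
    """Substitute one index barcode into a template exactly as written."""
    if template.count(placeholder) != 1:
        raise ValueError(f"expected exactly one {placeholder} in the template")
    index = index.upper()
    if not index or set(index) - set("ACGT"):
        raise ValueError("index must be a non-empty A/C/G/T sequence")
    return template.replace(placeholder, index)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(
    barcodes: Sequence[str],
    g_sensitive_chemistry: bool = True,
    min_distance: int = 3,  # assumption: keeps 1-mismatch demultiplexing unambiguous
) -> Dict[str, List[str]]:
    """Map each problematic barcode to its issues; clean barcodes are omitted."""
    bcs = [b.upper() for b in barcodes]
    issues: Dict[str, List[str]] = {}
    if g_sensitive_chemistry:  # two-color (NextSeq, NovaSeq) or one-color (iSeq)
        for bc in bcs:
            if bc.startswith("G"):
                issues.setdefault(bc, []).append("starts with G")
            if "GG" in bc:
                issues.setdefault(bc, []).append("contains consecutive Gs")
    # Collision check across every pair of barcodes sharing the flow cell.
    for a, b in combinations(bcs, 2):
        if len(a) != len(b):
            continue  # mixed lengths are not compared by this sketch
        d = hamming(a, b)
        if d < min_distance:
            issues.setdefault(a, []).append(f"distance {d} to {b}")
            issues.setdefault(b, []).append(f"distance {d} to {a}")
    return issues


if __name__ == "__main__":
    # Plus and minus primers should target opposite strands of the dsODN.
    print(revcomp(ENRICH_PLUS_3P) == ENRICH_MINUS_3P)  # True
    print(build_index_oligo(NEXTERA_R2, "[i7]", "ACGTACGT"))
    print(check_barcodes(["ACGTACGT", "ACGTACGA", "GATCATCA"]))
```

The G rules are applied only when `g_sensitive_chemistry` is true, mirroring the note's restriction to one- and two-color instruments; the pairwise distance check always runs.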

https://07d02809-9389-426f-88b8-1aaae2315381.s3.amazonaws.com/gEMX1_S1_L001_R1_001.fastq.gz https://07d02809-9389-426f-88b8-1aaae2315381.s3.amazonaws.com/gEMX1_S1_L001_R2_001.fastq.gz https://07d02809-9389-426f-88b8-1aaae2315381.s3.amazonaws.com/gEMX1_S2_L001_R1_001.fastq.gz https://07d02809-9389-426f-88b8-1aaae2315381.s3.amazonaws.com/gEMX1_S2_L001_R2_001.fastq.gz

iditbuch commented 4 years ago

Dear Chris, Following on Joe's request above, what's the status of the alternative UMI protocol?

cnobles commented 4 years ago

Thanks for both of your interests in the alternate method and using iGUIDE. I have been quite busy since starting my new position and haven't made the time for it yet. As for the status, I've worked out the changes that need to be made and just need to put in the time to reconfigure the rules, test with the given dataset, and roll that out to both of you. Are there deadlines you are trying to meet? Having an idea of a timeline would be helpful, and then I can update you on the feasibility.

iditbuch commented 4 years ago

Dear Chris,

Thanks for your reply. We would be happy to start using the enhanced iGUIDE by mid-February. Is that feasible on your end?

Best regards

Idit Buch, Ph.D., Director of Computational Biology, Emendo Biotherapeutics (http://emendobio.com/)

cnobles commented 4 years ago

I think I can make that happen. Will let you know when updates are pushed.

Best, Chris

iditbuch commented 4 years ago

Thanks!

cnobles commented 4 years ago

Hi Idit and Joe, I've included the alternative UMI method in the branch update_1.1.0. I plan on merging this with the master branch after getting through a few more internal tests. The earliest commit that would work with this method is 1380f2c44ec7a4c72acb0512c97d6ee39e7fa0a7.

For the sample data, could you provide the gRNA target sequence? I assumed it was EMX from the GUIDEseq publication, but after analyzing your data, there were zero on-target sites. It was clear from the genomic distribution that there were other primary sites, including those on Chr2, suggesting I was just using the wrong target sequence.

This workflow is triggered in the iGUIDE software by including the argument "Alternate_UMI_Method : TRUE" in the config file. I've included an example config file as well, file path: iGUIDE/configs/umi_alt_example.config.yml (27def6290ebce7821c7bb2bd3bcd2337dddff0eb).
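The flag is a top-level key in the YAML config. The stdlib-only Python sketch below (a hypothetical helper, not iGUIDE code) shows one way a script might detect it. It is a line scanner, not a YAML parser; in practice you would load the file with a real parser such as PyYAML's `yaml.safe_load`, which resolves `TRUE` to a boolean. Treating a missing key as disabled is an assumption about the default.

```python
TRUE_WORDS = {"true", "yes", "on"}  # YAML 1.1 style truthy scalars


def alternate_umi_enabled(config_text: str) -> bool:
    """Return True if a top-level Alternate_UMI_Method key is set to a truthy value.

    Minimal line scanner, not a YAML parser: it skips indented (nested) keys,
    strips '#' comments, and treats a missing key as False (an assumption).
    """
    for raw in config_text.splitlines():
        if raw[:1] in (" ", "\t"):
            continue  # nested key; only top-level keys are considered
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Alternate_UMI_Method":
            return value.strip("'\"").lower() in TRUE_WORDS
    return False
```

With this sketch, the line `Alternate_UMI_Method : TRUE` from the comment above is reported as enabled.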

Please let me know if you have any questions; I'll close this thread after I've merged with the master branch.

iditbuch commented 4 years ago

Dear Chris,

Thanks very much for creating the branch for the alternative UMI method!

The gRNA target name is EMX1 and the sequence is GAGTCCGAGCAGAAGAAGAANGG. Attached is the GuideSeq analysis we obtained using the UMN pipeline. Note that we can see the on-target site (first line).

Best regards

Idit

cnobles commented 4 years ago

Thanks Idit,

I ran through the example and was also able to detect the on-target location without any issues. I have included an example config file to show how to set up the software to run in the alternative UMI configuration (latest commit 1bd8d4f0844a12c9ef5fcdca6233c1e420649890).

As a note, the iGUIDE software assumes only one set of trimming parameters for a single sample. So I would recommend barcoding your positive and negative amplified samples (i.e., the dsODN_Enrich_Plus and dsODN_Enrich_Minus amplifications) with different barcodes and treating them as two different samples. Given the same specimen identifier, they will be pooled in the final analysis anyway (e.g., sample identifier: testA-1, specimen identifier: testA). This will save you from either duplicating the data or running the software twice. Minor point, though.
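The pooling described above can be pictured as summing per-sample results under a shared specimen identifier. The Python sketch below is illustrative only and is not how iGUIDE implements pooling internally: the sample-to-specimen mapping (e.g. testA-1 and testA-2 for the plus- and minus-amplified libraries of specimen testA) and the per-site count keys are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping


def pool_by_specimen(
    sample_to_specimen: Mapping[str, str],
    site_counts: Mapping[str, Counter],
) -> Dict[str, Counter]:
    """Sum per-sample site counts into per-specimen totals."""
    pooled: Dict[str, Counter] = defaultdict(Counter)
    for sample, counts in site_counts.items():
        # Every sample must appear in the mapping; a KeyError flags a typo.
        pooled[sample_to_specimen[sample]].update(counts)
    return dict(pooled)


if __name__ == "__main__":
    samples = {"testA-1": "testA", "testA-2": "testA"}  # e.g. plus and minus libraries
    counts = {
        "testA-1": Counter({"chr2:+:1000": 3}),  # hypothetical site keys
        "testA-2": Counter({"chr2:+:1000": 2, "chr5:-:42": 1}),
    }
    print(pool_by_specimen(samples, counts))
```

Keeping the mapping explicit, rather than parsing specimen names out of sample IDs, avoids assuming any particular naming convention.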

Let me know if you have any additional questions. As I've merged this into the master branch, I'll close this Issue for now. If something arises, we can reopen.

Best, Chris

iditbuch commented 4 years ago

Dear Chris, Thanks very much for enhancing iGUIDE to also support the alternative UMI configuration. Surely, it'll be of great help to us.

Best regards

Idit