aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

Codes for restriction sites of mixed restriction enzymes #234

Closed bcbanderson closed 2 years ago

bcbanderson commented 3 years ago

Hey, I have Hi-C data for which a cocktail of mixed restriction enzymes was used to cut at the following recognition sites: ^GATC, G^ANTC, C^TNAG, and T^TAA. How do I code these patterns in the section "Set ligation junction based on restriction enzyme" of the juicer pipeline? Thank you!

Set ligation junction based on restriction enzyme

    case $site in
        HindIII) ligation="AAGCTAGCTT";;
        DpnII) ligation="GATCGATC";;
        MboI) ligation="GATCGATC";;
        NcoI) ligation="CCATGCATGG";;
        none) ligation="XXXX";;
        *)  ligation="XXXX"
            echo "$site not listed as recognized enzyme. Using $site_file as site file"
            echo "Ligation junction is undefined"
            exit 100
    esac

edfajardo commented 3 years ago

You can add a line similar to the existing line for the Arima cocktail, but you would have to add the extra possible junctions yourself. The relevant Arima line is:

        Arima) ligation="'(GAATAATC|GAATACTC|GAATAGTC|GAATATTC|GAATGATC|GACTAATC|GACTACTC|GACTAGTC|GACTATTC|GACTGATC|GAGTAATC|GAGTACTC|GAGTAGTC|GAGTATTC|GAGTGATC|GATCAATC|GATCACTC|GATCAGTC|GATCATTC|GATCGATC|GATTAATC|GATTACTC|GATTAGTC|GATTATTC|GATTGATC)'" ;;

That is already 25 possible junctions, coming from ^GATC and G^ANTC (two of the patterns in your cocktail). The 25 junctions arise because these two patterns give 5 cutting sequences (GATC, plus the four obtained by substituting A, C, G, or T for N in the second pattern), and all ligations between them are possible: 5x5=25. To these you would have to add the possible junctions of the other cutters, C^TNAG and T^TAA, which give you an additional 5 cutting sequences for a total of 10. This means, if I am correct, that you have 10x10=100 possible junctions. You have some work to do here, but the syntax is as indicated, with each possible ligation separated by a | symbol.

You can name this 100-sequence ligation pattern "myCocktail" or whatever you want, and then refer to it by that name with the -s switch when you run juicer.sh:

    myCocktail) ligation="'(xxxx|xxxx|your 100 ligation junctions...)'" ;;

Note the nested quoting: a double quote followed by a single quote at the start, and a single quote followed by a double quote at the end.

I recommend that you read the following thread: https://groups.google.com/g/3d-genomics/c/1kgiGvi7vg8
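
The enumeration described above does not have to be typed by hand. Below is a minimal bash sketch, not part of juicer, that assumes every enzyme in the cocktail leaves a 5' overhang that is filled in and blunt-ligated. Under that assumption the upstream end of a junction is the site minus the bases after the bottom-strand cut, and the downstream end is the site minus the bases before the top-strand cut, which reproduces the existing HindIII junction and the 25 Arima junctions. The function names (expand_n, filled_end, junctions, case_line) are hypothetical helpers made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and are not part of juicer.
# Assumption: each site is palindromic and cut to leave a 5' overhang that is
# filled in before blunt ligation (3' overhangs are not handled).

# Expand every N in a sequence to A, C, G and T; prints one sequence per line.
expand_n() {
    local seq=$1 base
    if [[ $seq == *N* ]]; then
        for base in A C G T; do
            expand_n "${seq/N/$base}"   # replace the first remaining N
        done
    else
        printf '%s\n' "$seq"
    fi
}

# Print the filled-in end(s) of one site written with ^ at the top-strand cut.
# $2 is "up" (end of the upstream fragment) or "down" (start of the downstream one).
# Example: G^ANTC gives up = GANT and down = ANTC, with N expanded.
filled_end() {
    local site=$1 which=$2
    local before=${site%%^*}            # bases left of the cut
    local after=${site#*^}              # bases right of the cut
    local seq=$before$after
    local k=${#before}
    local n=${#seq}
    if [[ $which == up ]]; then
        expand_n "${seq:0:n-k}"
    else
        expand_n "${seq:k}"
    fi
}

# Print every junction (any upstream end + any downstream end) for the given
# sites, sorted and de-duplicated, one per line.
junctions() {
    local site up down
    for up in $(for site in "$@"; do filled_end "$site" up; done); do
        for down in $(for site in "$@"; do filled_end "$site" down; done); do
            printf '%s%s\n' "$up" "$down"
        done
    done | LC_ALL=C sort -u
}

# Format the junctions as a case branch in the style of the Arima line.
case_line() {
    local name=$1
    shift
    local alternation
    alternation=$(junctions "$@" | paste -sd'|' -)
    printf "%s) ligation=\"'(%s)'\" ;;\n" "$name" "$alternation"
}

# Build the branch for the cocktail from this thread.
case_line myCocktail '^GATC' 'G^ANTC' 'C^TNAG' 'T^TAA'
```

Running it with only '^GATC' and 'G^ANTC' should give back the Arima alternation shown above, which is a useful sanity check before pasting the myCocktail line into the case statement.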

nchernia commented 3 years ago

This is correct; also note that one should appropriately account for the baseline rate, which will be quite high with so many possible sequences.


-- Neva Cherniavsky Durand, Ph.D. | she, her, hers Assistant Professor | Molecular and Human Genetics Aiden Lab | Baylor College of Medicine www.aidenlab.org
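
One way to get a feel for the baseline rate mentioned above is a back-of-the-envelope estimate of how often the junction sequences would appear in a read purely by chance. The sketch below is an assumption about what "baseline rate" could mean here, not juicer's own calculation: it treats bases as independent and equally frequent, and the function name expected_chance_matches is hypothetical. It reads one junction per line and prints the expected number of chance matches in a read of the given length.

```shell
#!/usr/bin/env bash
# Sketch only: a rough chance-match estimate, not what juicer computes.
# Assumption: bases are independent and each occurs with probability 1/4.

# Read junctions (one per line) on stdin; $1 is the read length in bases.
# A junction of length L can start at (read_len - L + 1) positions and matches
# a random position with probability 1/4^L, so the expected counts are summed.
expected_chance_matches() {
    local read_len=$1
    awk -v r="$read_len" '
        length($0) > 0 && length($0) <= r {
            total += (r - length($0) + 1) / (4 ^ length($0))
        }
        END { printf "%.6f\n", total }
    '
}

# Example: the single DpnII/MboI junction in a 150-base read.
printf '%s\n' GATCGATC | expected_chance_matches 150
```

For a single 8-base junction this comes out at about 0.002 chance matches per 150-base read. Feeding in a list of 100 junctions of 6 to 8 bases gives a value roughly two orders of magnitude higher, which is consistent with the warning that the baseline will be quite high.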