aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

Some questions about the input format #221

Closed biozzq closed 3 years ago

biozzq commented 3 years ago

Dear all,

I usually convert allValidPairs to the input file for juicer with hicpro2juicebox.sh. While studying how the shell script works (see the command below), I realized that I had not removed the "chr" prefix from the chromosome names in my previous analyses. Is that a problem? Also, do we need to rearrange the two reads of each pair so that, when they are on different chromosomes, the chromosome of the first read precedes that of the second ($2 <= $5), and, when they share a chromosome, the position in base pairs of the first read precedes that of the second? Thank you.

awk '{$4=$4!="+"; $7=$7!="+"} $2<=$5{print $1, $4, $2, $3, 0, $7, $5, $6, 1, $11, $12 }$5<$2{ print $1, $7, $5, $6, 0, $4, $2, $3, 1, $12, $11 }' $VALIDPAIRS | sort -T ${TEMP} -k3,3d -k7,7d -S 50% --parallel=8 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

Sincerely, Zheng zhuqing

nchernia commented 3 years ago

You don't need to remove the "chr" if it was present in your fasta.
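One way to check the point above is to compare the chromosome names in the pairs file against those in the genome. The sketch below is only an illustration: `missing_chroms` is a hypothetical helper, the pairs file is assumed to carry chromosome names in columns 2 and 5 (as the awk command in the question implies), and a two-column chrom.sizes file stands in for the names found in the fasta. The file contents are made up.

```shell
# Hypothetical helper: print chromosome names that occur in a pairs file
# (columns 2 and 5) but not in column 1 of a chrom.sizes-style file.
missing_chroms() {
  pairs_file=$1
  sizes_file=$2
  awk 'NR == FNR { known[$1] = 1; next }        # first file: record known names
       { if (!($2 in known)) miss[$2] = 1       # second file: check both ends
         if (!($5 in known)) miss[$5] = 1 }
       END { for (c in miss) print c }' "$sizes_file" "$pairs_file" | sort
}

# Made-up example data: one read uses "1" where the genome says "chr1".
tmp=$(mktemp -d)
printf 'chr1\t1000\nchr2\t800\n' > "$tmp/chrom.sizes"
printf 'r1\tchr1\t10\t+\tchr2\t50\t-\nr2\t1\t20\t+\tchr2\t70\t+\n' > "$tmp/pairs.txt"

missing=$(missing_chroms "$tmp/pairs.txt" "$tmp/chrom.sizes")
echo "names not in chrom.sizes: ${missing:-none}"
rm -rf "$tmp"
```

An empty result suggests the naming is consistent, with or without the "chr" prefix.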

And yes, you need to rearrange. This script looks fine.
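To see what the rearrangement does, the awk part of the command in the question can be run on two made-up records. This is a sketch under an assumption: the input columns are taken to be read name, chr1, pos1, strand1, chr2, pos2, strand2, fragment size, fragment 1, fragment 2, mapq1, mapq2, which is what the field numbers in the command suggest. The read names and values are invented for illustration.

```shell
# Run the awk rules from the question on two invented records.
converted=$(
  printf '%s\n' \
    'readA chr2 500 + chr1 100 - 250 fragX fragY 42 30' \
    'readB chr1 100 + chr1 900 - 800 fragX fragZ 42 42' |
  awk '
    # Encode strand: "+" becomes 0, anything else becomes 1.
    { $4 = ($4 != "+"); $7 = ($7 != "+") }
    # First chromosome name does not sort after the second: keep the order.
    $2 <= $5 { print $1, $4, $2, $3, 0, $7, $5, $6, 1, $11, $12 }
    # Second chromosome name sorts first: swap the two ends (and their mapq).
    $5 <  $2 { print $1, $7, $5, $6, 0, $4, $2, $3, 1, $12, $11 }
  '
)
printf '%s\n' "$converted"
# readA 1 chr1 100 0 0 chr2 500 1 30 42
# readB 0 chr1 100 0 1 chr1 900 1 42 42
```

The rule compares chromosome names only, so a pair whose two ends share a chromosome (readB here) passes through in its input order.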


-- Neva Cherniavsky Durand, Ph.D. | she, her, hers Assistant Professor | Molecular and Human Genetics Aiden Lab | Baylor College of Medicine www.aidenlab.org

biozzq commented 3 years ago

Dear @nchernia

Thank you.

Best,