aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

bug in "chimeric_nonsites.awk" script in the PBS version of juicer. #208

DustinSokolowski closed this issue 3 years ago.

DustinSokolowski commented 3 years ago (Mar 3, 2021)

Hello!

Thank you for the fantastic tool. I am using the PBS version of juicer to align some Hi-C data to a de novo genome that our lab is assembling, and I think there is a syntax error in the "chimeric_nonsites.awk" script.

When I run the script, I get the following error:

awk -f /hpf/tools/centos7/juicer/1.6/scripts/chimeric_nonsites.awk NMR_Fibroblast_HiC_test.txt > NMR_Fibroblast_HiC_test.frag.txt

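[image: screenshot of the awk error] https://user-images.githubusercontent.com/56414662/109858715-2b307a00-7c2a-11eb-9ca2-f517a87cc547.png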

By contrast, when I run the original awk one-liner from the CPU version of juicer, it has been running for over an hour and is producing reasonable output:

awk '{printf("%s %s %s %d %s %s %s %d", $1, $2, $3, 0, $4, $5, $6, 1); for (i=7; i<=NF; i++) {printf(" %s",$i);}printf("\n");}' NMR_Fibroblast_HiC_test.txt > NMR_Fibroblast_HiC_test_frag.txt

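[image: screenshot of the output] https://user-images.githubusercontent.com/56414662/109858896-62069000-7c2a-11eb-8d05-6ca0cf12030f.png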

So I think there is a bug in the new awk script.

I have attached "NMR_Fibroblast_HiC_test.txt", the first 1000 lines of my *_norm.txt file, so that you can try to reproduce the error. If you cannot reproduce it, then perhaps the problem is in how our directories are structured, and I would love some help with that as well!

For reference, this is the cluster where I am running juicer: https://ccm.sickkids.ca/high-performance-computing/

Thank you so much, Dustin

Attachment: NMR_Fibroblast_HiC_test.txt https://github.com/aidenlab/juicer/files/6078573/NMR_Fibroblast_HiC_test.txt

nchernia commented 3 years ago

Hello,

We don't have a PBS system, so we cannot test that version. We didn't write the "nonsites" script, but I think you should just get rid of all the backslashes. If that works, please submit the updated file as a pull request.

Thanks! Neva


-- Neva Cherniavsky Durand, Ph.D. | she, her, hers | Assistant Professor, Molecular and Human Genetics | Aiden Lab | Baylor College of Medicine | www.aidenlab.org
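For readers who hit the same error, here is a minimal sketch of an awk program kept in its own file with no backslashes at all. It assumes the PBS "chimeric_nonsites.awk" is meant to do the same job as the CPU one-liner quoted in the original post; it is not the actual contents of juicer's script, and the temporary file and the sample input record are made up for illustration. In a standalone awk file, statements can simply sit on separate lines (a newline after `{` or at the end of a statement is fine), so no line-continuation backslashes are needed.

```shell
#!/bin/sh
# Hypothetical sketch: write the CPU one-liner's awk program to its own file,
# spread over several lines with no backslash line continuations.
script=$(mktemp)
cat > "$script" <<'EOF'
{
    # Columns 1-3, then a 0; columns 4-6, then a 1 (same as the one-liner)
    printf("%s %s %s %d %s %s %s %d", $1, $2, $3, 0, $4, $5, $6, 1)
    # Copy every remaining column unchanged
    for (i = 7; i <= NF; i++) {
        printf(" %s", $i)
    }
    printf("\n")
}
EOF

# Made-up input record (not real data), run through the file with awk -f
out=$(printf '0 chr1 100 16 chr2 200 60 60\n' | awk -f "$script")
echo "$out"   # 0 chr1 100 0 16 chr2 200 1 60 60
rm -f "$script"
```

The quoted `'EOF'` delimiter stops the shell from expanding `$1`, `$i` and the other field references, so the awk program is written to the file exactly as shown.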

DustinSokolowski commented 3 years ago

Hey Neva,

Removing the backslashes works great! Submitting the updated file as a pull request now.

Best, Dustin