GryderArt opened 6 years ago
Moved to the Charles Lin lab's ROSE2 repository, where the same issue has been reported: https://github.com/linlabbcm/rose2/issues/6
Berkley,
Do you see any output at all in this folder?
(edited out paths per Berkley's request)
Charles Y. Lin, Assistant Professor, Department of Molecular and Human Genetics, Dan L. Duncan Cancer Center, Baylor College of Medicine
No. Normally I see files in that folder, but currently it is being left empty. In fact, there are no folders at all underneath the mappedGFF folder.
So that's the step where bamliquidator runs. Do you see anything at all in standard output showing either an error from bamliquidator or "liquidating..." style output?
My suspicion is that ROSE2 can't find/call bamliquidator correctly on the node.
-Charles
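A quick way to test the suspicion that ROSE2 cannot find bamliquidator on the node is to ask, from inside the same batch job, which file each command name resolves to on that job's PATH. A minimal Python sketch; the command names checked are assumptions (`bamliquidator_batch` appears in the ROSE2 log in this thread, `bamliquidator` is the tool itself), and `resolve_tools` is a hypothetical helper, not part of ROSE2:

```python
import os
import shutil


def resolve_tools(names, path=None):
    """Map each command name to the executable that would run, or None.

    `path` defaults to this process's PATH, so running the script inside the
    batch job reports what the job itself sees.
    """
    search_path = os.environ.get("PATH", "") if path is None else path
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    # Command names to check (assumed; adjust for your installation).
    for name, location in resolve_tools(["bamliquidator", "bamliquidator_batch"]).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Submitting this as its own small batch job with the same module loads as the ROSE2 job shows whether the node sees the same executables as an interactive session.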
There is no "liquidating..." style output, but I am running this as a batch job on a cluster node, so I may not see everything I would if I ran it on an interactive node. Should I try that to see what other errors are thrown?
OK, so here are some more lines of error following the first "OPERATION TIMED OUT" error:
MAPPING TO THE FOLLOWING BAMS: (edited out paths per Berkley's request)
OPERATION TIMED OUT. FILE NOT FOUND
bamliquidator_batch --sense . -e 200 --match_bamToGFF -r
OPERATION TIMED OUT. FILE /data/khanlab/projects/ChIP_seq/DATA/Sample_H3K27ac_024_C_HLFMLBGX3/MACS_Out_p_1e-14/ROSE_out_12500/mappedGFF/Sample_H3K27ac_024_C_HLFMLBGX3_peaks_0KB_STITCHED_TSS_DISTAL_Sample_H3K27ac_024_C_HLFMLBGX3.bam_MAPPED/matrix.txt NOT FOUND
ERROR: FAILED TO MAP
Yeah, it definitely cannot find bamliquidator. I'm not sure why this is suddenly breaking when it was working before.
Can you run bamliquidator on its own and see if it works?
-Charles
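Running the exact bamliquidator_batch command from the log by hand, and recording how it ends, separates "the mapper fails" from "ROSE2 never sees its output". A minimal sketch using Python's subprocess module; `run_and_report` and the placeholder paths are hypothetical. On POSIX systems, subprocess reports a process killed by a signal as a negative return code, so a segmentation fault shows up as -11 (SIGSEGV) on Linux:

```python
import signal
import subprocess


def run_and_report(cmd, timeout=None):
    """Run `cmd` (a list of arguments) and describe how it ended.

    Returns (returncode, description, stderr_text); returncode is None if the
    command could not be started or did not finish within `timeout` seconds.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        # The executable itself was not found on PATH.
        return None, f"command not found: {cmd[0]}", ""
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return None, f"still running after {timeout} s (killed)", ""

    rc = result.returncode
    if rc == 0:
        description = "exited normally (status 0)"
    elif rc < 0:
        # POSIX: a return code of -N means the process was killed by signal N.
        try:
            description = f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            description = f"killed by signal {-rc}"
    else:
        description = f"exited with status {rc}"
    return rc, description, result.stderr


if __name__ == "__main__":
    # Hypothetical placeholder paths; substitute the real GFF, output folder and BAM.
    cmd = ["bamliquidator_batch", "--sense", ".", "-e", "200", "--match_bamToGFF",
           "-r", "REGIONS.gff", "-o", "OUTPUT_DIR", "SAMPLE.bam"]
    rc, description, stderr_text = run_and_report(cmd)
    print(description)
    print(stderr_text)
```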
We found the 5-minute maximum wait hardcoded into ROSE here:

    if utils.checkOutput(mappedOut1File, 0.2, 5):
        print("SUCCESSFULLY MAPPED TO %s FROM BAM: %s" % (stitchedGFFFile, bamFileName))
    else:
        print("ERROR: FAILED TO MAP %s FROM BAM: %s" % (stitchedGFFFile, bamFileName))

We also tried increasing the maximum run time from 5 minutes to 30 minutes, but that didn't fix it. And since bamliquidator can process more than 11 million reads per second, and our BAM files have no more than 40 million reads each (under 4 seconds per file at that rate), this is not bamliquidator genuinely running out of time.
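The log messages fit a "launch, then poll for the output file" pattern: the mapping step is started, and the pipeline then waits for matrix.txt, printing OPERATION TIMED OUT if it never appears. The simplified sketch below is not the ROSE2 source; it assumes the `0.2` and `5` in the `checkOutput` call are minutes (a 12-second poll interval and a 5-minute limit) and uses seconds throughout. Because the checker only watches for the file, a mapper that crashes or never starts looks exactly like a slow one, which is consistent with a 30-minute limit failing too. The last lines redo the throughput arithmetic from the comment above.

```python
import os
import time


def wait_for_file(path, poll_seconds=12.0, timeout_seconds=300.0):
    """Poll until `path` exists with a briefly stable size, or give up.

    Simplified stand-in for a checkOutput-style helper (not ROSE2's code).
    Returns True if the file appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            size_before = os.path.getsize(path)
            time.sleep(0.1)
            if os.path.getsize(path) == size_before:
                return True  # file exists and is not still growing
        except OSError:
            pass  # file not there yet
        time.sleep(poll_seconds)
    print(f"OPERATION TIMED OUT. FILE {path} NOT FOUND")
    return False


# Throughput arithmetic from the thread: at most 40 million reads per BAM,
# at more than 11 million reads per second.
reads_per_bam = 40_000_000
reads_per_second = 11_000_000
expected_seconds = reads_per_bam / reads_per_second  # roughly 3.6 s, far below 300 s
```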
I'll try running bamliquidator on its own and continue from there. I have a few people with professional coding skills working on it now, so I hope we will figure it out soon, and I will report back here when we solve it.
Also, bamliquidator runs just fine from the command line on this sample. It just isn't working from ROSE2.
Yeah, that looks to me like a bamliquidator path issue, either on the node or in ROSE2.
ROSE2 calls a wrapper called bamliquidator_batch.py. Can you try running that command by itself on your cluster and locally, and see what happens?
-Charles
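To compare the cluster batch job with an interactive or local run, it can help to capture the environment each one sees and diff the two. A minimal sketch; the variables chosen are assumptions (`LOADEDMODULES` is set by Environment Modules and Lmod, if the cluster uses either), and `snapshot`/`diff_snapshots` are hypothetical helpers:

```python
import json
import os
import sys

# Variables worth comparing between runs (assumed selection).
DEFAULT_KEYS = ("PATH", "LD_LIBRARY_PATH", "LOADEDMODULES")


def snapshot(keys=DEFAULT_KEYS):
    """Capture selected environment variables (None if unset)."""
    return {key: os.environ.get(key) for key in keys}


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for variables that differ."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


if __name__ == "__main__":
    # Usage: run once inside the batch job and once on an interactive node,
    # redirecting stdout to two files, then json.load both and call diff_snapshots.
    json.dump(snapshot(), sys.stdout, indent=2)
```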
So we got a "segmentation fault" when running bamliquidator_batch.py on an interactive node. What does that mean?
Nothing good. It's possible bamliquidator was compiled on one node and is not compatible with the other, but this is starting to get outside my realm of expertise. I'm cc'ing John DiMatteo, who developed bamliquidator.
John, is there some sort of output that would be most informative for debugging a segfault, or should they just try to recompile on the node?
Unfortunately this is getting into territory where it's highly dependent on OS and other base level configuration.
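One common OS-level check for a binary built on one machine and run on another is `ldd`, which lists the shared libraries an executable needs and marks any the dynamic loader cannot resolve as "not found". A missing library usually causes a load error rather than a segfault, so a clean `ldd` result does not rule out a mismatch with a different version of a library that is present; treat it as one data point. The sketch assumes glibc's usual `ldd` output format (`libname => not found`); `missing_libraries` and `check_binary` are hypothetical helpers:

```python
import subprocess


def missing_libraries(ldd_output):
    """Return shared-library names that ldd marked as unresolved."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # glibc ldd prints unresolved entries as "libname => not found".
        if "=>" in line and line.endswith("not found"):
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(binary_path):
    """Run ldd on `binary_path` and return the unresolved libraries.

    For a script (for example a container wrapper) ldd reports that it is
    not a dynamic executable, and this returns an empty list. Raises
    FileNotFoundError if ldd itself is not installed.
    """
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Example (path is whatever shutil.which("bamliquidator") reports on the node):
# print(check_binary("/path/to/bamliquidator"))
```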
So, while I don't entirely understand it, we got it working by correcting which version of bamliquidator is loaded. During the failures, the modules were loaded like this:
[+] Loading hdf5 1.8.15
[+] Loading bamliquidator version 1.3...
Now hdf5 doesn't even appear in the list of loaded modules. We are using something called Singularity, which loads a uniform version:
[+] Loading bamliquidator 1.3.4 on cn1654
[+] Loading singularity 2.4.5 on cn1654
The explanation I got from the folks managing our clusters:
Bamliquidator v1.3.4 (now the default) is in a Singularity container. Loading the module gives you bamliquidator and bamliquidator_batch as executables/wrappers, which call bamliquidator inside the container. Singularity is a container system, similar to Docker. In the case of bamliquidator, it allows bamliquidator to be installed on Ubuntu (using apt-get) and run on the HPC cluster's CentOS. More details: https://hpc.nih.gov/apps/singularity.html
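Since the fix changed `bamliquidator` from a directly installed program into a wrapper that runs it inside a Singularity container, one quick check on any node is whether the command on the PATH is a compiled program or a script. Linux executables start with the ELF magic bytes `\x7fELF`, while interpreted scripts start with `#!`. A minimal sketch; that the site's wrappers are `#!` scripts is an assumption, and `executable_kind` is a hypothetical helper:

```python
import shutil


def executable_kind(path):
    """Classify a file by its first bytes: ELF binary, #! script, or unknown."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == b"\x7fELF":
        return "ELF binary"
    if head.startswith(b"#!"):
        return "script (possibly a container wrapper)"
    return "unknown"


if __name__ == "__main__":
    # Command names taken from the module description; adjust if yours differ.
    for tool in ("bamliquidator", "bamliquidator_batch"):
        location = shutil.which(tool)
        kind = executable_kind(location) if location else "not on PATH"
        print(f"{tool}: {location} -> {kind}")
```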
An unusual error is clogging ROSE2 during BAM mapping. The log file reads:
MAPPING TO THE FOLLOWING BAMS: (edited out paths per Berkley's request)
OPERATION TIMED OUT. FILE matrix.txt NOT FOUND
Has anyone seen this error before and solved it? I'm running ROSE2 on a cluster node with --mem=121g --cpus-per-task=4