cancerit / telomerecat

Telomerecat: The telomere computational analysis tool
GNU General Public License v3.0
user reported slow process #13

Closed byb121 closed 3 years ago

byb121 commented 4 years ago

From one user:

I have run one of my BAM files, 50 GB in size; however, it ran for over 100 hours without producing output. I assigned 4 CPUs in my job script, and also ran another script with 8 CPUs assigned together with the argument "-p8" in the telomerecat command (see below), but that is running over time too. I wonder if you could please advise me how to resolve this. Thanks very much.

#!/bin/bash
#PBS -l select=1:ncpus=8:mem=32G,walltime=100:00:00
telomerecat bam2length -p8 sample.sort.bam --output sample.csv

eb32142 commented 4 years ago

After 100 hours of running I see the following message in the error file:

OSError: truncated file
=>> PBS: job killed: walltime 360028 exceeded limit 360000
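For reference, the walltime figures in the PBS message are in seconds: the requested 100:00:00 corresponds to the 360000 limit, so the job was killed for reaching its time limit. A minimal sketch of that conversion follows; `hms_to_seconds` is a hypothetical helper name, not part of PBS or telomerecat.

```shell
# Hypothetical helper: convert an HH:MM:SS duration into seconds.
# awk treats fields such as "08" as decimal, so leading zeros are safe.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 100:00:00    # prints 360000
```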

byb121 commented 4 years ago

@eb32142, Could you subset your BAM file with a command like this:

samtools view -b -s 2.1 sample.sort.bam > ten_percent_of_sample.sort.bam

And run this command without using your PBS system:

telomerecat bam2length -p8 ten_percent_of_sample.sort.bam --output sample.csv -v 2

-v will output more detailed messages in your terminal. If it fails, could you post your terminal output here please?

Thanks,
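In `samtools view -s`, the integer part of the value is the random seed and the fractional part is the fraction of reads to keep, so `-s 2.1` means seed 2 and roughly 10% of the reads. The sketch below builds that value from its two parts. `subsample_arg` and `subsample_bam` are hypothetical helper names; `subsample_bam` is only defined here, not run, and assumes samtools is on the PATH.

```shell
# Hypothetical helper: build the SEED.FRACTION value for `samtools view -s`.
# The fraction must be written as 0.xxx (for example 0.1 for 10%).
subsample_arg() {
    seed=$1
    fraction=$2
    printf '%s.%s\n' "$seed" "${fraction#0.}"
}

# Hypothetical wrapper: subsample_bam SEED FRACTION IN.bam OUT.bam
subsample_bam() {
    samtools view -b -s "$(subsample_arg "$1" "$2")" "$3" > "$4"
}

subsample_arg 2 0.1    # prints 2.1
```

Keeping the seed fixed should make the same reads come out on a rerun, which helps when comparing logs between attempts.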

eb32142 commented 4 years ago

The job

eb32142 commented 4 years ago

@byb121,

I am running this and my job is still running after 22 hours. The subset BAM file is 6.4 GB.

byb121 commented 4 years ago

That does not sound right. Would you be able to kill the job and post the log here, or send it to my email if it's too long for this thread? Thanks.

eb32142 commented 4 years ago

Thanks @byb121,

I think the issue might be the CPU usage for the job. I am running this on an HPC cluster. Below is my job status, showing low CPU usage:

Job_Name = test2.sh
resources_used.cpupercent = 2
resources_used.cput = 01:02:22
resources_used.mem = 33554432kb
resources_used.ncpus = 8
resources_used.vmem = 33554432kb
resources_used.walltime = 35:45:19
job_state = R
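To put the status above in numbers: the job had consumed about 1 hour of CPU time over almost 36 hours of walltime on 8 CPUs. A rough sketch of that ratio follows, assuming utilisation is taken as cput / (walltime × ncpus); `hms_to_seconds` and `cpu_utilisation` are hypothetical helper names, and this figure is not the same quantity as the `cpupercent` field reported by PBS.

```shell
# Hypothetical helper: convert an HH:MM:SS duration into seconds.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: cpu_utilisation CPUT WALLTIME NCPUS
# Prints CPU time as a percentage of the CPU time that was available.
cpu_utilisation() {
    awk -v c="$(hms_to_seconds "$1")" \
        -v w="$(hms_to_seconds "$2")" \
        -v n="$3" \
        'BEGIN { printf "%.2f\n", 100 * c / (w * n) }'
}

cpu_utilisation 01:02:22 35:45:19 8    # prints 0.36
```

Separately, 33554432kb is exactly 32 × 1024 × 1024 kb, which is the full 32G requested in the job script. That might point to memory pressure, but this is an inference from the numbers and is not confirmed anywhere in the thread.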

byb121 commented 4 years ago

Could you use the script below to run your job?

#!/bin/bash
#PBS -l select=1:ncpus=8:mem=32G,walltime=100:00:00
# Standard output file
#PBS -o job_parallel.log
# Standard error file
#PBS -e job_parallel.log
telomerecat bam2length -p8 ten_percent_of_sample.sort.bam --output sample.csv -v 2

Let it run for an hour and kill the job. Please then send me job_parallel.log via email or post it here if it's not hugely long.

I suspect telomerecat failed to manage all its threads in your computing environment. To test this, could you also run a job using this script?

#!/bin/bash
#PBS -l mem=16G,walltime=100:00:00
# Standard output file
#PBS -o job.log
# Standard error file
#PBS -e job.log
telomerecat bam2length ten_percent_of_sample.sort.bam --output sample.csv -v 2

Again let it run for an hour and kill the job. Please then send me job.log via email or post it here.

Thanks,

eb32142 commented 4 years ago

Hi Yaobo (@byb121 ),

Thanks for your response. I ran the first command and the log file is below. I get lots of "Could not retrieve index..." messages, followed by these last lines, which include the errors:

[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-cb7jynhu/ten_percent_of_sample.sort_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-cb7jynhu/ten_percent_of_sample.sort_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-cb7jynhu/ten_percent_of_sample.sort_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-cb7jynhu/ten_percent_of_sample.sort_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
Process Task-10:2:
Traceback (most recent call last):
  File "/home/em924/anaconda3/envs/telomerecat/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "parabam/core.pyx", line 291, in parabam.core.Task.run
  File "parabam/core.pyx", line 311, in parabam.core.Task.__generate_results
  File "parabam/command/core.pyx", line 50, in parabam.command.core.Task.process_task_set
  File "pysam/libcalignmentfile.pyx", line 2187, in pysam.libcalignmentfile.IteratorRowAll.__next__
OSError: truncated file

Thanks.

eb32142 commented 4 years ago

Hi Yaobo (@byb121),

Could it be an issue with samtools?

byb121 commented 4 years ago

It's complaining it can't find the index file. Could you run this first before you submit your PBS scripts:

samtools index ten_percent_of_sample.sort.bam

Then again, let it run for about an hour and please post any error here. Thanks.
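A small sketch of a pre-flight check that could run before submitting the job: it confirms the input BAM is intact and creates the index only if it is missing. `has_bam_index` and `prepare_bam` are hypothetical helper names; `prepare_bam` is only defined here, not run, and assumes samtools is on the PATH. Note that the log messages above name a temporary `_telbam.bam` file under /tmp rather than the input BAM, so indexing the input may not be sufficient on its own.

```shell
# Hypothetical helper: succeed if a BAI index sits next to the BAM.
# `samtools index file.bam` writes file.bam.bai by default; file.bai is
# another commonly used name.
has_bam_index() {
    [ -f "$1.bai" ] || [ -f "${1%.bam}.bai" ]
}

# Hypothetical wrapper: verify the BAM, then index it if needed.
prepare_bam() {
    # quickcheck fails on files that are truncated or lack the EOF block
    samtools quickcheck "$1" || {
        echo "BAM looks truncated or corrupt: $1" >&2
        return 1
    }
    has_bam_index "$1" || samtools index "$1"
}
```

Usage would be `prepare_bam ten_percent_of_sample.sort.bam` before the `qsub` call.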

eb32142 commented 4 years ago

Thanks @byb121 ,

I made the index and then ran telomerecat, but again got the exact same error as before:

[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-3yrilcal/ten_percent_of_sample.sort_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
Process Task-12:2:
Traceback (most recent call last):
  File "/home/em924/anaconda3/envs/telomerecat/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "parabam/core.pyx", line 291, in parabam.core.Task.run
  File "parabam/core.pyx", line 311, in parabam.core.Task.__generate_results
  File "parabam/command/core.pyx", line 50, in parabam.command.core.Task.process_task_set
  File "pysam/libcalignmentfile.pyx", line 2187, in pysam.libcalignmentfile.IteratorRowAll.__next__
OSError: truncated file

byb121 commented 4 years ago

Hi @eb32142, if you revert pysam to version 0.15.3, it may just run without a problem. Please let us know if it solves the problem.
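One possible way to pin that version, sketched under the assumption that pysam is managed with pip inside the already activated telomerecat environment (a conda-managed install would need the equivalent conda command instead). `pin_pysam` and `show_pysam_version` are hypothetical helper names and are only defined here, not run.

```shell
# Hypothetical helper: install the pysam version suggested in this thread.
# Assumes the telomerecat environment is active and pip-managed.
pin_pysam() {
    python -m pip install 'pysam==0.15.3'
}

# Hypothetical helper: print the pysam version Python actually imports.
show_pysam_version() {
    python -c 'import pysam; print(pysam.__version__)'
}
```

Running `show_pysam_version` before and after `pin_pysam` confirms whether the downgrade took effect in the environment that telomerecat uses.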

keiranmraine commented 3 years ago

Closing as no response