PASApipeline / PASApipeline

PASA software

Error in blat_top_hit_extractor.pl #190

Closed: soungalo closed this issue 3 years ago

soungalo commented 3 years ago

I am running the PASA assembly pipeline on a small input (~150kb), like this:

Launch_PASA_pipeline.pl -c alignAssembly.config -C -R -g Pt.fa -t transcripts.fasta --ALIGNERS blat --TRANSDECODER --CPU 30

The run dies after a few minutes, with lots of error messages (see attached stderr), the first one being:

Error, 1 exceeded  at /groups/itay_mayrose_nosnap/liorglic/Panoramic/EVM_annotation/.snakemake/conda/f4d5ee21/opt/pasa-2.4.1/scripts/blat_top_hit_extractor.pl line 151, <FILE> line 1.
CMD: sort -k1,1 -k2,2nr blat_out_dir/partition.87824.fa.pslx.scores > blat_out_dir/partition.87824.fa.pslx.scores.sort_by_score
Thread 40 terminated abnormally: Error, cmd:
/groups/itay_mayrose_nosnap/liorglic/Panoramic/EVM_annotation/.snakemake/conda/f4d5ee21/opt/pasa-2.4.1/scripts/blat_top_hit_extractor.pl blat_out_dir/partition.63872.fa.pslx 1 > blat_out_dir/partition.63872.fa.pslx.top_1
 died with ret (6400) at /groups/itay_mayrose_nosnap/liorglic/Panoramic/EVM_annotation/.snakemake/conda/f4d5ee21/opt/pasa-2.4.1/PerlLib/Process_cmd.pm line 18 thread 40.
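
The `ret (6400)` value is most likely the raw wait status returned by Perl's `system()`. That is an assumption about how `Process_cmd.pm` reports it, not something stated in this thread. Under that assumption, the status splits into an exit code (high byte) and a signal number (low 7 bits). `decode_wait_status` below is a hypothetical helper for illustration:

```python
def decode_wait_status(ret: int) -> dict:
    """Split a raw POSIX wait status (as returned by Perl's system()) into its parts."""
    return {
        "exit_code": ret >> 8,           # high byte: the child's exit code
        "signal": ret & 127,             # low 7 bits: terminating signal, 0 if none
        "core_dumped": bool(ret & 128),  # bit 7: core dump flag
    }

print(decode_wait_status(6400))
# {'exit_code': 25, 'signal': 0, 'core_dumped': False}
```

On that reading, blat_top_hit_extractor.pl exited on its own with code 25 rather than being killed by a signal.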

I think the problem is that the file blat_out_dir/partition.87824.fa.pslx.scores does not exist, but why?
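
One way to test this hypothesis is to list the `partition.*.fa.pslx` files in `blat_out_dir` that have no companion `.pslx.scores` file. The following is a minimal sketch that assumes only the file naming seen in the log above. `partitions_missing_scores` is a hypothetical helper and is not part of PASA:

```python
from __future__ import annotations

from pathlib import Path


def partitions_missing_scores(blat_out_dir: str) -> list[str]:
    """Return names of .pslx partition files that lack a matching .pslx.scores file."""
    missing = []
    # The pattern must end in ".fa.pslx", so .scores/.top_1 companions are not matched.
    for pslx in sorted(Path(blat_out_dir).glob("partition.*.fa.pslx")):
        scores = pslx.with_name(pslx.name + ".scores")
        if not scores.exists():
            missing.append(pslx.name)
    return missing
```

For example, run `partitions_missing_scores("blat_out_dir")` from the PASA working directory.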

Pt_PASA_assembly.err.txt

brianjohnhaas commented 3 years ago

Hi Lior,

This is the first time this particular error message has shown up. It's pretty peculiar.

If you rerun the original job and have it retry/resume, does it continue to throw this error?

To troubleshoot it, I'd need to look at one of the files, such as blat_out_dir/partition.231536.fa.pslx, if you're able to gzip it and make it available to me. @.***

best,

~b

On Mon, Jun 14, 2021 at 2:31 AM Lior Glick @.***> wrote:

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas

soungalo commented 3 years ago

This file seems to be an empty blat result:

psLayout version 3

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T         T        T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name      size     start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
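
A file like this contains only the psLayout header and no alignment rows. The sketch below counts the alignment rows in a PSL/PSLX file so that empty partitions can be spotted. It assumes the standard header shown above, which ends in a line of dashes, and it also accepts headerless output. `count_psl_hits` is a hypothetical helper, not something shipped with PASA or BLAT:

```python
def count_psl_hits(path: str) -> int:
    """Count alignment rows in a BLAT PSL/PSLX file, skipping the psLayout header."""
    with open(path) as fh:
        lines = fh.read().splitlines()

    start = 0  # headerless output: every non-empty line is an alignment row
    if lines and lines[0].startswith("psLayout"):
        start = len(lines)  # a header with no dashed separator is treated as "no hits"
        for i, line in enumerate(lines):
            # The standard header ends with a line made only of dashes.
            if set(line.strip()) == {"-"}:
                start = i + 1
                break

    return sum(1 for line in lines[start:] if line.strip())
```

A count of 0 for a `partition.*.fa.pslx` file flags a partition like this one.
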
brianjohnhaas commented 3 years ago

Hmm... Try rerunning with --CPU set to a smaller number. The --CPU value is the number of separate BLAT searches that run simultaneously, and each one can end up using a lot of RAM. The system should catch failures at the BLAT execution step, but perhaps that didn't happen here.

If you have our recommended version of GMAP installed, you could try using GMAP instead. Given the way the BLAT step is configured, GMAP should use less RAM.

Hope this helps.

On Tue, Jun 15, 2021 at 10:33 AM Lior Glick @.***> wrote:

soungalo commented 3 years ago

Thanks, I'll give it a try. But just to be clear: is an empty BLAT output file an invalid result? Isn't it expected that some partitions will end up with no hits, especially for small genomes and many CPUs?

brianjohnhaas commented 3 years ago

An empty result is not necessarily an error, but I'd still find it highly peculiar.

On Tue, Jun 15, 2021 at 10:52 AM Lior Glick @.***> wrote: