DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

blobtools Create doing nothing #109

Closed Rob-murphys closed 3 years ago

Rob-murphys commented 3 years ago

I am trying to run blobtools create but nothing seems to happen. I ran it for 2 hours with 5 cores and 20G memory and the only standard output was:

^M[%] : 0%| | 0.00/64.0 [00:00<?, ?it/s]^M[%] : 100%|█

Is this normal and should I just increase the run time?

DRL commented 3 years ago

Hi,

That doesn't look good. What's the command you are running?

cheers,

dom

Rob-murphys commented 3 years ago

Hi, sorry I forgot to include it initially:

blobtools create -i $assembly -b $bam -t $outdir/${prefix}.out -o "$prefix" --nodes path/to/data/nodes.dmp --names path/to/data/names.dmp

The -t file contains many lines similar to this excerpt:

tig00000077_np12        1447705 712     tig00000077_np12        CP041306.1      80.881  931     155     21      224524  225447  4346952 4346038 0.0
tig00000077_np12        1447705 706     tig00000077_np12        CP041306.1      78.774  1093    203     20      399585  400670  1734523 1733453 0.0
tig00000077_np12        1447705 706     tig00000077_np12        CP041306.1      80.252  952     176     8       355638  356581  4193630 4192683 0.0
tig00000077_np12        1447705 704     tig00000077_np12        CP041306.1      79.960  998     167     26      342256  343234  4204813 4203830 0.0
tig00000077_np12        1447705 704     tig00000077_np12        CP041306.1      84.349  722     109     4       269269  269989  4287918 4287200 0.0
tig00000077_np12        1447705 699     tig00000077_np12        CP041306.1      84.692  699     107     0       51631   52329   4665016 4664318 0.0
tig00000077_np12        1447705 652     tig00000077_np12        CP041306.1      78.131  1070    204     29      693870  694924  3650841 3649787 6.92e-180
tig00000077_np12        1447705 649     tig00000077_np12        CP041306.1      78.418  1024    201     20      692832  693845  3649671 3648658 8.95e-179
tig00000077_np12        1447705 649     tig00000077_np12        CP041306.1      79.289  956     176     19      246119  247063  4320517 4319573 8.95e-179
tig00000077_np12        1447705 649     tig00000077_np12        CP041306.1      79.825  912     166     16      219900  220802  4350021 4349119 8.95e-179
tig00000077_np12        1447705 632     tig00000077_np12        CP041306.1      81.865  772     121     18      272988  273751  4283958 4283198 9.02e-174
tig00000077_np12        1447705 632     tig00000077_np12        CP041306.1      80.549  838     150     12      437714  438547  4564705 4563877 9.02e-174
tig00000077_np12        1447705 628     tig00000077_np12        CP041306.1      84.006  669     92      10      429197  429859  7251489 7250830 1.17e-172
tig00000077_np12        1447705 590     tig00000077_np12        CP041306.1      78.000  1000    179     25      769541  770533  3566241 3565276 5.50e-161
tig00000077_np12        1447705 590     tig00000077_np12        CP041306.1      80.177  792     153     4       499752  500541  8807906 8807117 5.50e-161
tig00000077_np12        1447705 575     tig00000077_np12        CP041306.1      79.314  875     140     35      251578  252433  4317036 4316184 1.54e-156
tig00000077_np12        1447705 562     tig00000077_np12        CP041306.1      80.236  764     137     14      752060  752816  1806796 1807552 1.20e-152
tig00000077_np12        1447705 542     tig00000077_np12        CP041306.1      82.989  629     77      24      319217  319831  4230378 4229766 1.56e-146
tig00000077_np12        1447705 536     tig00000077_np12        CP041306.1      84.934  531     78      2       86318   86847   4629870 4629341 7.27e-145
tig00000077_np12        1447705 529     tig00000077_np12        CP041306.1      79.084  808     140     20      207841  208635  4356639 4355848 1.22e-142
tig00000077_np12        1447705 527     tig00000077_np12        CP041306.1      83.972  574     67      20      331655  332203  4215656 4215083 4.38e-142
tig00000077_np12        1447705 521     tig00000077_np12        CP041306.1      77.765  877     175     19      324774  325640  4223828 4222962 2.04e-140
tig00000077_np12        1447705 508     tig00000077_np12        CP041306.1      80.923  650     118     6       208852  209498  4355634 4354988 1.59e-136
tig00000077_np12        1447705 490     tig00000077_np12        CP041306.1      82.909  550     88      6       307908  308454  4247242 4246696 5.74e-131
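
A quick sanity check of a hits file like the excerpt above could be sketched as follows. It assumes the layout the excerpt appears to follow (sequence ID, NCBI taxon ID and bit score in the first three whitespace-separated columns). The function name `check_hits` is hypothetical and not part of blobtools, and a flagged line is only a prompt to look closer, not proof that the file is unusable.

```shell
# check_hits FILE: report lines whose 2nd column is not a plain integer
# (assumed taxon ID) or whose 3rd column is not a plain number (assumed score).
# Hypothetical helper, not part of blobtools.
check_hits() {
  awk '
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
      printf "line %d looks malformed: %s\n", NR, $0
      bad++
    }
    END {
      printf "%d lines checked, %d malformed\n", NR, bad + 0
      exit (bad > 0)   # non-zero exit status if anything was flagged
    }
  ' "$1"
}
```

It would be called as `check_hits "$outdir/${prefix}.out"`, using the same variables as the command above.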
Rob-murphys commented 3 years ago

@DRL Hi Dom,

Any idea what is going on here? Do I just need to give it more time?

Cheers

DRL commented 3 years ago

I mean hard to say...

Can you post the actual command you ran and the whole output?

Something went wrong and that has to do with the input files.

You need to tell me more about the input files...

cheers.

dom

Rob-murphys commented 3 years ago

@DRL Hi Dom

Here is the input:

samtools index $bam

cd $outdir

blobtools create -i $assembly -b $bam -t $outdir/${prefix}.out -o "$prefix" --nodes /home/lamma/faststorage/scripts/qc_and_filtering/data/nodes.dmp --names /home/lamma/faststorage/scripts/qc_and_filtering/data/names.dmp

Here is the output in full: PSU4_ISF1A.zip

Here is the input file: PSU4_ISF1A_nextPolish.zip

The names.dmp and nodes.dmp are just from: wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz

Cheers Lamma

DRL commented 3 years ago

ok, so you have:

and your BAM file is sorted by readnames?
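
One way to answer the sort-order question is to read the `SO:` tag from the `@HD` line of the BAM header. The sketch below assumes samtools is installed for the real call; the helper name `sort_order` is hypothetical, and a header without an `@HD` line or `SO:` tag is reported as `unknown`.

```shell
# sort_order: read a SAM/BAM header on stdin and print the SO: value of the
# @HD line (e.g. coordinate, queryname, unsorted), or "unknown" if absent.
# Hypothetical helper for illustration only.
sort_order() {
  awk -F '\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
    }
    END { if (!found) print "unknown" }
  '
}

# Intended use (assumes samtools is on PATH and $bam points at the BAM file):
#   samtools view -H "$bam" | sort_order
```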

DRL commented 3 years ago

also, how did you map the reads? did you use bwa mem?

Rob-murphys commented 3 years ago

Yes, the Illumina short reads were just aligned to PSU4_ISF1A_nextPolish.fasta via bwa mem.

DRL commented 3 years ago

could you run the blobtools create command again and post the output it prints to screen?

So both STDERR and STDOUT

Rob-murphys commented 3 years ago

I am running inside a SLURM job, but the content of the original post is what I see in the .out of the SLURM job, so that should be what I would see if I ran it on the command line, I think.

DRL commented 3 years ago

was there no .err file?

Rob-murphys commented 3 years ago

Nope, not from what I can see
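
The missing .err file is consistent with SLURM's default behaviour, where standard output and standard error of a batch job both go to a single `slurm-%j.out` file unless `--error` is given. A minimal sketch of a job script that separates the two streams and makes the progress-bar carriage returns (`^M`) readable follows. `run_tool` is a hypothetical stand-in for the real `blobtools create` call, and the log file names are assumptions.

```shell
#!/bin/bash
#SBATCH --output=blobtools_%j.out   # stdout of the job (%j = job ID)
#SBATCH --error=blobtools_%j.err    # stderr of the job, in its own file

# Hypothetical stand-in for the real command: writes a status line to stderr
# and a carriage-return-updated progress bar to stdout.
run_tool() {
  echo "[+] Parsing FASTA" >&2
  printf '[%%] : 0%%\r[%%] : 100%%\n'
}

# Capture each stream explicitly, independent of the scheduler settings.
run_tool > create.stdout.log 2> create.stderr.log

# Turn carriage returns into newlines so every progress update is visible.
tr '\r' '\n' < create.stdout.log > create.stdout.clean.log
```

With the streams split this way, an empty stderr log and a truncated stdout log would point towards the job being stopped from outside rather than the tool reporting an error.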

Rob-murphys commented 3 years ago

Okay, when running on the command line I get a much more realistic output:

[+] Parsing FASTA - /home/lamma/faststorage/kasun_amyco/nextPolish-ouput/PSU4_ISF1A/genome.nextpolish.fasta
[+] Creating nodesDB from /home/lamma/faststorage/scripts/qc_and_filtering/data/nodes.dmp and /home/lamma/faststorage/scripts/qc_and_filtering/data/names.dmp
[+] Parsing tax0 - /home/lamma/faststorage/kasun_amyco/blobtools/PSU4_ISF1A.out
[+] Computing taxonomy using taxrule(s) bestsum
[%] : 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64.0/64.0 [00:00<00:00, 7.52kit/s]
[+] Parsing bam0 - /home/lamma/faststorage/kasun_amyco/clean_short-reads_Q30/bwa-generic-output/PSU4_ISF1A.bam
[+] -> 100.00 (64/64) of sequences have reads aligned to them.
[+] -> 0.00 (0/245696) of reads are mapped.

This is what I see so far. It seems there are issues when running inside a SLURM job?

DRL commented 3 years ago

so it has parsed the FASTA and the BLAST file without incident and is now looking at the BAM file ...

how big is your BAM file? maybe the SLURM job finished because you hit the ceiling with RAM requirements to parse the BAM file...

Rob-murphys commented 3 years ago

The BAM file is 5739631. I am using 60G of RAM in the SLURM job. On the command line the HPC really limits our RAM, so it seems like it may be a SLURM-specific issue, as the output above, printed when running from the command line, got much further than in the SLURM job?

DRL commented 3 years ago

yeah ... could be a SLURM thing... I have never worked with a SLURM cluster, so better to ask your local sysadmin to help you out.

If there is a way to login on a node and run things directly, do that. Should work then. Reopen this if not.

Sorry that this was not straightforward ...

cheers,

dom

Rob-murphys commented 3 years ago

No worries, at least we seem to have worked it out!