MicrobialDarkMatter / GraphMB

bdb.BdbQuit Error #33

Open palomo11 opened 12 months ago

palomo11 commented 12 months ago

I am getting the following error:

### writing best and last embs to graphmb_bins
Uncaught exception
Traceback (most recent call last):
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/bin/graphmb", line 8, in <module>
    sys.exit(main())
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/site-packages/graphmb/main.py", line 570, in main
    if args.writebins:
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/site-packages/graphmb/main.py", line 570, in main
    if args.writebins:
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

This is the command I'm running:

graphmb --assembly ./Sample1_flye_assembly --outdir graphmb_bins --assembly_name Sample1_racon_medaka_polypolish_nextpolish_1000_OK.fasta --markers marker_gene_stats.tsv --graph_file assembly_graph.gfa  --depth Sample1_metabat_depth.txt --contignodes --writebins --numcores 40

This is the log info:

Running GraphMB 0.2.5
using cuda: False
setting seed to 1
Reading cache from
Reading assembly info file
==============Running VAE model=====================
******* Running model: CCVAE **********
***** using edge weights: True ******
***** cluster markers only: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: True
***** SCG neg pairs: (428370, 2)
***** input features dimension: (34354, 111)
>>> Pre train stats: {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513}
*** Model input dim 111, GNN input dim 64
use_ae: True, run AE only: False output clustering dim 64
**** initial edges batch size: 256 ****
**** epoch batch size doubles: [25, 75, 150, 300] ****
>>> Last epoch: 20 : {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513, 'epoch': 499} <<<
>>> best epoch: 20 : {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513} <<<
===================================================
writing features to /data/Sample1_flye_assembly/features.tsv
RUN 0
******* Running model: gcn **********
***** using edge weights: True ******
***** using disconnected: True ******
***** concat features: True *****
***** cluster markers only: False *****
***** threshold adj matrix: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: False
***** SCG neg pairs: (428370, 2)
***** input features dimension: 64
***** Nodes used for clustering: 34354
>>> Pre train stats: {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513}
*** Model input dim 64, GNN input dim 64
*** output clustering dim 32
>>> best epoch all contigs: 20 : {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513, 'epoch': 499} <<<
>>> best epoch: 20 : {'precision': 1.0, 'recall': 2.910869185538802e-05, 'f1': 5.8215689128220055e-05, 'ari': 0, 'hq': 15, 'mq': 23, 'n_clusters': 34354, 'unresolved_mags': 128, 'hq_comp': 98.44827586206898, 'hq_cont': 0.3448275862068966, 'unresolved_contigs': 34339, 'unresolved_contigs_with_scgs': 3513, 'epoch': 59} <<<
### writing best and last embs to graphmb_bins
Uncaught exception
Traceback (most recent call last):
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/bin/graphmb", line 8, in <module>
    sys.exit(main())
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/site-packages/graphmb/main.py", line 570, in main
    if args.writebins:
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/site-packages/graphmb/main.py", line 570, in main
    if args.writebins:
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

AndreLamurias commented 12 months ago

I have pushed a commit that fixes this issue (as well as #32). I will push it to PyPI soon. Let me know if you still have any errors.
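
For context: `bdb.BdbQuit` is the exception Python raises when a pdb session is quit, which also happens when a debugger call is reached in a run with no interactive input. The traceback points at `main.py`, line 570. The sketch below is one way to check whether an installed copy of a source file still contains such a call. It assumes the call is written as `pdb.set_trace()` or `breakpoint()` (not verified against the GraphMB source), and the function name `find_debugger_calls` is hypothetical.

```python
import re
from pathlib import Path

# Matches the two most common ways of dropping into the debugger.
# Assumption: the leftover call uses one of these spellings.
DEBUG_CALL = re.compile(r"\b(pdb\.set_trace|breakpoint)\s*\(")


def find_debugger_calls(source_path):
    """Return (line_number, stripped_line) pairs that contain a debugger call."""
    hits = []
    text = Path(source_path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Naive comment stripping: ignore everything after the first '#'.
        code = line.split("#", 1)[0]
        if DEBUG_CALL.search(code):
            hits.append((lineno, line.strip()))
    return hits
```

To use it, pass the `main.py` path shown in your own traceback, for example `find_debugger_calls(".../site-packages/graphmb/main.py")`. An empty list suggests the installed version no longer contains a debugger call of either form.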

palomo11 commented 11 months ago

Hi @AndreLamurias, that error has been solved. However, all of the bins produced contain only 1 contig, and most of them have below 50% completeness (evaluated with CheckM2). This is the command I ran:

graphmb --assembly ./Sample1_flye_assembly --outdir graphmb_bins --assembly_name Sample1_racon_medaka_polypolish_nextpolish_1000_OK.fasta --markers marker_gene_stats.tsv --graph_file assembly_graph.gfa  --depth Sample1_metabat_depth.txt --contignodes --writebins --numcores 40

This is the log file:

Running GraphMB 0.2.5
using cuda: False
setting seed to 1
Cache not found on graphmb_bins
processing sequences /Sample1_flye_assembly/Sample1_racon_medaka_polypolish_nextpolish_1000_OK.fasta
read 30969 seqs
processing GFA file (contig nodes) /Sample1_flye_assembly/assembly_graph.gfa
read 9242, edges
reading depths
reading labels
Saved cache to graphmb_bins

Reading assembly info file
==============Running VAE model=====================
******* Running model: CCVAE **********
***** using edge weights: True ******
***** cluster markers only: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: True
***** SCG neg pairs: (191862, 2)
***** input features dimension: (30969, 111)
>>> Pre train stats: {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479}
*** Model input dim 111, GNN input dim 64
use_ae: True, run AE only: False output clustering dim 64
**** initial edges batch size: 256 ****
**** epoch batch size doubles: [25, 75, 150, 300] ****
>>> Last epoch: 20 : {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479, 'epoch': 499} <<<
>>> best epoch: 20 : {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479} <<<
===================================================
writing features to /Sample1_flye_assembly/features.tsv
RUN 0
******* Running model: gcn **********
***** using edge weights: True ******
***** using disconnected: True ******
***** concat features: True *****
***** cluster markers only: False *****
***** threshold adj matrix: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: False
***** SCG neg pairs: (191862, 2)
***** input features dimension: 64
***** Nodes used for clustering: 30969
>>> Pre train stats: {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479}
*** Model input dim 64, GNN input dim 64
*** output clustering dim 32
>>> best epoch all contigs: 20 : {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479, 'epoch': 499} <<<
>>> best epoch: 20 : {'precision': 1.0, 'recall': 3.2290354871000035e-05, 'f1': 6.457862447529868e-05, 'ari': 0, 'hq': 8, 'mq': 12, 'n_clusters': 30969, 'unresolved_mags': 84, 'hq_comp': 97.41379310344827, 'hq_cont': 0.10775862068965517, 'unresolved_contigs': 30961, 'unresolved_contigs_with_scgs': 2479, 'epoch': 59} <<<
### writing best and last embs to graphmb_bins
30969 clusters
### skipped 30752 clusters while writing to file
### wrote 217 clusters 217 >= #contig 1
### precision: 1.000 0.000
### recall: 0.000 0.000
### f1: 0.000 0.000
### ari: 0.000 0.000
### hq: 8.000 0.000
### mq: 12.000 0.000
### n_clusters: 30969.000 0.000
### unresolved_mags: 84.000 0.000
### hq_comp: 97.414 0.000
### hq_cont: 0.108 0.000
### unresolved_contigs: 30961.000 0.000
### unresolved_contigs_with_scgs: 2479.000 0.000
### epoch: 59.000 0.000
8.0 0.0 12.0 0.0

I also see this warning in the error log file:

/software/python/anaconda3/2021.11/envs/graphmb_v025/lib/python3.10/site-packages/graphmb/contigsdataset.py:37: RuntimeWarning: invalid value encountered in divide
  counts = counts / counts.sum()
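
For reference, NumPy emits this warning when a division evaluates 0/0, which produces NaN values. One way that could happen on this line is a count vector that sums to zero. Whether that is the cause here is an assumption, and `normalize_counts` below is a hypothetical helper, not GraphMB code. A minimal sketch of the behaviour and a guarded alternative:

```python
import warnings

import numpy as np


def normalize_counts(counts):
    """Convert raw counts to frequencies; return all zeros if the total is zero."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        # Avoid 0/0: there is nothing to normalise.
        return np.zeros_like(counts)
    return counts / total


# Reproduce the warning with an all-zero vector (unguarded division).
empty = np.zeros(4)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unguarded = empty / empty.sum()  # 0/0 gives NaN plus a RuntimeWarning
```

If NaN values like those in `unguarded` reach the feature matrix, they can affect everything downstream, so it may be worth checking `features.tsv` for NaN entries.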

AndreLamurias commented 11 months ago

Hi @palomo11, from what I can see in your log file, everything seems OK except that the results stop improving after epoch 20, which is quite early. Can you share the full log file or, even better, the input files you're using?

ZarulHanifah commented 3 months ago

I still get the same problem, @AndreLamurias. Here is how I run the script:

graphmb --assembly results/graphmb_tmp \
    --outdir results/graphmb_tmp2 \
    --assembly_type flye \
    --numcores 12 --writebins \
    --loglevel debug --reload

I tried with --writebins and without it, but the log file looks the same:

logging to results/graphmb_tmp2/20240709-125645graphmb_output.log
Running GraphMB 0.2.5
Namespace(assembly='results/graphmb_tmp', assembly_name='assembly.fasta', graph_file='assembly_graph.gfa', edge_threshold=None, depth='assembly_depth.txt', features='features.tsv', labels=None, embs=None, model_name='gcn', activation='relu', layers_vae=2, layers_gnn=3, hidden_gnn=128, hidden_vae=512, embsize_gnn=32, embsize_vae=64, batchsize=256, batchtype='auto', dropout_gnn=0.1, dropout_vae=0.2, lr_gnn=0.01, lr_vae=0.001, graph_alpha=1, kld_alpha=200, ae_alpha=1, scg_alpha=1, clusteringalgo='vamb', kclusters=None, aggtype='lstm', decoder_input='vae', vaepretrain=500, ae_only=False, negatives=10, quick=False, classify=False, fanout='10,25', epoch=500, print=10, evalepochs=20, evalskip=50, eval_split=0.0, kmer=4, rawfeatures=False, clusteringloss=False, targetmetric='hq', concatfeatures=False, no_loss_weights=True, no_sample_weights=True, early_stopping=0.1, nruns=1, mincontig=1000, minbin=200000, mincomp=1, randomize=False, labelgraph=False, binarize=False, noedges=False, read_embs=False, reload=True, markers='marker_gene_stats.tsv', post='writeembs_contig2bin', writebins=True, skip_preclustering=False, outname='graphmb', cuda=False, noise=False, savemodel=False, tsne=False, numcores=12, outdir='results/graphmb_tmp2', assembly_type='flye', contignodes=False, seed=1, quiet=False, read_cache=False, version=False, loglevel='debug')
read 224094 seqs
processing GFA file (edge nodes) results/graphmb_tmp/assembly_graph.gfa
skipped contigs 15472 < 1000
read 18500, edges
reading depths
reading labels
Saved cache to results/graphmb_tmp2

Not using SCG file: marker_gene_stats.tsv (not found)
==============================
DATASET STATS:
number of sequences: 224094
assembly length: 1.945 Gb
assembly N50: 0.013 Mb
assembly average length (Mb): 0.009 max: 1.603 min: 0.0
coverage samples: 1
Graph file found and read
graph edges: 18500
contig paths: 230095
No SCG markers
==============================
==============Running VAE model=====================
setting tf seed
edges with overlapping scgs (max=20): []
deleted 0 edges with same SCGs
**** Num of edges: 239454
******* Running model: CCVAE **********
***** using edge weights: True ******
***** cluster markers only: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: True
***** SCG neg pairs: (0,)
***** input features dimension: (224094, 104)
>>> Pre train stats: {'precision': 1.0, 'recall': 0.2650182512695565, 'f1': 0.4189951425658682, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 77380, 'unresolved_mags': 0}
*** Model input dim 104, GNN input dim 64
use_ae: True, run AE only: False output clustering dim 64
  0%|          | 0/500 [00:00<?, ?it/s]**** initial nodes batch size: 256 ****
**** epoch batch size doubles: [25, 75, 150, 300] ****
[ccvae ] Total=0.098 kld=0.002 vae=0.098 kmer=0.036 ab=0.060 Bestnoeval=0 BestEpoch=0 Curnoeval=0 GPU=0.0MB: 100%|██████████| 500/500 [2:45:51<00:00, 19.90s/it]  
Increasing nodes batch size from 256 to 512
Increasing nodes batch size from 512 to 1024
Increasing nodes batch size from 1024 to 2048
Increasing nodes batch size from 2048 to 4096
>>> Last epoch: 0 : {'precision': 1.0, 'recall': 0.1444884735869769, 'f1': 0.25249441461674327, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 22002, 'unresolved_mags': 0, 'epoch': 499} <<<
>>> best epoch: 0 : {'precision': 1.0, 'recall': 0.1444884735869769, 'f1': 0.25249441461674327, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 22002, 'unresolved_mags': 0, 'epoch': 499} <<<
===================================================
writing features to results/graphmb_tmp/features.tsv
RUN 0
setting torch seed
setting tf seed
edges with overlapping scgs (max=20): []
deleted 0 edges with same SCGs
**** Num of edges: 239454
logging to mlflow
******* Running model: gcn **********
***** using edge weights: True ******
***** using disconnected: True ******
***** concat features: True *****
***** cluster markers only: False *****
***** threshold adj matrix: False *****
***** self edges only: False *****
***** Using raw kmer+abund features: False
***** SCG neg pairs: (0,)
***** input features dimension: 64
***** Nodes used for clustering: 224094
>>> Pre train stats: {'precision': 1.0, 'recall': 0.1509366605085366, 'f1': 0.262284912258935, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 27254, 'unresolved_mags': 0}
*** Model input dim 64, GNN input dim 64
*** output clustering dim 32
[graphmb 3l] L=0.663 D=0.000  GPU=0.0MB: 100%|██████████| 500/500 [13:18<00:00,  1.60s/it]
>>> best epoch all contigs: 0 : {'precision': 1.0, 'recall': 0.046908886449436395, 'f1': 0.08961407636633333, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 26887, 'unresolved_mags': 0, 'epoch': 499} <<<
>>> best epoch: 0 : {'precision': 1.0, 'recall': 0.046908886449436395, 'f1': 0.08961407636633333, 'ari': 0, 'hq': 0, 'mq': 0, 'n_clusters': 26887, 'unresolved_mags': 0, 'epoch': 499} <<<
### writing best and last embs to results/graphmb_tmp2
> /fs03/ie79/Zarul/status_nanopore/templates/test_envs/graphmb/.snakemake/conda/e578df177c8bc54b62a101e42de7ed70_/lib/python3.11/site-packages/graphmb/main.py(570)main()
-> if args.writebins:
(Pdb) 
Uncaught exception
Traceback (most recent call last):
  File "/home/mzar0002/miniconda3/envs/graphmb_/bin/graphmb", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/templates/test_envs/graphmb/.snakemake/conda/e578df177c8bc54b62a101e42de7ed70_/lib/python3.11/site-packages/graphmb/main.py", line 570, in main
    if args.writebins:
       ^^^^
  File "/fs03/ie79/Zarul/status_nanopore/templates/test_envs/graphmb/.snakemake/conda/e578df177c8bc54b62a101e42de7ed70_/lib/python3.11/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/templates/test_envs/graphmb/.snakemake/conda/e578df177c8bc54b62a101e42de7ed70_/lib/python3.11/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
                      ^^^^^^^^^^^^^
bdb.BdbQuit

I notice I didn't provide marker_gene_stats.tsv. Could that be the cause?

AndreLamurias commented 3 months ago

Hi @ZarulHanifah, it should not be that. Can you confirm whether you have a "_best_embs.pickle" file in your output dir? Just to confirm where the error occurs.
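
A quick way to check, as a sketch: the snippet lists files in the output directory whose names end in `_best_embs.pickle`. The helper name `find_best_embs` is hypothetical, and the exact filename prefix is not assumed, so a wildcard is used.

```python
from pathlib import Path


def find_best_embs(outdir):
    """Return sorted paths in outdir whose names end with '_best_embs.pickle'."""
    directory = Path(outdir)
    if not directory.is_dir():
        # A missing output directory simply means no matches.
        return []
    return sorted(directory.glob("*_best_embs.pickle"))
```

For the command above this would be `find_best_embs("results/graphmb_tmp2")`. An empty list means no such file was written.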