a-h-b / binny

GNU General Public License v3.0

No bin file after analysis #45

Open ywangbioinfo opened 1 year ago

ywangbioinfo commented 1 year ago

I have a pair of sequence files, each about 50 GB in size. After assembly with Megahit, alignment with bwa-mem2, and format conversion with samtools, I ran binning with Binny on the two resulting BAM files and the assembly file. Although the whole analysis finished, no bin file was produced, and no step reported an error. Could you take a look at what happened? Here is the Binny log file (binning_binny.log):

25/01/2023 07:51:18 AM - Starting Binny run for sample meta.
25/01/2023 07:52:55 AM - Looking for single contig bins.
25/01/2023 07:55:10 AM - Found 0 single contig bins.
25/01/2023 07:55:10 AM - Calculating N90
25/01/2023 07:55:16 AM - N90 is 625, with scMAGs would be 625.
25/01/2023 07:55:47 AM - Masking potentially disruptive sequences from k-mer counting.
25/01/2023 07:55:52 AM - Calculating k-mer frequencies of sizes: 2, 3, 4.
25/01/2023 08:09:36 AM - K-mer frequency matrix created in 828s.
25/01/2023 08:11:35 AM - Using coassembly mode.

25/01/2023 08:14:10 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 08:14:19 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 08:14:35 AM - Initializing embedding.
25/01/2023 08:19:48 AM - EE iteration 125 - KLD: 8.9383
25/01/2023 08:21:27 AM - EE iteration 250 - KLD: 8.9383, KLDRC: 0.0%
25/01/2023 08:23:06 AM - EE iteration 375 - KLD: 8.9315, KLDRC: 0.08%
25/01/2023 08:23:06 AM - Main iteration learning rate: 49663
25/01/2023 08:24:42 AM - Main iteration 125 - KLD: 5.7254, KLDRC: 35.9%
25/01/2023 08:26:27 AM - Main iteration 250 - KLD: 5.3895, KLDRC: 5.87%
25/01/2023 08:28:29 AM - Main iteration 375 - KLD: 5.24, KLDRC: 2.77%
25/01/2023 08:30:50 AM - Main iteration 500 - KLD: 5.1512, KLDRC: 1.69%
25/01/2023 08:33:25 AM - Main iteration 625 - KLD: 5.0907, KLDRC: 1.17%
25/01/2023 08:36:15 AM - Main iteration 750 - KLD: 5.0464, KLDRC: 0.87%
25/01/2023 08:36:15 AM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.0464
25/01/2023 08:36:16 AM - Reading annotation.
25/01/2023 08:36:57 AM - Reading embedding coordinates.
25/01/2023 08:36:57 AM - 2 samples for depth data found.
25/01/2023 08:37:07 AM - Clustering round 1.
25/01/2023 08:37:08 AM - Running initial clustering.
25/01/2023 08:39:34 AM - Final number of clusters: 7831.
25/01/2023 08:40:38 AM - Clustering round 2.
25/01/2023 08:40:38 AM - Running initial clustering.
25/01/2023 08:43:07 AM - Final number of clusters: 7403.
25/01/2023 08:43:56 AM - Clustering round 3.
25/01/2023 08:43:57 AM - Running initial clustering.
25/01/2023 08:46:30 AM - Final number of clusters: 5153.
25/01/2023 08:47:06 AM - Good bins this embedding iteration: 0.
25/01/2023 08:47:06 AM - Median of good bins per round < 1. Minimum completeness lowered to 82.5.
25/01/2023 08:47:11 AM - Total number of good bins: 0.

25/01/2023 08:47:18 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 08:47:27 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 08:47:41 AM - Initializing embedding.
25/01/2023 08:53:14 AM - EE iteration 125 - KLD: 8.8424
25/01/2023 08:54:58 AM - EE iteration 250 - KLD: 8.8424, KLDRC: -0.0%
25/01/2023 08:56:50 AM - EE iteration 375 - KLD: 8.8367, KLDRC: 0.06%
25/01/2023 08:56:50 AM - Main iteration learning rate: 49663
25/01/2023 08:58:40 AM - Main iteration 125 - KLD: 5.7521, KLDRC: 34.91%
25/01/2023 09:00:29 AM - Main iteration 250 - KLD: 5.4274, KLDRC: 5.65%
25/01/2023 09:02:34 AM - Main iteration 375 - KLD: 5.284, KLDRC: 2.64%
25/01/2023 09:04:54 AM - Main iteration 500 - KLD: 5.2003, KLDRC: 1.58%
25/01/2023 09:07:31 AM - Main iteration 625 - KLD: 5.1439, KLDRC: 1.08%
25/01/2023 09:10:22 AM - Main iteration 750 - KLD: 5.1024, KLDRC: 0.81%
25/01/2023 09:10:22 AM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.1024
25/01/2023 09:10:22 AM - Reading annotation.
25/01/2023 09:10:59 AM - Reading embedding coordinates.
25/01/2023 09:10:59 AM - 2 samples for depth data found.
25/01/2023 09:11:08 AM - Clustering round 1.
25/01/2023 09:11:10 AM - Running initial clustering.
25/01/2023 09:13:39 AM - Final number of clusters: 13166.
25/01/2023 09:15:47 AM - Clustering round 2.
25/01/2023 09:15:47 AM - Running initial clustering.
25/01/2023 09:18:17 AM - Final number of clusters: 8127.
25/01/2023 09:19:12 AM - Clustering round 3.
25/01/2023 09:19:12 AM - Running initial clustering.
25/01/2023 09:21:46 AM - Final number of clusters: 5007.
25/01/2023 09:22:22 AM - Good bins this embedding iteration: 0.
25/01/2023 09:22:22 AM - Median of good bins per round < 1. Minimum completeness lowered to 72.5.
25/01/2023 09:22:31 AM - Total number of good bins: 0.

25/01/2023 09:22:38 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 09:22:46 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 09:23:02 AM - Initializing embedding.
25/01/2023 09:28:52 AM - EE iteration 125 - KLD: 8.757
25/01/2023 09:30:54 AM - EE iteration 250 - KLD: 8.7571, KLDRC: -0.0%
25/01/2023 09:32:48 AM - EE iteration 375 - KLD: 8.7499, KLDRC: 0.08%
25/01/2023 09:32:48 AM - Main iteration learning rate: 49663
25/01/2023 09:34:39 AM - Main iteration 125 - KLD: 5.6486, KLDRC: 35.44%
25/01/2023 09:36:41 AM - Main iteration 250 - KLD: 5.3339, KLDRC: 5.57%
25/01/2023 09:39:03 AM - Main iteration 375 - KLD: 5.1949, KLDRC: 2.61%
25/01/2023 09:41:47 AM - Main iteration 500 - KLD: 5.1142, KLDRC: 1.55%
25/01/2023 09:44:44 AM - Main iteration 625 - KLD: 5.0598, KLDRC: 1.06%
25/01/2023 09:47:51 AM - Main iteration 750 - KLD: 5.021, KLDRC: 0.77%
25/01/2023 09:47:51 AM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.021
25/01/2023 09:47:52 AM - Reading annotation.
25/01/2023 09:48:32 AM - Reading embedding coordinates.
25/01/2023 09:48:32 AM - 2 samples for depth data found.
25/01/2023 09:48:42 AM - Clustering round 1.
25/01/2023 09:48:43 AM - Running initial clustering.
25/01/2023 09:51:10 AM - Final number of clusters: 13438.
25/01/2023 09:53:37 AM - Clustering round 2.
25/01/2023 09:53:37 AM - Running initial clustering.
25/01/2023 09:56:06 AM - Final number of clusters: 7898.
25/01/2023 09:57:17 AM - Clustering round 3.
25/01/2023 09:57:17 AM - Running initial clustering.
25/01/2023 09:59:53 AM - Final number of clusters: 4892.
25/01/2023 10:00:46 AM - Good bins this embedding iteration: 0.
25/01/2023 10:00:46 AM - Median of good bins per round < 1. Minimum completeness lowered to 62.5.
25/01/2023 10:00:55 AM - Total number of good bins: 0.

25/01/2023 10:01:01 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 10:01:10 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 10:01:25 AM - Initializing embedding.
25/01/2023 10:07:18 AM - EE iteration 125 - KLD: 8.6799
25/01/2023 10:09:10 AM - EE iteration 250 - KLD: 8.6799, KLDRC: 0.0%
25/01/2023 10:11:08 AM - EE iteration 375 - KLD: 8.6737, KLDRC: 0.07%
25/01/2023 10:11:08 AM - Main iteration learning rate: 49663
25/01/2023 10:13:02 AM - Main iteration 125 - KLD: 5.5787, KLDRC: 35.68%
25/01/2023 10:15:05 AM - Main iteration 250 - KLD: 5.2789, KLDRC: 5.37%
25/01/2023 10:17:27 AM - Main iteration 375 - KLD: 5.1473, KLDRC: 2.49%
25/01/2023 10:20:04 AM - Main iteration 500 - KLD: 5.0698, KLDRC: 1.5%
25/01/2023 10:22:57 AM - Main iteration 625 - KLD: 5.0186, KLDRC: 1.01%
25/01/2023 10:26:04 AM - Main iteration 750 - KLD: 4.9818, KLDRC: 0.73%
25/01/2023 10:26:04 AM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 4.9818
25/01/2023 10:26:04 AM - Reading annotation.
25/01/2023 10:26:46 AM - Reading embedding coordinates.
25/01/2023 10:26:46 AM - 2 samples for depth data found.
25/01/2023 10:26:55 AM - Clustering round 1.
25/01/2023 10:26:56 AM - Running initial clustering.
25/01/2023 10:29:23 AM - Final number of clusters: 5568.
25/01/2023 10:30:33 AM - Clustering round 2.
25/01/2023 10:30:33 AM - Running initial clustering.
25/01/2023 10:33:03 AM - Final number of clusters: 6102.
25/01/2023 10:34:03 AM - Clustering round 3.
25/01/2023 10:34:03 AM - Running initial clustering.
25/01/2023 10:36:39 AM - Final number of clusters: 4709.
25/01/2023 10:37:31 AM - Good bins this embedding iteration: 0.
25/01/2023 10:37:31 AM - Median of good bins per round < 1. Minimum completeness lowered to 52.5.
25/01/2023 10:37:36 AM - Total number of good bins: 0.

25/01/2023 10:37:41 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 10:37:50 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 10:38:05 AM - Initializing embedding.
25/01/2023 10:44:17 AM - EE iteration 125 - KLD: 8.6094
25/01/2023 10:46:17 AM - EE iteration 250 - KLD: 8.6095, KLDRC: -0.0%
25/01/2023 10:48:16 AM - EE iteration 375 - KLD: 8.6006, KLDRC: 0.1%
25/01/2023 10:48:16 AM - Main iteration learning rate: 49663
25/01/2023 10:50:13 AM - Main iteration 125 - KLD: 5.5586, KLDRC: 35.37%
25/01/2023 10:52:17 AM - Main iteration 250 - KLD: 5.2591, KLDRC: 5.39%
25/01/2023 10:54:27 AM - Main iteration 375 - KLD: 5.1293, KLDRC: 2.47%
25/01/2023 10:56:47 AM - Main iteration 500 - KLD: 5.0544, KLDRC: 1.46%
25/01/2023 10:59:24 AM - Main iteration 625 - KLD: 5.0088, KLDRC: 0.9%
25/01/2023 10:59:24 AM - Finished dimensionality-reduction in 1000 iterations. Final KLD: 5.0088
25/01/2023 10:59:24 AM - Reading annotation.
25/01/2023 11:00:01 AM - Reading embedding coordinates.
25/01/2023 11:00:01 AM - 2 samples for depth data found.
25/01/2023 11:00:10 AM - Clustering round 1.
25/01/2023 11:00:11 AM - Running initial clustering.
25/01/2023 11:02:39 AM - Final number of clusters: 12388.
25/01/2023 11:04:55 AM - Clustering round 2.
25/01/2023 11:04:56 AM - Running initial clustering.
25/01/2023 11:07:27 AM - Final number of clusters: 7438.
25/01/2023 11:08:35 AM - Clustering round 3.
25/01/2023 11:08:35 AM - Running initial clustering.
25/01/2023 11:11:11 AM - Final number of clusters: 4213.
25/01/2023 11:12:01 AM - Good bins this embedding iteration: 0.
25/01/2023 11:12:01 AM - Median of good bins per round < 1. Minimum completeness lowered to 42.5.
25/01/2023 11:12:11 AM - Total number of good bins: 0.

25/01/2023 11:12:17 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 11:12:25 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 11:12:40 AM - Initializing embedding.
25/01/2023 11:19:03 AM - EE iteration 125 - KLD: 8.5444
25/01/2023 11:20:48 AM - EE iteration 250 - KLD: 8.5444, KLDRC: 0.0%
25/01/2023 11:22:33 AM - EE iteration 375 - KLD: 8.5383, KLDRC: 0.07%
25/01/2023 11:22:33 AM - Main iteration learning rate: 49663
25/01/2023 11:24:14 AM - Main iteration 125 - KLD: 5.4883, KLDRC: 35.72%
25/01/2023 11:26:05 AM - Main iteration 250 - KLD: 5.2019, KLDRC: 5.22%
25/01/2023 11:28:12 AM - Main iteration 375 - KLD: 5.0769, KLDRC: 2.4%
25/01/2023 11:30:38 AM - Main iteration 500 - KLD: 5.0044, KLDRC: 1.43%
25/01/2023 11:33:23 AM - Main iteration 625 - KLD: 4.9561, KLDRC: 0.97%
25/01/2023 11:33:23 AM - Finished dimensionality-reduction in 1000 iterations. Final KLD: 4.9561
25/01/2023 11:33:23 AM - Reading annotation.
25/01/2023 11:34:06 AM - Reading embedding coordinates.
25/01/2023 11:34:06 AM - 2 samples for depth data found.
25/01/2023 11:34:15 AM - Clustering round 1.
25/01/2023 11:34:17 AM - Running initial clustering.
25/01/2023 11:36:45 AM - Final number of clusters: 12880.
25/01/2023 11:39:07 AM - Clustering round 2.
25/01/2023 11:39:07 AM - Running initial clustering.
25/01/2023 11:41:39 AM - Final number of clusters: 7183.
25/01/2023 11:42:45 AM - Clustering round 3.
25/01/2023 11:42:45 AM - Running initial clustering.
25/01/2023 11:45:21 AM - Final number of clusters: 4399.
25/01/2023 11:46:10 AM - Good bins this embedding iteration: 0.
25/01/2023 11:46:10 AM - Median of good bins per round < 1. Minimum completeness lowered to 32.5.
25/01/2023 11:46:15 AM - Total number of good bins: 0.

25/01/2023 11:46:21 AM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 11:46:29 AM - Running manifold learning and dimensionality-reduction.
25/01/2023 11:46:43 AM - Initializing embedding.
25/01/2023 11:53:17 AM - EE iteration 125 - KLD: 8.7813
25/01/2023 11:55:24 AM - EE iteration 250 - KLD: 8.7813, KLDRC: 0.0%
25/01/2023 11:57:41 AM - EE iteration 375 - KLD: 8.7775, KLDRC: 0.04%
25/01/2023 11:57:41 AM - Main iteration learning rate: 49663
25/01/2023 11:59:58 AM - Main iteration 125 - KLD: 5.6973, KLDRC: 35.09%
25/01/2023 12:02:12 PM - Main iteration 250 - KLD: 5.3711, KLDRC: 5.73%
25/01/2023 12:04:42 PM - Main iteration 375 - KLD: 5.226, KLDRC: 2.7%
25/01/2023 12:07:30 PM - Main iteration 500 - KLD: 5.1405, KLDRC: 1.63%
25/01/2023 12:10:36 PM - Main iteration 625 - KLD: 5.083, KLDRC: 1.12%
25/01/2023 12:14:11 PM - Main iteration 750 - KLD: 5.04, KLDRC: 0.85%
25/01/2023 12:14:11 PM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.04
25/01/2023 12:14:11 PM - Reading annotation.
25/01/2023 12:14:48 PM - Reading embedding coordinates.
25/01/2023 12:14:48 PM - 2 samples for depth data found.
25/01/2023 12:14:57 PM - Clustering round 1.
25/01/2023 12:14:58 PM - Running initial clustering.
25/01/2023 12:17:26 PM - Final number of clusters: 6543.
25/01/2023 12:18:47 PM - Clustering round 2.
25/01/2023 12:18:47 PM - Running initial clustering.
25/01/2023 12:21:18 PM - Final number of clusters: 6893.
25/01/2023 12:22:22 PM - Clustering round 3.
25/01/2023 12:22:22 PM - Running initial clustering.
25/01/2023 12:24:56 PM - Final number of clusters: 5129.
25/01/2023 12:25:50 PM - Good bins this embedding iteration: 0.
25/01/2023 12:25:50 PM - Median of good bins per round < 1. Minimum completeness lowered to 22.5.
25/01/2023 12:26:00 PM - Total number of good bins: 0.

25/01/2023 12:26:06 PM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 12:26:14 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 12:26:29 PM - Initializing embedding.
25/01/2023 12:31:51 PM - EE iteration 125 - KLD: 8.8715
25/01/2023 12:33:22 PM - EE iteration 250 - KLD: 8.8715, KLDRC: 0.0%
25/01/2023 12:34:53 PM - EE iteration 375 - KLD: 8.8651, KLDRC: 0.07%
25/01/2023 12:34:53 PM - Main iteration learning rate: 49663
25/01/2023 12:36:21 PM - Main iteration 125 - KLD: 5.7294, KLDRC: 35.37%
25/01/2023 12:37:58 PM - Main iteration 250 - KLD: 5.4051, KLDRC: 5.66%
25/01/2023 12:39:51 PM - Main iteration 375 - KLD: 5.2607, KLDRC: 2.67%
25/01/2023 12:42:04 PM - Main iteration 500 - KLD: 5.1757, KLDRC: 1.62%
25/01/2023 12:44:33 PM - Main iteration 625 - KLD: 5.1181, KLDRC: 1.11%
25/01/2023 12:47:16 PM - Main iteration 750 - KLD: 5.0762, KLDRC: 0.82%
25/01/2023 12:47:16 PM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.0762
25/01/2023 12:47:16 PM - Reading annotation.
25/01/2023 12:47:54 PM - Reading embedding coordinates.
25/01/2023 12:47:54 PM - 2 samples for depth data found.
25/01/2023 12:48:03 PM - Clustering round 1.
25/01/2023 12:48:04 PM - Running initial clustering.
25/01/2023 12:50:39 PM - Final number of clusters: 12967.
25/01/2023 12:53:02 PM - Clustering round 2.
25/01/2023 12:53:02 PM - Running initial clustering.
25/01/2023 12:55:33 PM - Final number of clusters: 8093.
25/01/2023 12:56:45 PM - Clustering round 3.
25/01/2023 12:56:45 PM - Running initial clustering.
25/01/2023 12:59:20 PM - Final number of clusters: 5078.
25/01/2023 01:00:17 PM - Good bins this embedding iteration: 0.
25/01/2023 01:00:17 PM - Median of good bins per round < 1. Minimum completeness lowered to 12.5.
25/01/2023 01:00:22 PM - Total number of good bins: 0.

25/01/2023 01:00:29 PM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 01:00:37 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 01:00:52 PM - Initializing embedding.
25/01/2023 01:06:18 PM - EE iteration 125 - KLD: 8.7847
25/01/2023 01:07:53 PM - EE iteration 250 - KLD: 8.7847, KLDRC: -0.0%
25/01/2023 01:09:28 PM - EE iteration 375 - KLD: 8.7774, KLDRC: 0.08%
25/01/2023 01:09:28 PM - Main iteration learning rate: 49663
25/01/2023 01:11:00 PM - Main iteration 125 - KLD: 5.6284, KLDRC: 35.88%
25/01/2023 01:12:45 PM - Main iteration 250 - KLD: 5.3146, KLDRC: 5.58%
25/01/2023 01:14:46 PM - Main iteration 375 - KLD: 5.1758, KLDRC: 2.61%
25/01/2023 01:17:17 PM - Main iteration 500 - KLD: 5.0945, KLDRC: 1.57%
25/01/2023 01:20:07 PM - Main iteration 625 - KLD: 5.04, KLDRC: 1.07%
25/01/2023 01:23:11 PM - Main iteration 750 - KLD: 4.9997, KLDRC: 0.8%
25/01/2023 01:23:11 PM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 4.9997
25/01/2023 01:23:11 PM - Reading annotation.
25/01/2023 01:23:54 PM - Reading embedding coordinates.
25/01/2023 01:23:54 PM - 2 samples for depth data found.
25/01/2023 01:24:03 PM - Clustering round 1.
25/01/2023 01:24:04 PM - Running initial clustering.
25/01/2023 01:26:33 PM - Final number of clusters: 13361.
25/01/2023 01:29:02 PM - Clustering round 2.
25/01/2023 01:29:02 PM - Running initial clustering.
25/01/2023 01:31:31 PM - Final number of clusters: 7791.
25/01/2023 01:32:43 PM - Clustering round 3.
25/01/2023 01:32:43 PM - Running initial clustering.
25/01/2023 01:35:14 PM - Final number of clusters: 4877.
25/01/2023 01:36:06 PM - Good bins this embedding iteration: 0.
25/01/2023 01:36:06 PM - Median of good bins per round < 1. Minimum completeness lowered to 2.5.
25/01/2023 01:36:16 PM - Total number of good bins: 0.

25/01/2023 01:36:22 PM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 01:36:30 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 01:36:45 PM - Initializing embedding.
25/01/2023 01:42:26 PM - EE iteration 125 - KLD: 8.7062
25/01/2023 01:44:02 PM - EE iteration 250 - KLD: 8.7062, KLDRC: -0.0%
25/01/2023 01:45:39 PM - EE iteration 375 - KLD: 8.701, KLDRC: 0.06%
25/01/2023 01:45:39 PM - Main iteration learning rate: 49663
25/01/2023 01:47:13 PM - Main iteration 125 - KLD: 5.6258, KLDRC: 35.34%
25/01/2023 01:48:56 PM - Main iteration 250 - KLD: 5.3129, KLDRC: 5.56%
25/01/2023 01:50:56 PM - Main iteration 375 - KLD: 5.177, KLDRC: 2.56%
25/01/2023 01:53:14 PM - Main iteration 500 - KLD: 5.0981, KLDRC: 1.53%
25/01/2023 01:55:50 PM - Main iteration 625 - KLD: 5.0455, KLDRC: 1.03%
25/01/2023 01:58:40 PM - Main iteration 750 - KLD: 5.0073, KLDRC: 0.76%
25/01/2023 01:58:40 PM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.0073
25/01/2023 01:58:41 PM - Reading annotation.
25/01/2023 01:59:23 PM - Reading embedding coordinates.
25/01/2023 01:59:23 PM - 2 samples for depth data found.
25/01/2023 01:59:32 PM - Clustering round 1.
25/01/2023 01:59:34 PM - Running initial clustering.
25/01/2023 02:02:01 PM - Final number of clusters: 5606.
25/01/2023 02:03:10 PM - Clustering round 2.
25/01/2023 02:03:10 PM - Running initial clustering.
25/01/2023 02:05:39 PM - Final number of clusters: 6162.
25/01/2023 02:06:39 PM - Clustering round 3.
25/01/2023 02:06:39 PM - Running initial clustering.
25/01/2023 02:09:13 PM - Final number of clusters: 4767.
25/01/2023 02:10:05 PM - Good bins this embedding iteration: 0.
25/01/2023 02:10:05 PM - Running with contigs >= 2000bp, minimum completeness 2.5.
25/01/2023 02:10:15 PM - Total number of good bins: 0.

25/01/2023 02:10:20 PM - Running with 177990 contigs. Filtered 1300056 contigs using a min contig size of 2000 to stay below 500000.0 contigs
25/01/2023 02:10:23 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 02:10:29 PM - Initializing embedding.
25/01/2023 02:12:19 PM - EE iteration 125 - KLD: 7.6416
25/01/2023 02:12:57 PM - EE iteration 250 - KLD: 7.3435, KLDRC: 3.9%
25/01/2023 02:13:41 PM - EE iteration 375 - KLD: 7.0224, KLDRC: 4.37%
25/01/2023 02:14:25 PM - EE iteration 500 - KLD: 6.9866, KLDRC: 0.51%
25/01/2023 02:14:25 PM - Main iteration learning rate: 17799
25/01/2023 02:15:06 PM - Main iteration 125 - KLD: 4.0918, KLDRC: 41.43%
25/01/2023 02:15:52 PM - Main iteration 250 - KLD: 3.7975, KLDRC: 7.19%
25/01/2023 02:16:51 PM - Main iteration 375 - KLD: 3.6757, KLDRC: 3.21%
25/01/2023 02:18:02 PM - Main iteration 500 - KLD: 3.6058, KLDRC: 1.9%
25/01/2023 02:19:27 PM - Main iteration 625 - KLD: 3.5602, KLDRC: 1.26%
25/01/2023 02:21:05 PM - Main iteration 750 - KLD: 3.5284, KLDRC: 0.89%
25/01/2023 02:21:05 PM - Finished dimensionality-reduction in 1250 iterations. Final KLD: 3.5284
25/01/2023 02:21:05 PM - Reading annotation.
25/01/2023 02:21:42 PM - Reading embedding coordinates.
25/01/2023 02:21:42 PM - 2 samples for depth data found.
25/01/2023 02:21:50 PM - Clustering round 1.
25/01/2023 02:21:50 PM - Running initial clustering.
25/01/2023 02:22:31 PM - Final number of clusters: 4155.
25/01/2023 02:23:28 PM - Clustering round 2.
25/01/2023 02:23:28 PM - Running initial clustering.
25/01/2023 02:24:08 PM - Final number of clusters: 1904.
25/01/2023 02:24:53 PM - Clustering round 3.
25/01/2023 02:24:53 PM - Running initial clustering.
25/01/2023 02:25:34 PM - Final number of clusters: 1079.
25/01/2023 02:26:13 PM - Good bins this embedding iteration: 0.
25/01/2023 02:26:13 PM - Running with contigs >= 1500bp, minimum completeness 2.5.
25/01/2023 02:26:15 PM - Total number of good bins: 0.

25/01/2023 02:26:21 PM - Running with 274788 contigs. Filtered 1203258 contigs using a min contig size of 1500 to stay below 500000.0 contigs
25/01/2023 02:26:25 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 02:26:33 PM - Initializing embedding.
25/01/2023 02:29:39 PM - EE iteration 125 - KLD: 7.9897
25/01/2023 02:30:39 PM - EE iteration 250 - KLD: 7.9878, KLDRC: 0.02%
25/01/2023 02:31:39 PM - EE iteration 375 - KLD: 7.6459, KLDRC: 4.28%
25/01/2023 02:32:39 PM - EE iteration 500 - KLD: 7.5709, KLDRC: 0.98%
25/01/2023 02:32:39 PM - Main iteration learning rate: 27478
25/01/2023 02:33:37 PM - Main iteration 125 - KLD: 4.7666, KLDRC: 37.04%
25/01/2023 02:34:45 PM - Main iteration 250 - KLD: 4.4635, KLDRC: 6.36%
25/01/2023 02:36:08 PM - Main iteration 375 - KLD: 4.3359, KLDRC: 2.86%
25/01/2023 02:37:44 PM - Main iteration 500 - KLD: 4.2633, KLDRC: 1.67%
25/01/2023 02:39:35 PM - Main iteration 625 - KLD: 4.2154, KLDRC: 1.12%
25/01/2023 02:41:37 PM - Main iteration 750 - KLD: 4.1811, KLDRC: 0.81%
25/01/2023 02:41:37 PM - Finished dimensionality-reduction in 1250 iterations. Final KLD: 4.1811
25/01/2023 02:41:37 PM - Reading annotation.
25/01/2023 02:42:20 PM - Reading embedding coordinates.
25/01/2023 02:42:20 PM - 2 samples for depth data found.
25/01/2023 02:42:29 PM - Clustering round 1.
25/01/2023 02:42:30 PM - Running initial clustering.
25/01/2023 02:43:52 PM - Final number of clusters: 7138.
25/01/2023 02:45:06 PM - Clustering round 2.
25/01/2023 02:45:06 PM - Running initial clustering.
25/01/2023 02:46:30 PM - Final number of clusters: 3784.
25/01/2023 02:47:16 PM - Clustering round 3.
25/01/2023 02:47:16 PM - Running initial clustering.
25/01/2023 02:48:42 PM - Final number of clusters: 2371.
25/01/2023 02:49:23 PM - Good bins this embedding iteration: 0.
25/01/2023 02:49:23 PM - Running with contigs >= 1000bp, minimum completeness 2.5.
25/01/2023 02:49:26 PM - Total number of good bins: 0.

25/01/2023 02:49:36 PM - Running with 496638 contigs. Filtered 981408 contigs using a min contig size of 1015 to stay below 500000.0 contigs
25/01/2023 02:49:44 PM - Running manifold learning and dimensionality-reduction.
25/01/2023 02:49:58 PM - Initializing embedding.
25/01/2023 02:56:10 PM - EE iteration 125 - KLD: 8.8047
25/01/2023 02:58:11 PM - EE iteration 250 - KLD: 8.8048, KLDRC: -0.0%
25/01/2023 03:00:13 PM - EE iteration 375 - KLD: 8.8013, KLDRC: 0.04%
25/01/2023 03:00:13 PM - Main iteration learning rate: 49663
25/01/2023 03:02:15 PM - Main iteration 125 - KLD: 5.6923, KLDRC: 35.32%
25/01/2023 03:04:23 PM - Main iteration 250 - KLD: 5.361, KLDRC: 5.82%
25/01/2023 03:06:44 PM - Main iteration 375 - KLD: 5.2118, KLDRC: 2.78%
25/01/2023 03:09:24 PM - Main iteration 500 - KLD: 5.1243, KLDRC: 1.68%
25/01/2023 03:12:20 PM - Main iteration 625 - KLD: 5.0655, KLDRC: 1.15%
25/01/2023 03:15:31 PM - Main iteration 750 - KLD: 5.0225, KLDRC: 0.85%
25/01/2023 03:15:31 PM - Finished dimensionality-reduction in 1125 iterations. Final KLD: 5.0225
25/01/2023 03:15:31 PM - Reading annotation.
25/01/2023 03:16:13 PM - Reading embedding coordinates.
25/01/2023 03:16:13 PM - 2 samples for depth data found.
25/01/2023 03:16:23 PM - Clustering round 1.
25/01/2023 03:16:25 PM - Running initial clustering.
25/01/2023 03:18:51 PM - Final number of clusters: 7001.
25/01/2023 03:20:09 PM - Clustering round 2.
25/01/2023 03:20:09 PM - Running initial clustering.
25/01/2023 03:22:40 PM - Final number of clusters: 7136.
25/01/2023 03:23:46 PM - Clustering round 3.
25/01/2023 03:23:46 PM - Running initial clustering.
25/01/2023 03:26:20 PM - Final number of clusters: 5180.
25/01/2023 03:27:20 PM - Good bins this embedding iteration: 0.
25/01/2023 03:27:20 PM - Reached min completeness and min contig size. Exiting embedding iteration
25/01/2023 03:34:32 PM - Writing contig data to file.
25/01/2023 03:49:21 PM - Run finished.
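Every embedding iteration in the log above ends with a "Good bins this embedding iteration" entry. While no good bins are found, the minimum completeness is lowered in steps of 10 (82.5, 72.5, ..., 2.5), after which the run switches to contigs >= 2000, 1500 and then 1000 bp. A minimal sketch for pulling that schedule out of a log file, assuming the message wording shown in this log (other binny versions may phrase it differently); `summarize_binny_log` is a hypothetical helper, not part of binny.

```python
import re

# Patterns copied from the messages in this binny log; they are an
# assumption about the log format and may not match other binny versions.
MIN_SIZE = re.compile(r"min contig size of (\d+)")
GOOD_BINS = re.compile(r"Good bins this embedding iteration: (\d+)\.")
LOWERED = re.compile(r"Minimum completeness lowered to ([\d.]+)\.")


def summarize_binny_log(text: str):
    """Return (iterations, thresholds) for the text of a binny log.

    iterations: list of (min_contig_size, good_bins) pairs, one per
                embedding iteration, in log order.
    thresholds: completeness values binny lowered to, in log order.
    """
    sizes = [int(s) for s in MIN_SIZE.findall(text)]
    good = [int(g) for g in GOOD_BINS.findall(text)]
    thresholds = [float(t) for t in LOWERED.findall(text)]
    # Each embedding iteration logs one "min contig size" entry and one
    # "Good bins" entry, so pairing them by position lines them up.
    return list(zip(sizes, good)), thresholds
```

Usage: read the whole file with `open("binning_binny.log").read()` and pass the string to `summarize_binny_log`. Because the regexes do not depend on line breaks, the sketch works whether or not each log entry sits on its own line. On the log above it should report 13 embedding iterations, all with 0 good bins.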

ohickl commented 1 year ago

Hi, it looks like binny did not find any bins satisfying the quality parameters. Could you also post your config file and maybe the first few lines of path/to/binny/output/intermediary/assembly.contig_depth.txt?

Best, Oskar

ywangbioinfo commented 1 year ago

Thank you for your quick response. I am using a desktop computer with 24 cores and 128 GB RAM. Here are the command I used to run binny, the first few lines of the depth file, and the config file.

command: ./binny -l -r -t 24 config/config.work.yaml

The first few lines of the depth file:

k141_1564541 flag=1 multi=2.0000 len=514 0.00000000 0.00000000
k141_1280081 flag=1 multi=2.0000 len=573 0.00000000 0.00000000
k141_2346806 flag=1 multi=2.0000 len=508 0.00000000 0.00000000
k141_2204576 flag=1 multi=2.0000 len=654 0.00000000 0.00000000
k141_1493426 flag=1 multi=3.0000 len=715 0.00000000 0.00000000
k141_213348 flag=1 multi=2.0000 len=515 0.00000000 0.00000000
k141_426696 flag=1 multi=2.0000 len=513 0.00000000 0.00000000
k141_1991231 flag=1 multi=2.0000 len=506 0.00000000 0.00000000
k141_2560151 flag=0 multi=5.0000 len=827 0.00000000 0.00000000
k141_711162 flag=1 multi=3.0000 len=512 0.00000000 0.00000000
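All ten rows shown have a depth of 0 in both samples. A quick way to check whether that holds for the whole table is to count the fraction of contigs whose depth is 0 in every sample; a value close to 1 would point to contig names not matching between the BAM files and the assembly, rather than to genuinely uncovered contigs. A minimal sketch, assuming the layout shown above (a contig id that may contain spaces, followed by one depth column per sample); `zero_depth_fraction` is a hypothetical helper, not part of binny.

```python
def zero_depth_fraction(lines, n_samples):
    """Return the fraction of contigs whose depth is 0 in every sample.

    Assumes each line is: contig id (which may itself contain spaces),
    followed by exactly `n_samples` whitespace-separated depth values.
    Lines whose trailing fields are not numeric (e.g. a header) are skipped.
    """
    total = 0
    all_zero = 0
    for line in lines:
        # Split from the right so spaces inside the contig id stay intact.
        fields = line.rstrip("\n").rsplit(None, n_samples)
        if len(fields) != n_samples + 1:
            continue
        try:
            depths = [float(x) for x in fields[1:]]
        except ValueError:
            continue
        total += 1
        if all(d == 0.0 for d in depths):
            all_zero += 1
    return all_zero / total if total else 0.0
```

Usage, assuming the depth file sits at `intermediary/assembly.contig_depth.txt` inside the configured `outputdir` (`meta_output` here): open that file and call `zero_depth_fraction(fh, n_samples=2)`, since the log reports 2 samples for depth data.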

The config file:

mem:
  # If your HPC resource has a high memory capacity node you can set this to
  # TRUE and specify the amount of memory per core (e.g. if a node has 260 gb of
  # RAM and 10 cores it would be 26).
  big_mem_avail: 0
  big_mem_per_core_gb: 26
  # Memory per core of your computing resource.
  normal_mem_per_core_gb: 4
# Path to a temporary directory to write to.
tmp_dir: tmp
raws:
  # Path to an assembly fasta.
  assembly: "meta/assembly.fasta"
  # Path to a bam file(s) to calculate depth from. Use wildcards for multiple samples, e.g.:
  # "path/to/my/mappings/*.bam" or "path/to/my/mappings/with/*/different/folder/*/structure/*.bam" or "path/to/my/mappings/my_mapping.bam"
  # Leave empty if you have an average depth per contig file to supply to binny.
  metagenomics_alignment: "meta/align*.bam"
  # Path to an average depth per contig tsv file. Leave empty if you supply a
  # bam file for binny to calculate average contig depth from. First column needs to be the contig ids, subsequent column(s) for depth(s).
  contig_depth: ""
# Sample name
sample: "meta"
# Path to desired output dir binny should create and store results in.
outputdir: "meta_output"
# Absolute path to put binny dbs in. If left empty they will be put into 'database' in the binny main dir.
db_path: ""
# If you want to use existing environments containing the environments for Snakemake, Prokka
# and/or Mantis you can either input the absolute paths to yaml files or env names here.
# Otherwise, leave empty and binny will take care of the installations.
# If you already have Snakemake in your path set: snakemake_env="in_path".
snakemake_env: ""
prokka_env: ""
mantis_env: ""
# Set path for conda envs to be installed to. By default, they will be put in conda in the binny dir.
conda_source: ""
# Input a list, e.g. '2,3,4'.
kmers: '2,3,4'
# Mask potentially disruptive contig regions (e.g. rRNA and CRISPR elements) from k-mer counting
mask_disruptive_sequences: 'True'
# Extract single contig MAGs of at least 90% purity and 92.5% completeness
extract_scmags: 'True'
# Will use coassembly mode, starting with contigs >= 500 bp instead of high threshold, decreasing, if set to 'on' or
# if 'auto' and multiple depth files are detected
# Choose between: 'auto', 'on', 'off'
coassembly_mode: 'auto'
# Binny prefilters assemblies based on N value to try and take as much information as possible into account,
# while minimizing the amount of noise. Be aware that, depending on the assembly quality, low values as the N
# might result in leaving out a large proportion of the assembly (if the max_cont_length cutoffs are set high as well).
NX_value: 90
# Minimum and maximum contig length. Caps value from NX filtering.
min_cont_length_cutoff: 2250 # 500
max_cont_length_cutoff: 2250 # 1000
# Minimum and maximum length of contigs containing CheckM markers. Caps value from NX filtering (initial value is NX_value / 2).
min_cont_length_cutoff_marker: 2250 # 250
max_cont_length_cutoff_marker: 2250 # 400
# Maximum number of contigs binny uses. If the number of available
# contigs after minimum size filtering exceeds this, binny will
# increase the minimum size threshold until the maximum is reached.
# Prevents use of excessive amounts of memory on large assemblies.
# Default should ensure adequate performance, adjust e.g. according
# to available memory.
max_n_contigs: 5.0e5 # 3.5e5
# Maximum marker set lineage depth to check bin quality with:
# 0: 'domain', 1: 'phylum', 2: 'class', 3: 'order', 4: 'family', 5: 'genus', 6: 'species'
max_marker_lineage_depth_lvl: 2
# Distance metric for openTSNE and HDBSCAN.
distance_metric: 'manhattan'
embedding:
  # Maximum number of binny iterations.
  max_iterations: 50
clustering:
  # Increasing the HDBSCAN cluster selection epsilon beyond 0.5
  # is not advised as it might massively increase run time, but it might
  # help recover fragmented genomes that would be missed with lower settings.
  hdbscan_epsilon_range: '0.250,0.000'
  # Adapted from the HDBSCAN manual: 'Measure of how conservative the
  # clustering should be. With larger values, more points will be declared
  # as noise, and clusters will be restricted to progressively more dense areas.'.
  hdbscan_min_samples_range: '1,5,10' # '1,2,4,6,8'
  # Use depth as additional dimension during the initial clustering.
  include_depth_initial: 'False'
  # Use depth as additional dimension during the main clusterings.
  include_depth_main: 'False'
bin_quality:
  # Minimum value binny will lower completeness to while running.
  min_completeness: 72.5
  # Completeness threshold binny will begin with.
  start_completeness: 92.5
  # Minimum purity for bins to be selected.
  purity: 95
write_contig_data: 'True'
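The NX_value and max_n_contigs comments describe a length-based prefilter, and the log shows it at work: N90 is 625, and the minimum contig size was raised to 1015 so that 496,638 of the 1,478,046 contigs (496,638 kept plus 981,408 filtered) stay below 500,000. A minimal sketch of those two ideas, written from the config comments rather than from binny's source; binny's actual procedure (for example, how it steps the threshold) may differ. `nx` and `min_length_for_cap` are hypothetical helpers.

```python
def nx(lengths, x=90):
    """Return the N<x> of an assembly: the length L such that contigs of
    length >= L together contain at least x% of all assembled bases."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= total * x:
            return length
    return 0


def min_length_for_cap(lengths, start, max_n_contigs):
    """Return the smallest integer minimum length >= `start` that leaves at
    most `max_n_contigs` contigs, following the behaviour described in the
    max_n_contigs comment (not binny's actual code)."""
    cap = int(max_n_contigs)  # the config stores it as a float, e.g. 5.0e5
    desc = sorted(lengths, reverse=True)
    if sum(1 for length in desc if length >= start) <= cap:
        return start
    # Every contig from position `cap` onward must be dropped, so the cutoff
    # has to exceed the length of the (cap + 1)-th longest contig.
    return max(start, desc[cap] + 1)
```

Sorting once and reading off the (cap + 1)-th longest contig gives the cutoff directly, instead of raising the threshold one step at a time and recounting.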

ywangbioinfo commented 1 year ago

Last month, I successfully ran a test analysis with Binny on an assembly file generated by MetaSPAdes. This time, the assembly file was generated by Megahit. I noticed that the format of the sequence header line differs between the MetaSPAdes and Megahit output, as shown below. The critical difference is probably whether the header line contains spaces.

The first few lines of the assembly file generated by Megahit:

>k127_107100 flag=0 multi=1.0000 len=208
CGGGCCTGGTGTTGCTGCGCGCGATACCGCCGCTTCTGGCGCACACCGATCCGAAACCACTGCTGCGGGATCTTGCGCGCAGGACGGCCGATGGCGACAGCCGACGCCATTTCGACGAGACGCTCGACGCCCAGCAGCGCATTCTCGCCGACATGGCTTGCCGTGCGGCGATCCGCGCCAACCGCCGCCTGAGTCTGCCGGAGATGGA
>k127_164220 flag=1 multi=2.0000 len=333
CCGCCGAGCCGCCACATCAGCTGACGCAACCACCGCAGGTCGGCCGTCCGGAAGGTGACCGTCGCCGAGCCGTCGTCGCGGCGCTGCACGCCGCTGACCGGGTAGTAGTCCACGACCCAGGCCGCGCCGGACCGCAGCTCCACGACCGCGGTCAGGTCATCCGGTCCGGGGGTGAAGAGCCCGGAGTCGAGGTCGCGAGGGCGCGCCTCGCTCGGCGGGGTGCCGTCCGCGTCGAGGATCTGCACCGACGAGACGCGATCCAGCCGGAAGAGCCGCACGTCGTGCGCCCGGTGGCACCAGCCCTCGAGATACCAGTTGCCGTCGAGGGAGACG

The first few lines of the assembly file generated by MetaSPAdes:

>NODE_1_length_1099842_cov_27.453087
TGAACGACCCGAGCGGACGGCCGAGCCAGATTCTGGGGAACGTGACCGCGCAGGGCGCGG
TGTACGTGATCAACCGCAACGGCGTGCTGTTCGGCGCGGGGTCGCAGGTCAATGTGCATT
CGCTGGTTGCGTCGTCGCTCGATCTGCTGAACACGAACAATCATCTGGTGCAGACACCGG
ATGACGTGGTCGCGAGCAACAAGCTGTTCCTGAGCATGCCGGCCGGCACGACGGGCGGGT
TGGCATATCCGGAATCGGGCAACTCGAACGTCGGCGGCGTGCAAGGTGTGACCGGGGTCC
CGAACGAAGTGCTGGGCCTTGGCAACCAGGCATCGGTCAGCGCGTCGAACCCGTACTTGA
TACCGGGCGACGTCACGATCGAACCGGGTGCGTCGATCGCGACGCACACCACCGGTACGG
TCAGCGACGGCGGTTTCGTGCTGGTCGCCGCGCCGAACGTAACGAACGGCGGCAGCATCA
CGGCGACGGCCGGGCAGGTCGTGCTGGCCGCCGGCGTCGGCGTCAGCCTGAGACCGAATC

My guess is that Binny only recognizes assembly files generated by MetaSPAdes, which is why I ran into trouble with the one generated by Megahit.

I will prepare a new assembly file using MetaSPAdes, then analyze it with Binny.

Since Megahit is a popular tool for metagenome assembly, I hope a future release of Binny can handle files generated by Megahit.
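One likely mechanism, offered as an assumption rather than a confirmed diagnosis: aligners such as bwa and bwa-mem2 name each reference sequence by the header text up to the first whitespace, so the Megahit contig ">k127_107100 flag=0 multi=1.0000 len=208" appears in the BAM file as k127_107100. If contig ids are matched against the full header line, none of the Megahit contigs would find their reads, which would fit the all-zero depth table posted earlier in this thread, while the MetaSPAdes headers, which contain no spaces, match unchanged. The sketch compares the two forms of a header; `full_id`, `first_token` and `ids_agree` are hypothetical helpers.

```python
def full_id(header: str) -> str:
    """Header text without the leading '>' and trailing newline."""
    return header.rstrip("\n").lstrip(">").strip()


def first_token(header: str) -> str:
    """Header text up to the first whitespace, the form bwa-style aligners
    store as the reference name in the BAM."""
    return full_id(header).split()[0]


def ids_agree(header: str) -> bool:
    """True if the full header and its first token are identical."""
    return full_id(header) == first_token(header)


for h in (">k127_107100 flag=0 multi=1.0000 len=208",
          ">NODE_1_length_1099842_cov_27.453087"):
    print(first_token(h), ids_agree(h))
# prints:
# k127_107100 False
# NODE_1_length_1099842_cov_27.453087 True
```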

ohickl commented 1 year ago

Sorry for the delay, I missed your comments. I think the problem is the spaces in the contig headers. Do you use the MetaWrap pipeline for your assemblies? Non-standard characters in headers often cause problems, so I would try replacing them with, e.g., underscores (_). You might need to change the BAM files / depth tables as well.
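Following the suggestion above, a minimal sketch that rewrites FASTA headers so that every run of characters outside A-Z, a-z, 0-9, '.', '_' and '-' becomes a single underscore. Which characters count as "non-standard" is an assumption here, and `sanitize_fasta` is a hypothetical helper. After renaming, the assembly has to be re-indexed and the reads re-mapped so that the reference names in the BAM files match the new headers.

```python
import re

# Characters treated as safe in contig ids (an assumption); every run of
# other characters, such as spaces and '=', is replaced by one '_'.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_fasta(lines):
    """Yield FASTA lines with unsafe header characters replaced by '_'.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            name = UNSAFE.sub("_", line[1:].strip())
            yield ">" + name + "\n"
        else:
            yield line


# Hypothetical file names; adjust to your own paths.
# with open("meta/assembly.fasta") as src, open("meta/assembly.clean.fasta", "w") as dst:
#     dst.writelines(sanitize_fasta(src))
```

With this rule, ">k127_107100 flag=0 multi=1.0000 len=208" becomes ">k127_107100_flag_0_multi_1.0000_len_208", while MetaSPAdes-style headers pass through unchanged. An alternative is to truncate each header at the first whitespace, which would keep the names identical to those bwa already wrote into the existing BAM files. Whether binny then picks up the depths correctly has not been confirmed in this thread.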

ywangbioinfo commented 1 year ago

Thank you for your response. I did not use the MetaWrap pipeline. I used fastp for quality control and Megahit or MetaSPAdes as the assembler, then ran binning with Binny. I will try replacing any non-standard characters with "_" and try again.