Open Bioinformations opened 1 year ago
Hi,
You need to check different parameters.
Originally I used:
python mbcclr --resume \
--reads-path nanoflit.fasta \
--bin-size 32 \
--bin-count 32 \
--output test_output \
--embedding tsne \
--k-size \
--threads 8
Note: tsne works okay. UMAP sometimes behaves differently. Since this is developed for raw reads, I'd start with a smaller k-mer size like 3 or 4 to see how it performs.
Change -c to something like 10000, 20000 or 50000.
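If it helps to try these combinations systematically, here is a minimal sketch that only builds the command lines for each k-mer size and -c value, without running anything. It uses the short flags that appear elsewhere in this thread (-r, -o, -e, -c, -k, -t); the function name `build_sweep` and the `sweep_k..._c...` output directory names are hypothetical and only for illustration.

```python
from itertools import product
import shlex


def build_sweep(reads_path, k_sizes=(3, 4), sample_counts=(10000, 20000, 50000),
                threads=8, embedding="tsne"):
    """Return one mbcclr command line per (k, c) combination.

    Nothing is executed here; the strings are meant to be reviewed
    and then run by hand or from a shell script.
    """
    commands = []
    for k, c in product(k_sizes, sample_counts):
        # Hypothetical naming scheme: one output directory per combination,
        # so runs do not overwrite each other's checkpoints.
        out_dir = f"sweep_k{k}_c{c}"
        cmd = [
            "python", "mbcclr",
            "-r", reads_path,
            "-o", out_dir,
            "-e", embedding,
            "-c", str(c),
            "-k", str(k),
            "-t", str(threads),
        ]
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    for line in build_sweep("nanoflit.fasta"):
        print(line)
```

Each run gets its own output directory, so the number of bins from each combination can be compared afterwards.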
Let me know if this worked.
Also check out my other long-read binning tools, like LRBinner and OBLR.
Thanks for your response! I've tried several combinations of parameters, but I only get a few bins, sometimes 3 and sometimes 2. Each separated bin is also too big, almost 5 GB. I also tried LRBinner with the following parameters:
python lrbinner.py reads -r ../out_dir/0.2sample.fasta -bc 10 -bs 32 -o lrb_output --resume -mbs 5000 --ae-dims 4 --ae-epochs 200 -bit 0 -t 100
When the program finished, I again found only 3 bins, and those bins were big too. Do I need to try more values for the bin count and bin size parameters?
It seems like a complex dataset.
Try increasing -bit to 50, and also increase -bc and -bs, and see what happens.
How many bins do you expect to find? Did you try OBLR?
Yeah, I think it's really a complex dataset. I'd like to get more than 40 bins from it, because when I used Illumina sequencing data for the same sample, it could be binned into almost 100 bins. I'm trying larger parameter values now, and I will try OBLR too.
Hello. When I use this command:
python mbcclr --resume -r nanoflit.fasta -o test_output -e umap -c 613187 -k 5 -t 100
the final.txt contains only one bin, Bin-1. I am not sure whether some of the parameters are set correctly.
The software's output is as follows:
2023-02-22 09:16:59,945 - INFO - Command mbcclr --resume -r PGA-nanoflit.fasta -o test_output2 -e umap -c 613187 -k 5 -t 100
2023-02-22 09:16:59,945 - INFO - Resuming the program from previous checkpoints
2023-02-22 09:16:59,945 - INFO - Counting K-mers
INPUT FILE PGA-nanoflit.fasta
OUTPUT FILE test_output2/profiles/3mers
K_SIZE 5
THREADS 100
Profile Size 512
Total 5-mers 1024
Loaded Reads 6131871
2023-02-22 09:40:24,180 - INFO - Counting K-mers complete
2023-02-22 09:40:24,181 - INFO - Counting 15-mers
INPUT FILE PGA-nanoflit.fasta
OUTPUT FILE test_output2/profiles/15mers-counts
THREADS 100
Loaded Reads 6131871
WRITING TO FILE
COMPLETED : Output at - test_output2/profiles/15mers-counts
2023-02-22 09:48:27,783 - INFO - Counting 15-mers complete
2023-02-22 09:48:27,784 - INFO - Generating 15-mer profiles
K-Mer file test_output2/profiles/15mers-counts
LOADING KMERS TO RAM
FINISHED LOADING KMERS TO RAM
INPUT FILE PGA-nanoflit.fasta
OUTPUT FILE test_output2/profiles/15mers
THREADS 100
BIN WIDTH 10
BINS IN HIST 32
Loaded Reads 6131871
COMPLETED : Output at - test_output2/profiles/15mers
2023-02-22 09:54:35,125 - INFO - Generating 15-mer profiles complete
2023-02-22 09:54:35,126 - INFO - Sampling Reads
2023-02-22 10:07:02,935 - INFO - Sampling reads complete
2023-02-22 10:07:02,936 - INFO - Binning sampled reads
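As a side note on the log, the reported "Profile Size 512" against "Total 5-mers 1024" is what you would get if each k-mer is merged with its reverse complement. The log does not state this, so treat it as an assumption. Under that assumption, this small sketch shows how the composition profile length changes with -k (the helper names are made up for illustration):

```python
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_kmer_count(k):
    """Count distinct k-mers when a k-mer and its reverse complement are merged."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # The canonical form is the lexicographically smaller of the pair.
        seen.add(min(kmer, reverse_complement(kmer)))
    return len(seen)


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(f"k={k}: total={4 ** k}, canonical={canonical_kmer_count(k)}")
```

For k=5 this gives 1024 total and 512 canonical k-mers, matching the two numbers in the log. Smaller k values give much shorter profiles (32 for k=3, 136 for k=4).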