eric9n / Kun-peng

Kun-peng: an ultra-fast, low-memory footprint and accurate taxonomy classifier for all
MIT License

pre-built database? #24

Closed jianshu93 closed 4 weeks ago

jianshu93 commented 1 month ago

Hi Kun_peng team,

Is it possible to provide some pre-built databases, just as the original Kraken 2 does? E.g., one built from GTDB v214 for species-level purposes, and also one for all NCBI/RefSeq genomes. I can imagine the files being large, e.g. several hundred GB, but it would be much more convenient than downloading the genomes and building the database ourselves.

Thanks,

Jianshu

eric9n commented 1 month ago

Thank you for your inquiry, Jianshu.

Firstly, I'd like to mention that Kun_peng supports the use of existing Kraken 2 databases. You can easily convert them using the command:

kun_peng hashshard --db ${Kraken_database}
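
For example, a minimal sketch of the full convert-then-classify workflow. This is only an illustration: ${Kraken_database}, reads.fa.gz, ./chunk and ./kun_peng_output are placeholders, and the classify flags are simply the ones that appear elsewhere in this thread.

# Sketch only: convert an existing Kraken 2 database, then classify against it.
kun_peng hashshard --db ${Kraken_database}
kun_peng classify --db ${Kraken_database} \
    --chunk-dir ./chunk \
    --output-dir ./kun_peng_output \
    reads.fa.gz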

Additionally, Kun_peng supports incremental downloading of data from NCBI, which significantly simplifies the process of building a database.

Regarding pre-built databases, we understand their convenience. However, a high-quality database with accurate classification requires extensive testing. At present, we haven't provided pre-built databases as we're still in the process of ensuring their quality and reliability.

If it's convenient for you, would you like to try the conversion process or the incremental download feature? If you encounter any issues or have any questions, please don't hesitate to reach out to us.

We're here to help if you need any further assistance.

jianshu93 commented 1 month ago

Hi @eric9n,

Can you please also publish kun_peng to crates.io so that I can rely on it as a library?

Thanks,

Jianshu

eric9n commented 1 month ago

Thank you for your suggestion and interest in Kun_peng, Jianshu. We appreciate your enthusiasm for the project. However, I want to clarify that Kun_peng is currently in its early stages of development. At this point, we're focusing on refining its core functionality as a standalone classification tool.

While we do plan to consider publishing to crates.io in the future, we're not quite ready for that step yet. Our current priority is to ensure the stability and efficiency of Kun_peng's primary features.

Regarding your interest in using Kun_peng as a library, we're curious to understand more about your use case. Could you share what kind of integration or functionality you're envisioning? This information could be valuable for our future development plans.

As we continue to develop Kun_peng, we'll keep in mind the potential for broader use cases, including library integration. Your feedback and suggestions are very welcome and can help shape our development priorities.
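
As a stop-gap before any crates.io release, the code could in principle be pulled in as a git dependency. This is only a hedged sketch: it assumes that kr2r (the package name used in the README example quoted later in this thread) is the library crate a downstream project would depend on.

# Hypothetical: add the library crate straight from the GitHub repository.
# "kr2r" is assumed to be the relevant package inside the Kun-peng workspace.
cargo add kr2r --git https://github.com/eric9n/Kun-peng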

jianshu93 commented 1 month ago

hi @eric9n,

I had no problems running the following after downloading the database from https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz:

kun_peng hashshard --db ./k2_standard_20240605

but I get the following error when running the classify pipeline:

kun_peng classify --db ./k2_standard_20240605 --chunk-dir ./chunk --num-threads 64 --output-dir ./kun_peng_Min17_output_new ~/p-ktk3-0/min17_fyle/min17.noHost.sup1K_renamed.fa.gz

HashConfig { version: 0, partition: 20, hash_capacity: 1073741824, capacity: 20870239817, size: 14611910080, value_bits: 16, value_mask: 65535 }
splitr start...
splitr took: 88.143478674s
annotate start...
start load table... load table took: 2.745644048s
start load table... load table took: 2.615644723s
start load table... load table took: 2.476984493s
start load table... load table took: 2.486780517s
start load table... load table took: 2.519388157s
start load table... load table took: 2.318298111s
start load table... load table took: 2.792624219s
start load table... load table took: 2.810624781s
start load table... load table took: 2.595668138s
start load table... load table took: 2.530675081s
start load table... load table took: 2.933368903s
start load table... load table took: 2.56798537s
start load table... load table took: 2.600330857s
start load table... load table took: 2.338663833s
start load table... load table took: 2.44572106s
start load table... load table took: 2.299956651s
start load table... load table took: 2.514335358s
start load table... load table took: 2.532531474s
start load table... load table took: 2.60795855s
start load table... load table took: 1.280793601s
annotate took: 150.019535786s
resolve start...
Error: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Command exited with non-zero status 1
875.33user 134.08system 3:58.69elapsed 422%CPU (0avgtext+0avgdata 9144748maxresident)k
118738424inputs+171943816outputs (2682major+2666600minor)pagefaults 0swaps

What could be the problem?

jianshu93 commented 1 month ago

I will have a metagenomic analysis pipeline that includes all key steps of metagenomic analysis, from read classification to assembly, binning, genome classification, read mapping, and abundance calculation, all in Rust (I would be happy to invite the Kun_peng team as contributing authors).

Thanks,

Jianshu

eric9n commented 1 month ago

Does ./kun_peng_Min17_output_new exist?

jianshu93 commented 1 month ago

Do I have to create it? I was under the impression that it would be created automatically; I think it is better to create the output directory automatically if it does not exist. In any case, I get the following error:

HashConfig { version: 0, partition: 20, hash_capacity: 1073741824, capacity: 20870239817, size: 14611910080, value_bits: 16, value_mask: 65535 }
splitr start...
splitr took: 90.53971192s
annotate start...
start load table... load table took: 2.00787891s
start load table... load table took: 1.76185187s
start load table... load table took: 1.700453592s
start load table... load table took: 1.47119161s
start load table... load table took: 1.692230332s
start load table... load table took: 1.616076211s
start load table... load table took: 1.721559697s
start load table... load table took: 1.579699391s
start load table... load table took: 1.602690229s
start load table... load table took: 1.601963291s
start load table... load table took: 1.514434118s
start load table... load table took: 1.731951626s
start load table... load table took: 1.677111075s
start load table... load table took: 1.521514931s
start load table... load table took: 1.770650867s
start load table... load table took: 1.492142581s
start load table... load table took: 1.029181501s
start load table... load table took: 1.59177273s
start load table... load table took: 1.475298824s
start load table... load table took: 763.493905ms
annotate took: 134.822407205s
resolve start...
thread '' panicked at /storage/scratch1/4/jzhao399/Kun-peng/seqkmer/src/parallel.rs:212:44:
Failed to send outputs: "SendError(..)"
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
(the same panic message is repeated by each of the remaining worker threads)

Any idea?

jianshu93 commented 1 month ago

By the way, the input is PacBio CCS (long and accurate) reads, not short reads.

Jianshu

eric9n commented 1 month ago

Can you show the contents of the ./chunk directory?

jianshu93 commented 1 month ago

This is the chunk:

4.8G -rw-r--r--. 1 jzhao399 pace-ktk3 4.8G Aug 17 22:53 sample_file_1_0.bin
4.8G -rw-r--r--. 1 jzhao399 pace-ktk3 4.8G Aug 17 22:53 sample_file_1_1.bin
4.8G -rw-r--r--. 1 jzhao399 pace-ktk3 4.8G Aug 17 22:53 sample_file_1_2.bin
4.8G -rw-r--r--. 1 jzhao399 pace-ktk3 4.8G Aug 17 22:53 sample_file_1_3.bin
4.0K -rw-r--r--. 1 jzhao399 pace-ktk3 87 Aug 17 22:49 sample_file.map
47M -rw-r--r--. 1 jzhao399 pace-ktk3 47M Aug 17 22:50 sample_id_1.map

Jianshu

eric9n commented 1 month ago

What operating system are you using?

jianshu93 commented 1 month ago

Linux, RHEL 9.3

Jianshu

eric9n commented 1 month ago

I was unable to address this earlier as I was out; I'm now looking into this issue. Currently, it appears to be caused by the concurrent processing of .bin files in the chunk directory. Could you check your output directory, ./kun_peng_Min17_output_new, to see whether there are files like output.txt? If such files exist, do they contain any data?

Additionally, can the example in the README document run successfully on your server?

cargo run --release --example build_and_classify --package kr2r
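
For completeness, a sketch of running that example from a fresh checkout. The repository URL is inferred from this issue's repository; the cargo invocation is the one quoted above.

git clone https://github.com/eric9n/Kun-peng.git
cd Kun-peng
cargo run --release --example build_and_classify --package kr2r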

eric9n commented 1 month ago

I've reproduced your issue on my personal computer. If I introduce an exception in the code that writes to the output_txt file, I see the following errors:

Failed to send outputs: "SendError(..)"
212:44:
Failed to send outputs: "SendError(..)"

Could you please check the files in the output-dir directory to see if the data is being written correctly?
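
A quick, hedged way to check this from the shell. The directory name follows the run earlier in this thread, and the output*.txt glob is an assumption about how the result files are named.

# List the output directory and report how many bytes each result file holds.
ls -lh ./kun_peng_Min17_output_new/
wc -c ./kun_peng_Min17_output_new/output*.txt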

eric9n commented 1 month ago

I've updated the code to version 0.6.10. The changes include:

Added error messages when handling output writes to result files. Implemented automatic creation of the output-dir.
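
Since the tool is being built from source in this thread, updating to the new version is roughly the following. This is a sketch: it assumes a local clone of the repository, and the --version flag is assumed to be the usual clap-generated one rather than a documented feature.

cd Kun-peng          # local checkout of https://github.com/eric9n/Kun-peng
git pull
cargo build --release
./target/release/kun_peng --version   # should now report 0.6.10 (assumed flag)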

eric9n commented 1 month ago

I will have a metagenomic analysis pipeline that includes all key steps of metagenomic analysis, from read classification to assembly, binning, genome classification, read mapping, and abundance calculation, all in Rust (I would be happy to invite the Kun_peng team as contributing authors).

Great idea! I'm very interested in this metagenomic analysis pipeline. Could I participate in the development as an individual contributor? By the way, Kun_peng is also a collaborative project developed with friends, driven by our personal interests. Looking forward to potentially joining forces!

jianshu93 commented 1 month ago

Hi @eric9n,

Yes, there is an output.txt, but there is nothing in it.

Jianshu

jianshu93 commented 1 month ago

Yes, I would be happy for you to contribute as an independent author. I have developed GSearch (https://doi.org/10.1093/nar/gkae609), completely in Rust, a large-scale genome search/classification and comparison system for microbial genomes, and it will serve as the genome-analysis component; for read-based classification I think kun_peng will be a great option.

I will come up with details after I have all the Rust packages well prepared.

Thanks,

Jianshu

jianshu93 commented 1 month ago

I have no problems running the cargo run command (without your recent commits). I attached the output here.

Jianshu

output_kunpeng_cargo_run.txt

eric9n commented 1 month ago

Could you try the latest code? I've adjusted the error messages a bit.

Could you also provide more information? For example, details about the runtime environment configuration, the file format of the samples, and whether it's in standard FASTA format?

What is the content of the sample_id_1.map file in the chunk directory?

eric9n commented 1 month ago

I have no problems running the cargo run command (without your recent commits). I attached the output here.

Jianshu

output_kunpeng_cargo_run.txt

This indicates that the program is running normally.

jianshu93 commented 1 month ago

Yes, it is a standard FASTA file for sure; I have used it for many other testing purposes. I guess it has something to do with the length of the FASTA records, since some records can be 10,000 bp. I also tested a short-read FASTA version and got exactly the same error. I will let you know soon what I get with the updated version.

Jianshu

eric9n commented 1 month ago

./target/release/kun_peng classify --db /Volumes/Jlab/std_db --chunk-dir /Volumes/Jlab/chunk --output-dir /Volumes/Jlab/chunk --batch-size 10 out_dir/library/archaea/refseq/GCF_000006805.1_ASM680v1_genomic.fna.gz
HashConfig { version: 0, partition: 20, hash_capacity: 1073741824, capacity: 20870239817, size: 14611910080, value_bits: 16, value_mask: 65535 }
splitr start...
splitr took: 79.357375ms
annotate start...
start load table... load table took: 1.374151709s
start load table... load table took: 1.359685625s
start load table... load table took: 1.363973958s
start load table... load table took: 1.360039375s
start load table... load table took: 1.35983125s
start load table... load table took: 1.3599785s
start load table... load table took: 1.360632s
start load table... load table took: 1.358771667s
start load table... load table took: 1.359846792s
start load table... load table took: 1.359714458s
start load table... load table took: 1.359828375s
start load table... load table took: 1.360269125s
start load table... load table took: 1.360162375s
start load table... load table took: 1.359785875s
start load table... load table took: 1.360439042s
start load table... load table took: 1.363612417s
start load table... load table took: 1.359943625s
start load table... load table took: 1.359522916s
start load table... load table took: 1.359489791s
start load table... load table took: 626.168042ms
annotate took: 27.09910375s
resolve start...
resolve took: 92.082792ms
Classify took: 27.336790917s

I used the standard library you provided and executed it on my personal computer, using FASTA files.

output_6.txt

jianshu93 commented 1 month ago

can't find 1267867 in sample_id map file
can't find 1307563 in sample_id map file
can't find 1382003 in sample_id map file

Many lines of such errors. Could it have something to do with the database (I use the sharded one)?

Thanks,

Jianshu

eric9n commented 1 month ago

What is the content of the sample_id_1.map file in the chunk directory?

eric9n commented 1 month ago

Since the program cleans up the chunk at the end, we can modify it to execute in steps:

kun_peng splitr --db ${Database} --chunk-dir chunk ${sample_file}
kun_peng annotate --db ${Database} --chunk-dir chunk

At this point, check the sample_id file in the chunk directory.
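
Putting the whole step-wise run together, including the resolve step that comes up later in this thread, a sketch looks like this. Paths are placeholders, and the flags are the ones used elsewhere in the thread.

kun_peng splitr --db ${Database} --chunk-dir chunk ${sample_file}
kun_peng annotate --db ${Database} --chunk-dir chunk
head chunk/sample_id_1.map                     # inspect the intermediate id map
kun_peng resolve --db ${Database} --chunk-dir chunk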

eric9n commented 1 month ago

I suspect that this issue is caused by special characters in the identifiers of the FASTA file. I have modified this part of the logic in the code to accommodate special characters.
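
One hedged way to inspect the headers for unusual characters from the shell; the file name is taken from the classify command earlier in this thread, and the allow-list in the second command is only an illustrative choice.

# Show the first few FASTA headers.
zcat min17.noHost.sup1K_renamed.fa.gz | grep '^>' | head -n 5
# Count headers containing characters outside a conservative allow-list.
zcat min17.noHost.sup1K_renamed.fa.gz | grep '^>' | grep -c '[^A-Za-z0-9_./ >-]'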

jianshu93 commented 1 month ago

I would suggest using needletail for FASTA/FASTQ file processing, which is a standard and highly optimized library for FASTA/FASTQ files that takes all possible problems into account (different characters, etc.). My FASTA headers are ID + space + annotation. But I will test again.

Thanks,

Jianshu

eric9n commented 1 month ago

I am aware of the needletail and seq_io libraries and have considered them. However, based on preliminary tests on my personal computer, I don't believe they would be faster than my own seqkmer for this specific processing task, as they need to handle more compatibility issues.

jianshu93 commented 1 month ago

I still have the bug when running on the large file. I used this small file (the first 500 lines of the large file) and it succeeded (attached):

kun_peng classify --db ~/scratch/k2_standard_20240605 --chunk-dir ./chunk --num-threads 64 --output-dir ./kun_peng_Min17_output_new ./test.fasta

but the output is empty (no information in output_1.txt or the kreport2 file).

test.fasta.zip
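
For reference, a subset like test.fasta can be produced along these lines. This is a sketch; the large input file name is taken from the earlier classify command.

# Take the first 500 lines of the gzipped input as a small test case.
zcat min17.noHost.sup1K_renamed.fa.gz | head -n 500 > test.fasta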

eric9n commented 1 month ago

./target/release/kun_peng classify --db /Volumes/Jlab/std_db --chunk-dir /Volumes/Jlab/chunk --output-dir /Volumes/Jlab/chunk --batch-size 10 ~/Downloads/test.fasta
HashConfig { version: 0, partition: 20, hash_capacity: 1073741824, capacity: 20870239817, size: 14611910080, value_bits: 16, value_mask: 65535 }
splitr start...
splitr took: 4.203209ms
annotate start...
start load table... load table took: 1.385644875s
start load table... load table took: 1.362142459s
start load table... load table took: 1.432869916s
start load table... load table took: 1.361609709s
start load table... load table took: 1.358973s
start load table... load table took: 1.35971425s
start load table... load table took: 1.360143917s
start load table... load table took: 1.35969175s
start load table... load table took: 1.361218209s
start load table... load table took: 1.360476709s
start load table... load table took: 1.362514292s
start load table... load table took: 1.359886083s
start load table... load table took: 1.360171333s
start load table... load table took: 1.359247125s
start load table... load table took: 1.360804s
start load table... load table took: 1.360280125s
start load table... load table took: 1.360002583s
start load table... load table took: 1.360471542s
start load table... load table took: 1.360040791s
start load table... load table took: 624.670917ms
annotate took: 27.1632235s
resolve start...
resolve took: 13.062375ms
Classify took: 27.239814583s

output_10.txt

What is the content of the sample_id_1.map file in the chunk directory?

jianshu93 commented 1 month ago

[jzhao399@atl1-1-02-002-23-1 release]$ cat chunk/sample_file.map
1 ./test.fasta

I do not have sample_id_1.map, only this one file? So strange. It must be a database problem. Again, I ran hashshard on the pre-built Kraken 2 (2024) database mentioned above.

Jianshu

eric9n commented 1 month ago

If you don't specify the output-dir parameter, what happens?

./target/release/kun_peng classify --db /Volumes/Jlab/std_db --chunk-dir /Volumes/Jlab/chunk --batch-size 10  ~/Downloads/test.fasta
HashConfig { version: 0, partition: 20, hash_capacity: 1073741824, capacity: 20870239817, size: 14611910080, value_bits: 16, value_mask: 65535 }
splitr start...
splitr took: 3.844833ms
annotate start...
start load table...
load table took: 1.374144125s
start load table...
load table took: 1.360161375s
start load table...
load table took: 1.3793125s
start load table...
load table took: 1.360883833s
start load table...
load table took: 1.360559208s
start load table...
load table took: 1.360619083s
start load table...
load table took: 1.363896542s
start load table...
load table took: 1.360500208s
start load table...
load table took: 1.361105542s
start load table...
load table took: 1.360789333s
start load table...
load table took: 1.360286541s
start load table...
load table took: 1.3605535s
start load table...
load table took: 1.360686333s
start load table...
load table took: 1.359728417s
start load table...
load table took: 1.359886042s
start load table...
load table took: 1.360117541s
start load table...
load table took: 1.360238542s
start load table...
load table took: 1.360368666s
start load table...
load table took: 1.361858708s
start load table...
load table took: 622.997875ms
annotate took: 27.0131855s
resolve start...
C   S138_R1-1193    1548548 150 0:11 2:3 1548548:3 2:1 0:1 2:7 0:6 2785627:1 0:4
C   S138_R1-1280    3023518 150 1224:2 2:1 1224:1 3023518:1 2:1 28211:1 3023518:13 0:2 3023518:8
C   S138_R1-1159    83889   149 0:2 83889:1 0:1 83889:1 0:31
C   S138_R1-1109    358681  151 0:25 358681:1
C   S138_R1-1120    122358  146 0:1 122358:1 0:32
C   S138_R1-1217    2918528 151 0:11 2901189:1 0:10 2918528:2 2901189:3 0:3 1224:1 0:4
C   S138_R1-1147    517719  151 517719:2 0:1 517719:1 0:7 2768066:1 0:12 204455:1 28211:1 0:1 204455:1 1224:2 0:1 1224:3 204455:1 0:3
C   S138_R1-1106    469 146 469:31
C   S138_R1-1215    78543   150 0:14 78543:1 0:21
C   S138_R1-1192    173860  150 0:6 335541:1 0:1 1760:1 173860:1 0:1 2:1 1760:1 2:1 131567:3 2:1 0:2 1917485:1 0:18
C   S138_R1-1323    2614442 138 0:9 2614442:1 0:17
C   S138_R1-1068    2   151 0:3 2903816:1 0:26 754502:1 0:8
C   S138_R1-1156    196795  150 60136:1 196795:3 60136:2 196795:1 60136:1 196795:10 60136:1 196795:1 60136:1 196795:1 60136:1 196795:6
C   S138_R1-1220    765912  151 0:8 765912:1 0:12
C   S138_R1-1087    1313172 150 0:1 1313172:1 0:26
C   S138_R1-1171    32033   151 0:11 2763107:1 0:7 1224:2 32033:1 2:2 135614:1 2:1
C   S138_R1-1122    2953890 126 0:3 2953890:1 0:16
C   S138_R1-1112    469 151 469:32
C   S138_R1-1175    1760    150 0:13 1760:1 0:4 455432:1 2996826:1 0:2
C   S138_R1-1040    135616  150 135616:1 0:1 135616:4 0:2 2:2 0:1 544448:1 1224:1 0:9 2:1 0:3 2:2
C   S138_R1-1303    108980  149 469:4 2:1 469:1 2:1 108980:2 469:4 108980:2
C   S138_R1-1052    2884263 151 2:20 2995304:3 2:1 3031711:1 2:4 0:1 2:1 201174:1 2:1 0:1 676965:1 0:3 2618799:1 2884263:4
C   S138_R1-1063    1236    151 0:10 1236:1 1224:1 0:24
C   S138_R1-1317    1224    149 2:1 1224:2 0:1
C   S138_R1-1273    2487072 149 0:8 2487072:1 0:3
C   S138_R1-1154    1179672 151 0:1 1179672:1 0:26
C   S138_R1-1031    2268026 150 0:17 2268026:1 0:3 2268026:1 0:21
C   S138_R1-1274    314275  150 314275:22 226:1 314275:3
resolve took: 3.632875ms
Classify took: 27.07933425s
eric9n commented 1 month ago

[jzhao399@atl1-1-02-002-23-1 release]$ cat chunk/sample_file.map
1 ./test.fasta

I do not have sample_id_1.map, only this one file? So strange. It must be a database problem. Again, I ran hashshard on the pre-built Kraken 2 (2024) database mentioned above.

Jianshu

Since the program cleans up the chunk at the end, we can modify it to execute in steps:

kun_peng splitr --db ${Database} --chunk-dir chunk ${sample_file}
kun_peng annotate --db ${Database} --chunk-dir chunk

At this point, check the sample_id file in the chunk directory.

eric9n commented 1 month ago

This issue is unrelated to the database. I'm using the database you provided, and I've already seen the classification results. The only remaining step is to convert the results into a human-readable text format. The problem might be occurring during this conversion process. However, when I run the sample and database you provided, the results are as shown above.

jianshu93 commented 1 month ago

It seems I have no problems running those 2 steps:

[jzhao399@atl1-1-02-002-23-1 release]$ cat chunk_new/sample_id_1.map 1 S138_R1-1021/1 150 0 2 S138_R1-1023/1 151 17 3 S138_R1-1024/1 150 28 4 S138_R1-1027/1 151 12 5 S138_R1-1028/1 151 34 6 S138_R1-1029/1 151 39 7 S138_R1-1030/1 151 5 8 S138_R1-1031/1 150 43 9 S138_R1-1032/1 130 7 10 S138_R1-1033/1 151 42 11 S138_R1-1034/1 151 37 12 S138_R1-1035/1 151 27 13 S138_R1-1036/1 150 37 14 S138_R1-1037/1 151 40 15 S138_R1-1038/1 149 7 16 S138_R1-1039/1 150 41 17 S138_R1-1040/1 150 28 18 S138_R1-1041/1 151 21 19 S138_R1-1042/1 115 0 20 S138_R1-1043/1 151 35 21 S138_R1-1044/1 151 44 22 S138_R1-1045/1 122 30 23 S138_R1-1047/1 151 42 24 S138_R1-1048/1 149 37 25 S138_R1-1049/1 88 4 26 S138_R1-1051/1 151 6 27 S138_R1-1052/1 151 43 28 S138_R1-1054/1 149 38 29 S138_R1-1055/1 87 4 30 S138_R1-1056/1 74 14 31 S138_R1-1057/1 150 37 32 S138_R1-1058/1 151 30 33 S138_R1-1059/1 80 0 34 S138_R1-1061/1 150 18 35 S138_R1-1062/1 149 28 36 S138_R1-1063/1 151 36 37 S138_R1-1064/1 151 24 38 S138_R1-1065/1 151 18 39 S138_R1-1066/1 151 40 40 S138_R1-1068/1 151 39 41 S138_R1-1070/1 88 7 42 S138_R1-1071/1 149 24 43 S138_R1-1073/1 151 23 44 S138_R1-1074/1 88 19 45 S138_R1-1075/1 151 16 46 S138_R1-1076/1 150 39 47 S138_R1-1077/1 150 3 48 S138_R1-1078/1 83 18 49 S138_R1-1079/1 150 0 50 S138_R1-1080/1 150 39 51 S138_R1-1081/1 151 27 52 S138_R1-1083/1 67 1 53 S138_R1-1086/1 150 22 54 S138_R1-1087/1 150 28 55 S138_R1-1088/1 147 13 56 S138_R1-1089/1 150 42 57 S138_R1-1090/1 82 8 58 S138_R1-1091/1 149 1 59 S138_R1-1092/1 151 27 60 S138_R1-1093/1 79 16 61 S138_R1-1094/1 72 0 62 S138_R1-1095/1 151 27 63 S138_R1-1096/1 151 25 64 S138_R1-1097/1 149 14 65 S138_R1-1098/1 151 19 66 S138_R1-1099/1 150 24 67 S138_R1-1100/1 151 40 68 S138_R1-1101/1 99 10 69 S138_R1-1102/1 151 25 70 S138_R1-1103/1 91 7 71 S138_R1-1106/1 146 31 72 S138_R1-1108/1 147 42 73 S138_R1-1109/1 151 26 74 S138_R1-1110/1 98 10 75 S138_R1-1112/1 151 32 76 S138_R1-1113/1 151 20 77 S138_R1-1114/1 151 41 78 S138_R1-1115/1 151 42 79 S138_R1-1116/1 149 20 80 S138_R1-1117/1 150 19 81 S138_R1-1118/1 151 9 82 S138_R1-1119/1 151 33 83 S138_R1-1120/1 146 34 84 S138_R1-1121/1 151 9 85 S138_R1-1122/1 126 20 86 S138_R1-1124/1 89 8 87 S138_R1-1126/1 150 34 88 S138_R1-1127/1 151 30 89 S138_R1-1128/1 151 7 90 S138_R1-1129/1 151 38 241 S138_R1-1326/1 151 9 242 S138_R1-1327/1 149 7 243 S138_R1-1328/1 62 2 244 S138_R1-1331/1 151 27 245 S138_R1-1332/1 151 9 246 S138_R1-1333/1 62 1 247 S138_R1-1334/1 151 4 248 S138_R1-1335/1 149 9 249 S138_R1-1336/1 150 25 250 S138_R1-1337/1 151 11 91 S138_R1-1131/1 151 31 92 S138_R1-1134/1 50 0 93 S138_R1-1135/1 123 3 94 S138_R1-1136/1 148 10 95 S138_R1-1137/1 151 18 96 S138_R1-1138/1 149 31 97 S138_R1-1139/1 149 38 98 S138_R1-1141/1 121 0 99 S138_R1-1142/1 151 22 100 S138_R1-1144/1 150 40 101 S138_R1-1145/1 151 16 102 S138_R1-1146/1 150 27 103 S138_R1-1147/1 151 38 104 S138_R1-1148/1 150 39 105 S138_R1-1150/1 151 19 106 S138_R1-1151/1 151 19 107 S138_R1-1153/1 151 37 108 S138_R1-1154/1 151 28 109 S138_R1-1155/1 151 5 110 S138_R1-1156/1 150 29 111 S138_R1-1157/1 81 15 112 S138_R1-1158/1 150 28 113 S138_R1-1159/1 149 36 114 S138_R1-1160/1 151 30 115 S138_R1-1161/1 151 5 116 S138_R1-1162/1 140 26 117 S138_R1-1163/1 150 23 118 S138_R1-1164/1 149 21 119 S138_R1-1166/1 59 0 120 S138_R1-1167/1 151 36 181 S138_R1-1239/1 142 25 182 S138_R1-1240/1 151 0 183 S138_R1-1241/1 150 18 184 S138_R1-1242/1 91 0 185 S138_R1-1244/1 150 12 186 S138_R1-1245/1 149 41 187 S138_R1-1246/1 151 25 188 S138_R1-1247/1 65 9 189 S138_R1-1249/1 151 16 190 S138_R1-1250/1 129 20 191 
S138_R1-1251/1 151 0 192 S138_R1-1253/1 151 28 193 S138_R1-1254/1 151 25 194 S138_R1-1255/1 81 0 195 S138_R1-1256/1 151 0 196 S138_R1-1257/1 151 39 197 S138_R1-1259/1 150 32 198 S138_R1-1261/1 65 0 199 S138_R1-1262/1 141 6 200 S138_R1-1263/1 151 20 201 S138_R1-1265/1 54 3 202 S138_R1-1266/1 151 2 203 S138_R1-1268/1 88 6 204 S138_R1-1269/1 89 0 205 S138_R1-1270/1 151 5 206 S138_R1-1271/1 151 39 207 S138_R1-1272/1 151 17 208 S138_R1-1273/1 149 12 209 S138_R1-1274/1 150 26 210 S138_R1-1275/1 150 16 151 S138_R1-1202/1 150 19 152 S138_R1-1203/1 151 19 153 S138_R1-1205/1 150 41 154 S138_R1-1208/1 151 32 155 S138_R1-1209/1 151 0 156 S138_R1-1210/1 149 10 157 S138_R1-1211/1 90 3 158 S138_R1-1212/1 150 29 159 S138_R1-1213/1 149 19 160 S138_R1-1214/1 140 39 161 S138_R1-1215/1 150 36 162 S138_R1-1216/1 149 10 163 S138_R1-1217/1 151 35 164 S138_R1-1220/1 151 21 165 S138_R1-1221/1 151 11 166 S138_R1-1222/1 151 25 167 S138_R1-1223/1 151 29 168 S138_R1-1225/1 86 1 169 S138_R1-1226/1 151 20 170 S138_R1-1227/1 150 0 171 S138_R1-1228/1 149 28 172 S138_R1-1229/1 149 0 173 S138_R1-1230/1 151 21 174 S138_R1-1231/1 150 39 175 S138_R1-1232/1 89 0 176 S138_R1-1233/1 150 1 177 S138_R1-1234/1 61 0 178 S138_R1-1236/1 151 39 179 S138_R1-1237/1 150 26 180 S138_R1-1238/1 151 39 121 S138_R1-1168/1 151 4 122 S138_R1-1169/1 151 39 123 S138_R1-1170/1 134 34 124 S138_R1-1171/1 151 26 125 S138_R1-1172/1 151 21 126 S138_R1-1174/1 149 23 127 S138_R1-1175/1 150 22 128 S138_R1-1176/1 151 22 129 S138_R1-1177/1 151 21 130 S138_R1-1178/1 150 29 131 S138_R1-1179/1 133 23 132 S138_R1-1180/1 151 37 133 S138_R1-1181/1 151 23 134 S138_R1-1182/1 128 32 135 S138_R1-1183/1 150 36 136 S138_R1-1184/1 127 26 137 S138_R1-1185/1 150 30 138 S138_R1-1187/1 151 41 139 S138_R1-1188/1 151 7 140 S138_R1-1190/1 149 12 141 S138_R1-1192/1 150 39 142 S138_R1-1193/1 150 37 143 S138_R1-1194/1 151 2 144 S138_R1-1195/1 109 0 145 S138_R1-1196/1 150 10 146 S138_R1-1197/1 53 8 147 S138_R1-1198/1 149 23 148 S138_R1-1199/1 151 17 149 S138_R1-1200/1 151 27 150 S138_R1-1201/1 84 5 211 S138_R1-1276/1 151 32 212 S138_R1-1278/1 52 0 213 S138_R1-1280/1 150 30 214 S138_R1-1281/1 150 26 215 S138_R1-1282/1 151 25 216 S138_R1-1284/1 149 3 217 S138_R1-1287/1 149 27 218 S138_R1-1289/1 151 9 219 S138_R1-1290/1 124 3 220 S138_R1-1291/1 151 6 221 S138_R1-1292/1 119 28 222 S138_R1-1293/1 147 18 223 S138_R1-1294/1 92 3 224 S138_R1-1295/1 90 4 225 S138_R1-1296/1 151 11 226 S138_R1-1301/1 83 0 227 S138_R1-1303/1 149 15 228 S138_R1-1304/1 62 2 229 S138_R1-1305/1 151 27 230 S138_R1-1306/1 134 3 231 S138_R1-1307/1 151 6 232 S138_R1-1308/1 150 19 233 S138_R1-1313/1 151 31 234 S138_R1-1315/1 151 12 235 S138_R1-1316/1 150 19 236 S138_R1-1317/1 149 4 237 S138_R1-1318/1 149 11 238 S138_R1-1320/1 150 12 239 S138_R1-1321/1 151 20 240 S138_R1-1323/1 138 27

Jianshu

jianshu93 commented 1 month ago

Then:

./kun_peng classify --db ~/scratch/k2_standard_20240605 --chunk-dir ./chunk_new --batch-size 10 ./test.fasta
Error: Custom { kind: Other, error: "The directory './chunk_new' must not contain files with extensions '.k2', '.map', or '.bin' for 'sample' and 'sample_id'" }

Jianshu

eric9n commented 1 month ago

This is normal. Also, check if the chunk_new/sampleid*.bin files contain any data. Then execute the following command:

kun_peng resolve --db ~/scratch/k2_standard_20240605 --chunk-dir chunk_new
eric9n commented 1 month ago

Then:

./kun_peng classify --db ~/scratch/k2_standard_20240605 --chunk-dir ./chunk_new --batch-size 10 ./test.fasta
Error: Custom { kind: Other, error: "The directory './chunk_new' must not contain files with extensions '.k2', '.map', or '.bin' for 'sample' and 'sample_id'" }

Jianshu

After executing the above two steps, do not run the classify command again.

jianshu93 commented 1 month ago

ls -lhs chunk_new/*.bin
total 28K
4.0K -rw-r--r--. 1 jzhao399 pace-ktk3 252 Aug 19 09:28 sample_file_1_0.bin
4.0K -rw-r--r--. 1 jzhao399 pace-ktk3 984 Aug 19 09:28 sample_file_1_1.bin
4.0K -rw-r--r--. 1 jzhao399 pace-ktk3 540 Aug 19 09:28 sample_file_1_2.bin
4.0K -rw-r--r--. 1 jzhao399 pace-ktk3 1.7K Aug 19 09:28 sample_file_1_3.bin

Then:

./kun_peng resolve --db ~/scratch/k2_standard_20240605 --chunk-dir chunk_new
resolve start...
can't find 40 in sample_id map file
can't find 164 in sample_id map file
can't find 108 in sample_id map file
can't find 208 in sample_id map file
can't find 124 in sample_id map file
can't find 8 in sample_id map file
can't find 236 in sample_id map file
can't find 240 in sample_id map file
can't find 36 in sample_id map file
can't find 142 in sample_id map file
can't find 110 in sample_id map file
can't find 54 in sample_id map file
can't find 103 in sample_id map file
can't find 227 in sample_id map file
can't find 27 in sample_id map file
can't find 127 in sample_id map file
can't find 83 in sample_id map file
can't find 75 in sample_id map file
can't find 163 in sample_id map file
can't find 71 in sample_id map file
can't find 85 in sample_id map file
can't find 161 in sample_id map file
can't find 73 in sample_id map file
can't find 113 in sample_id map file
can't find 213 in sample_id map file
can't find 17 in sample_id map file
can't find 209 in sample_id map file
can't find 141 in sample_id map file
resolve took: 6.381284ms

Git log:

commit 434c0bd72f0ea7a0b35efbb43baae2f022dfb46e (HEAD -> main, origin/main, origin/HEAD)
Author: dagou <eric9n@gmail.com>
Date:   Mon Aug 19 14:51:03 2024 +0800

    bug fix

commit e978253e476e0dd57cf594d0a68abd6316f1dfec
Author: dagou <eric9n@gmail.com>
Date:   Mon Aug 19 11:00:58 2024 +0800

    bug fix
jianshu93 commented 1 month ago

Could this be a Linux problem? What system are you testing on?

eric9n commented 1 month ago
[jzhao399@atl1-1-02-002-23-1 release]$ cat chunk_new/sample_id_1.map
1 S138_R1-1021/1
150 0
2 S138_R1-1023/1
151 17
3 S138_R1-1024/1
150 28

Is the line break after the id generated by GitHub comments, or does your map file itself contain line breaks?

eric9n commented 1 month ago
cat /Volumes/Jlab/chunk/sample_id_13.map
31  S138_R1-1057/1  150 37
32  S138_R1-1058/1  151 30
33  S138_R1-1059/1  80  0
34  S138_R1-1061/1  150 18
35  S138_R1-1062/1  149 28
36  S138_R1-1063/1  151 36
37  S138_R1-1064/1  151 24
38  S138_R1-1065/1  151 18
39  S138_R1-1066/1  151 40
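
A quick, hedged way to check whether each record in the map really sits on a single line with four fields, as in the example above (GNU coreutils and awk assumed):

# cat -A makes line endings ($) and tabs (^I) visible.
head -n 6 chunk_new/sample_id_1.map | cat -A
# Count records whose field count is not the expected 4.
awk 'NF != 4 { bad++ } END { print (bad + 0), "records with unexpected field counts" }' chunk_new/sample_id_1.map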
jianshu93 commented 1 month ago

It was my problem! Now it works:

[jzhao399@atl1-1-02-002-23-1 release]$ ./kun_peng resolve --db ~/scratch/k2_standard_20240605 --chunk-dir chunk_new
resolve start...
C S138_R1-1068 2 151 0:3 2903816:1 0:26 754502:1 0:8
C S138_R1-1273 2487072 149 0:8 2487072:1 0:3
C S138_R1-1323 2614442 138 0:9 2614442:1 0:17
C S138_R1-1220 765912 151 0:8 765912:1 0:12
C S138_R1-1154 1179672 151 0:1 1179672:1 0:26
C S138_R1-1171 32033 151 0:11 2763107:1 0:7 1224:2 32033:1 2:2 135614:1 2:1
C S138_R1-1031 2268026 150 0:17 2268026:1 0:3 2268026:1 0:21
C S138_R1-1317 1224 149 2:1 1224:2 0:1
C S138_R1-1063 1236 151 0:10 1236:1 1224:1 0:24
C S138_R1-1087 1313172 150 0:1 1313172:1 0:26
C S138_R1-1193 1548548 150 0:11 2:3 1548548:3 2:1 0:1 2:7 0:6 2785627:1 0:4
C S138_R1-1156 196795 150 60136:1 196795:3 60136:2 196795:1 60136:1 196795:10 60136:1 196795:1 60136:1 196795:1 60136:1 196795:6
C S138_R1-1120 122358 146 0:1 122358:1 0:32
C S138_R1-1106 469 146 469:31
C S138_R1-1175 1760 150 0:13 1760:1 0:4 455432:1 2996826:1 0:2
C S138_R1-1217 2918528 151 0:11 2901189:1 0:10 2918528:2 2901189:3 0:3 1224:1 0:4
C S138_R1-1303 108980 149 469:4 2:1 469:1 2:1 108980:2 469:4 108980:2
C S138_R1-1112 469 151 469:32
C S138_R1-1147 517719 151 517719:2 0:1 517719:1 0:7 2768066:1 0:12 204455:1 28211:1 0:1 204455:1 1224:2 0:1 1224:3 204455:1 0:3
C S138_R1-1052 2884263 151 2:20 2995304:3 2:1 3031711:1 2:4 0:1 2:1 201174:1 2:1 0:1 676965:1 0:3 2618799:1 2884263:4
C S138_R1-1274 314275 150 314275:22 226:1 314275:3
C S138_R1-1040 135616 150 135616:1 0:1 135616:4 0:2 2:2 0:1 544448:1 1224:1 0:9 2:1 0:3 2:2
C S138_R1-1159 83889 149 0:2 83889:1 0:1 83889:1 0:31
C S138_R1-1122 2953890 126 0:3 2953890:1 0:16
C S138_R1-1109 358681 151 0:25 358681:1
C S138_R1-1215 78543 150 0:14 78543:1 0:21
C S138_R1-1192 173860 150 0:6 335541:1 0:1 1760:1 173860:1 0:1 2:1 1760:1 2:1 131567:3 2:1 0:2 1917485:1 0:18
C S138_R1-1280 3023518 150 1224:2 2:1 1224:1 3023518:1 2:1 28211:1 3023518:13 0:2 3023518:8
resolve took: 7.122967ms

eric9n commented 1 month ago

You may test the classify or other commands. If there are any issues, please let me know.

jianshu93 commented 1 month ago

./kun_peng classify --db ~/scratch/k2_standard_20240605 --chunk-dir ./chunk_new2 --batch-size 10 --output-dir ./kun_peng_Min17_out ./test.fasta

It worked for the test file:

[jzhao399@atl1-1-02-002-23-1 release]$ cat ./kun_peng_Min17_out/output_2.kreport2 88.80 222 222 U 0 unclassified 11.20 28 0 R 1 root 11.20 28 0 R1 131567 cellular organisms 11.20 28 1 D 2 Bacteria 6.80 17 1 P 1224 Pseudomonadota 4.80 12 1 C 1236 Gammaproteobacteria 1.60 4 0 O 72274 Pseudomonadales 1.20 3 0 F 135621 Pseudomonadaceae 0.80 2 0 G 286 Pseudomonas 0.40 1 0 G1 196821 unclassified Pseudomonas 0.40 1 1 S 2614442 Pseudomonas sp. LPB0260 0.40 1 0 G1 136843 Pseudomonas fluorescens group 0.40 1 1 S 78543 Pseudomonas migulae 0.40 1 0 G 2901189 Halopseudomonas 0.40 1 1 S 2918528 Halopseudomonas maritima 0.40 1 0 F 2887365 Marinobacteraceae 0.40 1 0 G 2742 Marinobacter 0.40 1 1 G1 83889 unclassified Marinobacter 1.20 3 0 O 2887326 Moraxellales 1.20 3 0 F 468 Moraxellaceae 1.20 3 2 G 469 Acinetobacter 0.40 1 1 S 108980 Acinetobacter ursingii 0.40 1 0 O 135622 Alteromonadales 0.40 1 0 F 72275 Alteromonadaceae 0.40 1 0 F1 2903219 Alteromonas/Salinimonas group 0.40 1 0 G 226 Alteromonas 0.40 1 1 S 314275 Alteromonas mediterranea 0.40 1 0 O 135614 Lysobacterales 0.40 1 1 F 32033 Lysobacteraceae 0.40 1 0 O 135613 Chromatiales 0.40 1 0 F 1046 Chromatiaceae 0.40 1 0 G 156885 Thioflavicoccus 0.40 1 0 S 80679 Thioflavicoccus mobilis 0.40 1 1 S1 765912 Thioflavicoccus mobilis 8321 0.40 1 0 O 72273 Thiotrichales 0.40 1 1 F 135616 Piscirickettsiaceae 1.20 3 0 C 28211 Alphaproteobacteria 0.80 2 0 O 204455 Rhodobacterales 0.40 1 0 F 2854170 Roseobacteraceae 0.40 1 0 G 60136 Sulfitobacter 0.40 1 1 G1 196795 unclassified Sulfitobacter 0.40 1 0 F 31989 Paracoccaceae 0.40 1 0 G 1679449 Pseudooceanicola 0.40 1 1 S 517719 Pseudooceanicola nitratireducens 0.40 1 0 O 204457 Sphingomonadales 0.40 1 0 F 335929 Erythrobacteraceae 0.40 1 0 G 1855416 Qipengyuania 0.40 1 0 G1 2645039 unclassified Qipengyuania 0.40 1 1 S 3023518 Qipengyuania sp. GPGPB31 0.40 1 0 C 28216 Betaproteobacteria 0.40 1 0 O 206351 Neisseriales 0.40 1 0 F 1499392 Chromobacteriaceae 0.40 1 0 G 535 Chromobacterium 0.40 1 0 G1 2641838 unclassified Chromobacterium 0.40 1 1 S 2953890 Chromobacterium sp. IIBBL 290-4 2.40 6 0 D1 1783272 Terrabacteria group 2.00 5 0 P 201174 Actinomycetota 1.20 3 1 C 1760 Actinomycetes 0.40 1 0 O 85011 Kitasatosporales 0.40 1 0 F 2062 Streptomycetaceae 0.40 1 0 G 1883 Streptomyces 0.40 1 1 S 173860 Streptomyces rectiverticillatus 0.40 1 0 O 85008 Micromonosporales 0.40 1 0 F 28056 Micromonosporaceae 0.40 1 0 G 1865 Actinoplanes 0.40 1 1 S 122358 Actinoplanes ianthinogenes 0.80 2 0 C 84992 Acidimicrobiia 0.80 2 0 O 84993 Acidimicrobiales 0.40 1 0 F 2448023 Ilumatobacteraceae 0.40 1 0 G 682522 Ilumatobacter 0.40 1 0 S 467094 Ilumatobacter coccineus 0.40 1 1 S1 1313172 Ilumatobacter coccineus YM16-304 0.40 1 0 F 633392 Iamiaceae 0.40 1 0 G 3075137 Dermatobacter 0.40 1 1 S 2884263 Dermatobacter hominis 0.40 1 0 P 1239 Bacillota 0.40 1 0 C 91061 Bacilli 0.40 1 0 O 1385 Bacillales 0.40 1 0 F 186822 Paenibacillaceae 0.40 1 0 G 55080 Brevibacillus 0.40 1 0 S 1393 Brevibacillus brevis 0.40 1 1 S1 358681 Brevibacillus brevis NBRC 100599 1.20 3 0 D1 1783270 FCB group 1.20 3 0 D2 68336 Bacteroidota/Chlorobiota group 1.20 3 0 P 976 Bacteroidota 0.80 2 0 C 117743 Flavobacteriia 0.80 2 0 O 200644 Flavobacteriales 0.40 1 0 F 2762318 Weeksellaceae 0.40 1 0 F1 2782232 Chryseobacterium group 0.40 1 0 G 2782229 Epilithonimonas 0.40 1 1 S 2487072 Epilithonimonas vandammei 0.40 1 0 F 49546 Flavobacteriaceae 0.40 1 0 G 237 Flavobacterium 0.40 1 0 G1 196869 unclassified Flavobacterium 0.40 1 1 S 1179672 Flavobacterium sp. 
KBS0721 0.40 1 0 C 768503 Cytophagia 0.40 1 0 O 768507 Cytophagales 0.40 1 0 F 2896860 Spirosomataceae 0.40 1 0 G 105 Runella 0.40 1 0 G1 2631759 unclassified Runella 0.40 1 1 S 2268026 Runella sp. SP2 0.40 1 0 D1 68525 delta/epsilon subdivisions 0.40 1 0 C 28221 Deltaproteobacteria 0.40 1 0 O 1779134 Bradymonadales 0.40 1 0 F 1779135 Bradymonadaceae 0.40 1 0 G 1779136 Bradymonas 0.40 1 1 S 1548548 Bradymonas sediminis