anuradhawick/MetaBCC-LR
Reference-free Binning of Metagenomics Long Reads using Coverage and Composition
https://doi.org/10.1093/bioinformatics/btaa441
MIT License
Improvements self-issue #10 (Open)
anuradhawick opened this issue 3 years ago
anuradhawick commented 3 years ago
Computations
Add a checkpoint between counting 15-mers and generating the coverage histogram profiles.
Combine the two steps and check for previously computed 15-mer counts: load them if present, otherwise compute and save them. Avoid the saving and writing cost.
Use C++ valarray for the assignment step (the compiler may then use SSE)
Add the -O3 compiler optimization flag (to make the final assignment step faster)
Add an evaluation step for the final assigned bins (there is a discrepancy between the sufficiently large bins and the classifications.txt file). Fix it!
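The load-or-compute checkpoint for the k-mer counts could look roughly like the following C++17 sketch. It is a minimal illustration, not MetaBCC-LR code: count_kmers and load_or_compute are hypothetical names, the cache is assumed to be a raw binary dump (a 64-bit length followed by 32-bit counts), and small k values stand in for 15 so the example stays cheap (a dense table for k = 15 has 4^15, about 1.07 billion, entries).

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real 15-mer counter: counts every k-mer of
// `seq` in a dense table indexed by its 2-bit encoding (A=0, C=1, G=2, T=3).
std::vector<std::uint32_t> count_kmers(const std::string& seq, int k) {
    const std::uint64_t mask = (std::uint64_t{1} << (2 * k)) - 1;
    std::vector<std::uint32_t> counts(std::size_t{1} << (2 * k), 0);
    std::uint64_t code = 0;
    int valid = 0;  // consecutive ACGT bases seen since the last reset
    for (char c : seq) {
        int b;
        switch (c) {
            case 'A': b = 0; break;
            case 'C': b = 1; break;
            case 'G': b = 2; break;
            case 'T': b = 3; break;
            default: valid = 0; continue;  // N or other symbol: restart the window
        }
        code = ((code << 2) | static_cast<std::uint64_t>(b)) & mask;
        if (++valid >= k) ++counts[code];
    }
    return counts;
}

// Checkpoint: load counts from `cache` if a readable file exists,
// otherwise run `compute`, save its result to `cache`, and return it.
template <typename ComputeFn>
std::vector<std::uint32_t> load_or_compute(const std::filesystem::path& cache,
                                           ComputeFn compute) {
    if (std::filesystem::exists(cache)) {
        std::ifstream in(cache, std::ios::binary);
        std::uint64_t n = 0;
        if (in.read(reinterpret_cast<char*>(&n), sizeof n)) {
            std::vector<std::uint32_t> counts(n);
            if (in.read(reinterpret_cast<char*>(counts.data()),
                        static_cast<std::streamsize>(n * sizeof(std::uint32_t))))
                return counts;  // valid checkpoint found: skip recomputation
        }
    }
    std::vector<std::uint32_t> counts = compute();
    std::ofstream out(cache, std::ios::binary);
    const std::uint64_t n = counts.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(counts.data()),
              static_cast<std::streamsize>(n * sizeof(std::uint32_t)));
    return counts;
}
```

A real checkpoint would probably also record k and an identifier for the input reads, so that a stale cache from a different dataset is not silently reused.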
Binner
Organize different embeddings into different classes
Provide composition-only and coverage-only options
Adjust re-sampling to suit different embedding strategies.
Maybe steal ideas from LRBinner as a pre-step for embedding. VAE+UMAP/SONG works extremely well!
High-dimensional noise filtering using the LRBinner algorithm (should help with noise in ONT reads)
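One way to organize the embeddings into classes, with composition-only and coverage-only options, is a small interface plus a factory. This is a hypothetical sketch: ReadProfile, Embedding, make_embedding and the mode strings are illustrative names, not part of MetaBCC-LR, and the profiles are assumed to be plain vectors of doubles.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-read features (illustrative field names only).
struct ReadProfile {
    std::vector<double> composition;  // e.g. a k-mer frequency vector
    std::vector<double> coverage;     // e.g. a coverage histogram
};

// Common interface so the binner can swap embedding strategies.
class Embedding {
public:
    virtual ~Embedding() = default;
    virtual std::vector<double> embed(const ReadProfile& r) const = 0;
};

// Composition-only option: uses just the composition profile.
class CompositionOnly : public Embedding {
public:
    std::vector<double> embed(const ReadProfile& r) const override {
        return r.composition;
    }
};

// Coverage-only option: uses just the coverage profile.
class CoverageOnly : public Embedding {
public:
    std::vector<double> embed(const ReadProfile& r) const override {
        return r.coverage;
    }
};

// Maps a (hypothetical) command-line mode string to a strategy object.
std::unique_ptr<Embedding> make_embedding(const std::string& mode) {
    if (mode == "composition") return std::make_unique<CompositionOnly>();
    if (mode == "coverage") return std::make_unique<CoverageOnly>();
    throw std::invalid_argument("unknown embedding mode: " + mode);
}
```

Further strategies (for example a learned embedding) would be added as new subclasses, and per-strategy re-sampling settings could live alongside them without touching the binner itself.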
anuradhawick commented 3 years ago
[x] Add a checkpoint between counting 15-mers and generating the coverage histogram profiles.
[x] Use C++ valarray for the assignment step (the compiler may then use SSE)
[x] Add the -O3 compiler optimization flag (to make the final assignment step faster)
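As a rough illustration of the valarray idea, the sketch below assigns each read to its nearest bin centroid using whole-array valarray arithmetic. The nearest-centroid rule under squared Euclidean distance is an assumption made for illustration; MetaBCC-LR's actual assignment criterion may differ, and whether the compiler emits SSE instructions depends on the compiler and on flags such as -O3. The function names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <valarray>
#include <vector>

// Index of the centroid closest to `x` under squared Euclidean distance.
// The element-wise valarray expressions below are simple loops that an
// optimizing compiler (e.g. at -O3) may vectorize; this is not guaranteed.
std::size_t nearest_bin(const std::valarray<double>& x,
                        const std::vector<std::valarray<double>>& centroids) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < centroids.size(); ++i) {
        std::valarray<double> diff = x - centroids[i];
        std::valarray<double> sq = diff * diff;
        double d = sq.sum();
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}

// Assigns every read profile to a bin index.
std::vector<std::size_t> assign_reads(
        const std::vector<std::valarray<double>>& reads,
        const std::vector<std::valarray<double>>& centroids) {
    std::vector<std::size_t> labels;
    labels.reserve(reads.size());
    for (const auto& r : reads) labels.push_back(nearest_bin(r, centroids));
    return labels;
}
```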