alxsimon closed this issue 4 years ago
Hello, Thanks for reaching out.
FASTA is supported. I will update the README to reflect this. The FASTQ restriction no longer applies because we have implemented read filtering using Biopython. I hope this helps.
Let us know how it goes. We are working on an improved version of this tool and your input will be highly appreciated.
This has been implemented and the README has been updated.
Thank you!
I tried it and the tool only found 1 bin, although I am sure there is extensive bacterial contamination (~20% of the assembly) in this molluscan genome. Which parameters influence the binning the most?
Here is what I tried:
python MetaBCC-LR/MetaBCC-LR --reads-path edu_v5.split.fa --threads 32 --max-memory 20000 --output ./edu_bins --sample-count 100 --sensitivity 10
Additionally, the folder images/ is empty, but no error message was displayed. Is this expected?
The tool is currently developed to bin PacBio and ONT reads. We have only tested it with 100,000 to 1,000,000 reads and have not checked smaller datasets. I don't think our approach will perform well on assemblies. Usually, the sample count needs to be at least 5000 reads to detect the bins.
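To check an input against the numbers above before running the tool, a minimal sketch along these lines could count the records in a FASTA file. The function names and the `min_reads` parameter are hypothetical helpers for illustration and are not part of MetaBCC-LR; the default of 5000 is taken from the comment above.

```python
import io

def fasta_lengths(handle):
    """Yield the sequence length of each FASTA record in a text handle."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length

def check_read_count(handle, min_reads=5000):
    """Return (number of records, whether that reaches min_reads)."""
    total = sum(1 for _ in fasta_lengths(handle))
    return total, total >= min_reads

# Tiny in-memory example; a real file would be passed via open(path).
example = io.StringIO(">read1\nACGT\nAC\n>read2\nGG\n")
total, enough = check_read_count(example)
```

With a real dataset, `total` can also be compared against the 100,000 to 1,000,000 range that was tested.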
We cannot use contigs because we count the k-mers of all reads to estimate the coverage used for binning. I hope this clears up the doubt. Could you try using the raw set of reads, provided they are longer than 1000 bp each?
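To illustrate the general idea of estimating coverage from k-mer counts, here is a minimal sketch. It is not MetaBCC-LR's actual implementation: the function names, the choice of the median as the summary statistic, and the value of `k` are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def read_coverage(read, counts, k):
    """Summarize a read by the median dataset-wide count of its k-mers.

    Returns 0 for reads shorter than k. The median is an assumption here;
    other summaries (mean, histogram) are equally plausible.
    """
    values = [counts[km] for km in kmers(read, k)]
    return median(values) if values else 0

reads = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
counts = count_kmers(reads, k=4)
coverages = [read_coverage(r, counts, k=4) for r in reads]
```

In raw reads, k-mers from an abundant genome are seen many times, so the counts carry an abundance signal. In an assembly each region is typically represented about once, so counts computed this way would be close to uniform across contigs.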
Unfortunately, this is 10x Chromium data, so short reads.
Thanks anyway for your answers. Best regards
Hi, I would like to try your pipeline for the classification of FASTA sequences (this is in fact a genome assembly from which I want to remove contamination).
Are the read quality scores used for anything in the pipeline?
If not, would it be possible to implement FASTA support?
As a first approach I may try to create a dummy FASTQ (but in the end this would be a waste of time and resources if quality scores are not used). Thanks
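For reference, the dummy FASTQ workaround could be sketched as follows. This is an illustration only: the function names and the `qual_char` parameter are hypothetical, and the placeholder quality "I" (Phred 40 in the Phred+33 encoding) is an arbitrary choice.

```python
import io

def fasta_records(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def fasta_to_dummy_fastq(src, dst, qual_char="I"):
    """Write each FASTA record as FASTQ with a constant placeholder quality.

    Returns the number of records written.
    """
    written = 0
    for name, seq in fasta_records(src):
        dst.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        written += 1
    return written

# In-memory example; real files would be passed via open(path) handles.
src = io.StringIO(">contig1 example\nACG\nT\n>contig2\nGG\n")
dst = io.StringIO()
n_written = fasta_to_dummy_fastq(src, dst)
```

The quality string has to match the sequence length exactly, which is why it is built from `len(seq)`.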