Closed CorentinEscobar closed 4 months ago
Hi @CorentinEscobar,
I think you may be specifying a low value for -c (the chromosome length),
or leaving it at the default of 1000000.
By default, Hybracter subsamples the specified FASTQ read set down to c * subsample_depth
bases, where subsample_depth is 100 by default.
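To make the arithmetic concrete, here is a rough sketch of how the subsampling target limits the coverage you can end up with. The function names and the 4 Mbp genome / 400 Mbp read set are hypothetical, chosen only for illustration; the sketch also ignores reads removed by QC, so the real numbers will differ somewhat.

```python
def subsample_target_bases(chromosome_length=1_000_000, subsample_depth=100):
    """Number of bases kept after subsampling: c * subsample_depth."""
    return chromosome_length * subsample_depth


def expected_coverage(total_bases, true_genome_size,
                      chromosome_length=1_000_000, subsample_depth=100):
    """Approximate coverage after subsampling (assumes no loss to QC).

    total_bases and true_genome_size are hypothetical inputs describing
    the read set and the real size of the genome being assembled.
    """
    target = subsample_target_bases(chromosome_length, subsample_depth)
    kept = min(total_bases, target)  # nothing is discarded if below the target
    return kept / true_genome_size


# Hypothetical 4 Mbp genome with 400 Mbp of reads:
default_cov = expected_coverage(400_000_000, 4_000_000)
tuned_cov = expected_coverage(400_000_000, 4_000_000,
                              chromosome_length=4_000_000)
print(default_cov, tuned_cov)
```

With the default -c, only 100 Mbp of reads are kept regardless of how much was sequenced, which is why samples with very different yields can report similar coverage.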
If you don't want any subsampling (but still want to keep the QC steps), the best way is to increase --subsample_depth
to a very large number (e.g. --subsample_depth 100000).
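As a sanity check on what counts as "very large", the following sketch estimates the smallest depth at which c * subsample_depth covers the whole read set. It assumes that no subsampling happens once the target is at least the total number of bases; the helper name and the example numbers are hypothetical.

```python
def min_depth_to_keep_all(total_bases, chromosome_length=1_000_000):
    """Smallest subsample_depth with chromosome_length * depth >= total_bases."""
    # Integer ceiling division, avoids floating-point rounding on large counts.
    return -(-total_bases // chromosome_length)


# Hypothetical read set of 400 Mbp with the default -c of 1000000:
depth_needed = min_depth_to_keep_all(400_000_000)
print(depth_needed)
```

Any depth at or above this value should leave the read set untouched under that assumption, so a value like 100000 has plenty of margin for typical bacterial data sets.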
Alternatively, if your input reads are QC'd already, you can use --skip_qc
to skip all QC steps.
George
Hi @gbouras13
Thank you for your help! In fact, I had stayed on the default value for the chromosome length. If I change this value to be closer to the expected chromosome size, hybracter may take a little longer, but it will use more reads and the coverage will also be higher.
Thanks again!
Corentin
Hi @gbouras13,
I have a problem with the coverage information in the output files. I have assembled many genomes sequenced with Nanopore, and I find a coverage between 20 and 30 for all of them, even for those with a lot of sequencing data. I checked this hybracter output by comparing it to my own assembly process (which uses the same programs but is not automated in a pipeline like hybracter), and the results differ. For example, for the chromosome of one strain, I get a mean coverage of 24 with hybracter and 81 with my process. Given the quantity of data, I think the value of 81 is the correct one. In fact, I get similar coverage values for many strains even though the quantity of sequencing data is not at all the same for each strain.
Do you know where the problem could come from? Does hybracter sort data before or after assembly? If so, is it possible to modify the code somewhere to remove this sorting step and keep all the data?
Thanks for your help
Corentin