dcopetti closed this issue 3 years ago
Regarding the first question, see the hifiasm command-line help:
Usage: hifiasm [options] <in_1.fq> <in_2.fq> <...>
Just put multiple files on the command line. For typical HiFi data, there is no need to use -z. Like seqtk, minimap2, bwa, ..., hifiasm works seamlessly with fasta, fastq and their gzip'd versions. I don't see hifiasm supporting BAM, as that would require bringing in a heavy dependency and would make hifiasm harder to install. File conversion is much faster than assembly anyway.
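Since BAM input is not supported, the conversion has to happen before assembly. A minimal sketch of one way to do it, assuming samtools is installed (samtools is not mentioned in this thread, and the file names are placeholders). The helper only builds and prints the pipeline; it does not run anything.

```python
import shlex


def bam_to_fastq_command(bam_path, out_path, threads=4):
    """Build a shell pipeline converting an unaligned BAM to gzip'd FASTQ.

    Assumption: samtools is on PATH. `samtools fastq` writes reads to
    stdout when no output file option is given, so we pipe into gzip.
    """
    samtools = ["samtools", "fastq", "-@", str(threads), bam_path]
    return f"{shlex.join(samtools)} | gzip > {shlex.quote(out_path)}"


# Hypothetical file names, for illustration only.
print(bam_to_fastq_command("reads.ccs.bam", "reads.ccs.fq.gz"))
```

The resulting .fq.gz file can then be passed to hifiasm directly, since gzip'd FASTQ is accepted as described above.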
As to the settings, -D10 sometimes helps. You can try both the default and -D10. As explained in the README, -l0 is preferred for inbred samples. Note that most of the time haplotig purging shouldn't purge homologous regions. However, it may introduce minor errors in corner cases.
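To compare the settings discussed here, it may help to write out one command per combination. The sketch below only prints the commands and runs nothing. -l0 and -D10 come from this thread; -o (output prefix) and -t (threads) are standard hifiasm options not discussed above, and the file names, prefixes and thread count are placeholders.

```python
import shlex


def hifiasm_command(reads, prefix, threads=32, extra=()):
    """Return the argv list for one hifiasm run (nothing is executed)."""
    return ["hifiasm", "-o", prefix, "-t", str(threads), *extra, *reads]


reads = ["in_1.fq.gz", "in_2.fq.gz"]  # placeholder input files

# One entry per setting combination to try; the names are arbitrary labels.
runs = {
    "default": [],
    "D10": ["-D10"],
    "l0": ["-l0"],
    "l0_D10": ["-l0", "-D10"],
}

for name, extra in runs.items():
    print(shlex.join(hifiasm_command(reads, f"asm_{name}", extra=extra)))
```

Giving each run its own output prefix keeps the assemblies from overwriting each other, which makes them easier to compare afterwards.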
Hi, I have in_1.fq, in_2.fq and in3.fq. Can I use hifiasm [options] in*.fq?
Yes, you can.
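The shell expands in*.fq into a list of file names before hifiasm starts, so hifiasm simply sees several input files. A small sketch of the same expansion done in Python, assuming a hypothetical helper name and placeholder -o/-t values; it creates empty demo files in a temporary directory and only prints the command.

```python
import glob
import os
import shlex
import tempfile


def hifiasm_glob_command(pattern, prefix="asm", threads=32):
    """Expand a glob pattern and build a hifiasm argv list (not executed)."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no input files match {pattern!r}")
    return ["hifiasm", "-o", prefix, "-t", str(threads), *files]


# Demo with empty placeholder files; other.fq must not match in*.fq.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("in_1.fq", "in_2.fq", "in3.fq", "other.fq"):
        open(os.path.join(tmp, name), "w").close()
    cmd = hifiasm_glob_command(os.path.join(tmp, "in*.fq"))
    # Strip the temporary directory so the printed command is readable.
    demo = [os.path.basename(a) if a.startswith(tmp) else a for a in cmd]

print(shlex.join(demo))
```

Raising an error on an empty match is a design choice in this sketch: it avoids silently building a command with no input files.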
Does it make a difference whether I merge several .fastq.gz input files first with cat and feed one large input file to hifiasm, or whether I hand it several small fastq.gz files as described here? Does it influence speed/performance?
It won't. The same inputs will produce exactly the same output.
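One way to see why merging with cat changes nothing about the data: concatenated gzip files form a valid multi-member gzip stream, which decompresses to the same bytes as reading the files one after another. The sketch below checks this with two tiny made-up FASTQ records. It does not run hifiasm, so it illustrates the input side only.

```python
import gzip
import os
import tempfile


def read_all(paths):
    """Return the decompressed bytes of the given gzip files, in order."""
    chunks = []
    for path in paths:
        with gzip.open(path, "rb") as handle:
            chunks.append(handle.read())
    return b"".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Write two small gzip'd FASTQ files with made-up reads.
    parts = []
    for i, seq in enumerate(["ACGT", "GGCC"], start=1):
        path = os.path.join(tmp, f"in_{i}.fq.gz")
        with gzip.open(path, "wb") as handle:
            handle.write(f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n".encode())
        parts.append(path)

    # Equivalent of: cat in_1.fq.gz in_2.fq.gz > merged.fq.gz
    merged = os.path.join(tmp, "merged.fq.gz")
    with open(merged, "wb") as out:
        for path in parts:
            with open(path, "rb") as src:
                out.write(src.read())

    separate_bytes = read_all(parts)
    merged_bytes = read_all([merged])

print(separate_bytes == merged_bytes)
```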
Hello, I am about to start an assembly of a large genome and I have my input data (fasta/fastq straight off of the instrument) in 13 files of 30-50 GB each (fastq). I wonder if it is possible to specify more than one input file in the hifiasm command, or if I can supply a list (.fofn) of the inputs - this would save time in preparing the input and in moving it around as well. Does the format (fasta/fastq) make any difference? Maybe in the future, would it be possible to feed a bam file directly - to save time in converting between formats? Do I need to do any pre-processing of such data? In which case would I need to use the -z option of hifiasm?
Lastly, a question regarding settings. I have an inbred diploid plant genome of about 10 Gb, with about 24x coverage of CCS data. I see that for maize you use -l0 and for strawberry (because of polyploidy?) -D10. My plant is allohexaploid, should I also increase -D? Should I also avoid purging "haplotigs" since these could be homoeologous sequences? Any other option I should consider? I guess I will run a few assemblies with different combinations of settings.
Thanks, Dario