chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

multiple input files and format #61

Closed: dcopetti closed this issue 3 years ago

dcopetti commented 3 years ago

Hello, I am about to start an assembly of a large genome, and my input data (fastq, straight off the instrument) are in 13 files of 30-50 GB each. Is it possible to specify more than one input file in the hifiasm command, or to supply a list (.fofn) of the inputs? This would save time preparing the input and moving it around. Does the format (fasta/fastq) make any difference? In the future, would it be possible to feed in a BAM file directly, to save time converting between formats? Do I need to do any pre-processing of the data? In which cases would I need to use the -z option of hifiasm?

Lastly, a question regarding settings. I have an inbred diploid plant genome of about 10 Gb, with about 24x coverage of CCS data. I see that for maize you use -l0 and for strawberry (because of polyploidy?) -D10. My plant is allohexaploid; should I also increase -D? Should I also avoid purging "haplotigs", since these could be homoeologous sequences? Are there any other options I should consider? I guess I will run a few assemblies with different combinations of settings. Thanks, Dario

lh3 commented 3 years ago

Regarding the first question, see the hifiasm command-line help:

Usage: hifiasm [options] <in_1.fq> <in_2.fq> <...>

Just put multiple files on the command line. For typical HiFi data, there is no need to use -z. Like seqtk, minimap2, bwa, ..., hifiasm works seamlessly with fasta, fastq and their gzip'd versions. I don't see hifiasm supporting BAM, as that would require bringing in a heavy dependency and would make hifiasm harder to install. File conversion is much faster than assembly anyway.
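A minimal sketch of the multi-file invocation described above. The file names, the output prefix `asm` and the thread count are placeholders; `-o` and `-t` are hifiasm's output-prefix and thread options. The commands are only printed (via `echo`), so the snippet runs without hifiasm installed. The `.fofn` part is a shell-side workaround using `xargs`, not a hifiasm feature.

```shell
# Placeholder input names (assumption); replace with the real files.
set -- run01.fastq.gz run02.fastq.gz run03.fastq.gz

# 1) List the files directly. "echo" makes this a dry run that only prints
#    the command; drop it to start the assembly.
direct=$(echo hifiasm -o asm -t 32 "$@")
echo "$direct"

# 2) Shell-side workaround for a .fofn (one path per line): let xargs
#    append the paths as arguments. This is not a hifiasm feature.
fofn=$(mktemp)
printf '%s\n' "$@" > "$fofn"
from_fofn=$(xargs echo hifiasm -o asm -t 32 < "$fofn")
echo "$from_fofn"
rm -f "$fofn"
```

Both variants print the same command line, so a file list costs nothing more than one `xargs` call. Paths containing spaces would need extra quoting in the `.fofn` variant.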

As to the settings, -D10 sometimes helps. You can try both the default and -D10. As explained in the README, -l0 is preferred for inbred samples. Note that most of the time haplotig purging shouldn't purge homologous regions. However, it may introduce minor errors in corner cases.
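The suggestion to compare the default with -D10 could be scripted roughly as follows. This is a sketch under assumptions: the read file name and output prefixes are placeholders, and -l0 is included only because the sample is described as inbred. The loop prints the commands as a dry run; remove `echo` to launch the assemblies.

```shell
# Placeholder read file (assumption); replace with the real inputs.
reads=reads.fastq.gz

# One run with the default -D and one with -D10, both with purging disabled (-l0).
for extra in "" "-D10"; do
    # Hypothetical naming scheme: append ".D10" to the prefix when -D10 is set.
    prefix="asm.l0${extra:+.D10}"
    # $extra is deliberately unquoted so the empty value adds no argument.
    line=$(echo hifiasm -o "$prefix" -t 32 -l0 $extra "$reads")
    echo "$line"
done
```

Giving each run its own prefix keeps the outputs of the two settings from overwriting each other.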

Ural-Yunusbaev commented 2 years ago

Hi, I have in_1.fq, in_2.fq and in3.fq. Can I use hifiasm [options] in*.fq?

chhylp123 commented 2 years ago

Yes, you can.
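The wildcard works because the shell expands `in*.fq` into separate file arguments before hifiasm starts. A small sketch with empty stand-in files in a temporary directory (the directory, the prefix `asm` and the thread count are assumptions for illustration):

```shell
# Create empty stand-ins for the three read files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/in_1.fq" "$workdir/in_2.fq" "$workdir/in3.fq"

# The shell replaces the pattern with the matching paths.
set -- "$workdir"/in*.fq
echo "matched $# files"

# Dry run: print the command hifiasm would receive; drop "echo" to run it.
echo hifiasm -o asm -t 32 "$@"
```

The order in which the matches are listed depends on the shell's locale. Running `echo in*.fq` first is a cheap way to check that the pattern picks up exactly the intended files.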

AlcaArctica commented 8 months ago

Does it make a difference whether I merge several .fastq.gz input files first with cat and feed one large input file to hifiasm, or whether I hand it several small fastq.gz files as described here? Does it influence speed/performance?

chhylp123 commented 8 months ago

It won't. The same inputs will produce exactly the same output.
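One reason merging with `cat` is safe at the file level: concatenated gzip files form a valid multi-member gzip stream that decompresses to the concatenation of the original contents. The sketch below checks this with two tiny made-up FASTQ records. It only compares the decompressed bytes and does not run hifiasm.

```shell
tmp=$(mktemp -d)

# Two tiny, made-up FASTQ files.
printf '@r1\nACGT\n+\nIIII\n' > "$tmp/a.fq"
printf '@r2\nTTGA\n+\nIIII\n' > "$tmp/b.fq"

# Compress each file separately, then merge the compressed files with cat.
gzip -c "$tmp/a.fq" > "$tmp/a.fq.gz"
gzip -c "$tmp/b.fq" > "$tmp/b.fq.gz"
cat "$tmp/a.fq.gz" "$tmp/b.fq.gz" > "$tmp/merged.fq.gz"

# Decompress the merged file and compare with the plain concatenation.
gzip -dc "$tmp/merged.fq.gz" > "$tmp/merged.fq"
cat "$tmp/a.fq" "$tmp/b.fq" > "$tmp/expected.fq"
cmp "$tmp/merged.fq" "$tmp/expected.fq" && echo "identical read stream"
```

Since the reads are the same either way, the choice between one merged file and several smaller ones mainly comes down to convenience.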