chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License
547 stars, 87 forks

Linux Aarch64 support #641

Open martin-g opened 6 months ago

martin-g commented 6 months ago

Fixes https://github.com/chhylp123/hifiasm/issues/288

Uses https://github.com/DLTcollab/sse2neon to translate SSE instructions to NEON ones.
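For readers unfamiliar with sse2neon, here is a minimal sketch of the general pattern, not the actual diff in this PR. The idea is to include `sse2neon.h` in place of the x86 intrinsics header on AArch64 so that existing SSE intrinsic calls compile unchanged. The helper `add_i32` is hypothetical and is not taken from hifiasm, and the sketch assumes `sse2neon.h` is on the include path when building for AArch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) || defined(__arm64__)
/* Assumption: sse2neon.h has been copied into the include path. */
#include "sse2neon.h"
#else
/* Native SSE2 intrinsics on x86-64. */
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from hifiasm): element-wise sum of two
 * int32 arrays, written once with SSE2 intrinsics. On AArch64 the
 * same calls are mapped to NEON by sse2neon. */
static void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
    /* Process four 32-bit lanes per iteration with unaligned loads. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With this approach the SIMD code paths stay in a single source form, and only the header selection differs between architectures.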

martin-g commented 6 months ago

mgrigorov in 🌐 euler-arm-22 in hifiasm on aarch64-support [?] via C v10.3.1-gcc
❯ file hifiasm 
hifiasm: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=35b1f9d739f865d1953b27a4bcd53a200ac663b3, for GNU/Linux 3.7.0, with debug_info, not stripped

mgrigorov in 🌐 euler-arm-22 in hifiasm on aarch64-support [?] via C v10.3.1-gcc
❯ ./hifiasm 
Usage: hifiasm [options] <in_1.fq> <in_2.fq> <...>
Options:
  Input/Output:
    -o STR       prefix of output files [hifiasm.asm]
    -t INT       number of threads [1]
    -h           show help information
    --version    show version number
  Overlap/Error correction:
    -k INT       k-mer length (must be <64) [51]
    -w INT       minimizer window size [51]
    -f INT       number of bits for bloom filter; 0 to disable [37]
    -D FLOAT     drop k-mers occurring >FLOAT*coverage times [5.0]
    -N INT       consider up to max(-D*coverage,-N) overlaps for each oriented read [100]
    -r INT       round of correction [3]
    -z INT       length of adapters that should be removed [0]
    --max-kocc   INT
                 employ k-mers occurring <INT times to rescue repetitive overlaps [2000]
    --hg-size    INT(k, m or g)
                 estimated haploid genome size used for inferring read coverage [auto]
  Assembly:
    -a INT       round of assembly cleaning [4]
    -m INT       pop bubbles of <INT in size in contig graphs [10000000]
    -p INT       pop bubbles of <INT in size in unitig graphs [0]
    -n INT       remove tip unitigs composed of <=INT reads [3]
    -x FLOAT     max overlap drop ratio [0.8]
    -y FLOAT     min overlap drop ratio [0.2]
    -i           ignore saved read correction and overlaps
    -u           post-join step for contigs which may improve N50; 0 to disable; 1 to enable
                 [1] and [1] in default for the UL+HiFi assembly and the HiFi assembly, respectively
    --hom-cov    INT
                 homozygous read coverage [auto]
    --lowQ       INT
                 output contig regions with >=INT% inconsistency in BED format; 0 to disable [70]
    --b-cov      INT
                 break contigs at positions with <INT-fold coverage; work with '--m-rate'; 0 to disable [0]
    --h-cov      INT
                 break contigs at positions with >INT-fold coverage; work with '--m-rate'; -1 to disable [-1]
    --m-rate     FLOAT
                 break contigs at positions with <=FLOAT*coverage exact overlaps;
                 only work with '--b-cov' or '--h-cov'[0.75]
    --primary    output a primary assembly and an alternate assembly
  Trio-partition:
    -1 FILE      hap1/paternal k-mer dump generated by "yak count" []
    -2 FILE      hap2/maternal k-mer dump generated by "yak count" []
    -3 FILE      list of hap1/paternal read names []
    -4 FILE      list of hap2/maternal read names []
    -c INT       lower bound of the binned k-mer's frequency [2]
    -d INT       upper bound of the binned k-mer's frequency [5]
    --t-occ      INT
                 forcedly remove unitigs with >INT unexpected haplotype-specific reads;
                 ignore graph topology; [60]
    --trio-dual  utilize homology information to correct trio phasing errors
  Purge-dups:
    -l INT       purge level. 0: no purging; 1: light; 2/3: aggressive [0 for trio; 3 for unzip]
    -s FLOAT     similarity threshold for duplicate haplotigs in read-level [0.75 for -l1/-l2, 0.55 for -l3]
    -O INT       min number of overlapped reads for duplicate haplotigs [1]
    --purge-max  INT
                 coverage upper bound of Purge-dups [auto]
    --n-hap      INT
                 number of haplotypes [2]
  Hi-C-partition:
    --h1 FILEs   file names of Hi-C R1  [r1_1.fq,r1_2.fq,...]
    --h2 FILEs   file names of Hi-C R2  [r2_1.fq,r2_2.fq,...]
    --seed INT   RNG seed [11]
    --s-base     FLOAT
                 similarity threshold for homology detection in base-level;
                 -1 to disable [0.5]; -s for read-level (see <Purge-dups>)
    --n-weight   INT
                 rounds of reweighting Hi-C links [3]
    --n-perturb  INT
                 rounds of perturbation [10000]
    --f-perturb  FLOAT
                 fraction to flip for perturbation [0.1]
    --l-msjoin   INT
                 detect misjoined unitigs of >=INT in size; 0 to disable [500000]
  Ultra-Long-integration:
    --ul FILEs   file names of Ultra-Long reads [r1.fq,r2.fq,...]
    --ul-rate    FLOAT
                 error rate of Ultra-Long reads [0.2]
    --ul-tip     INT
                 remove tip unitigs composed of <=INT reads for the UL assembly [6]
    --path-max   FLOAT
                 max path drop ratio [0.6]; higher number may make the assembly cleaner
                 but may lead to more misassemblies
    --path-min   FLOAT
                 min path drop ratio [0.2]; higher number may make the assembly cleaner
                 but may lead to more misassemblies
    --ul-cut     INT
                 filter out <INT UL reads during the UL assembly [0]
  Dual-Scaffolding:
    --dual-scaf  output scaffolding
    --scaf-gap   INT
                 max gap size for scaffolding [3000000]
Example: ./hifiasm -o NA12878.asm -t 32 NA12878.fq.gz
See `https://hifiasm.readthedocs.io/en/latest/' or `man ./hifiasm.1' for complete documentation.

martin-g commented 6 months ago

I am also working on adding GitHub Actions-based CI for Linux ARM64!

martin-g commented 6 months ago

Successful CI run on my fork: https://github.com/martin-g/hifiasm/actions/runs/8859185274

jianshu93 commented 6 months ago

Many thanks! I have successfully compiled it on my M1 MacBook Pro.

Jianshu

davidecarlson commented 3 months ago

I can also confirm that this branch builds on a Linux NVIDIA Grace CPU system using make ARCH_FLAGS="-mcpu=neoverse-v2". Is there any chance this will be merged in the near future?