Open RenzoTale88 opened 2 years ago
@cschin sorry for insisting on this, but is there any update on the documentation concerning the outputs?
Here are some comments.
These two files are the initial assembly output:
asm_ctgs_m.fa # main contigs
asm_ctgs_e0.fa # extra contigs (fragmented, smaller results due to erroneous reads or very complicated repeats)
Some of the contigs in asm_ctgs_e0.fa may be duplicated, so a de-duplication
process is applied to filter the contigs in asm_ctgs_e0.fa and generate asm_ctgs_e.fa.
The contigs in asm_ctgs_m.fa go through a process to identify homologous
contigs between two haplotypes in a diploid genome. The "primary contigs"
are kept in asm_ctgs_m_p.fa and the "associated contigs" are kept in asm_ctgs_m_a.fa.
The asm_ctgs_m_rel.dat file describes how the contigs in asm_ctgs_m_a.fa
relate to the contigs in asm_ctgs_m_p.fa.
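To compare the files listed above, a quick per-file summary (contig count, total bases, N50) can help. The following is a minimal sketch and not part of Peregrine: it assumes the outputs are plain, uncompressed FASTA files in the current directory, and the helper names `read_contig_lengths`, `n50`, and `summarize` are made up for illustration. It does not read asm_ctgs_m_rel.dat, since that file's layout is not described in this thread.

```python
import os


def read_contig_lengths(path):
    """Return {contig_name: sequence_length} for a plain-text FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the contig name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length of the contig at which the running sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(path):
    """Contig count, total bases, and N50 for one FASTA file."""
    lengths = list(read_contig_lengths(path).values())
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    # File names as given in this thread; missing files are skipped.
    for fasta in ("asm_ctgs_m.fa", "asm_ctgs_e0.fa", "asm_ctgs_e.fa",
                  "asm_ctgs_m_p.fa", "asm_ctgs_m_a.fa"):
        if os.path.exists(fasta):
            print(fasta, summarize(fasta))
```

Based on the description above, one would expect asm_ctgs_e.fa to hold no more contigs than asm_ctgs_e0.fa, and the contigs of asm_ctgs_m.fa to be split between asm_ctgs_m_p.fa and asm_ctgs_m_a.fa, but that is an inference from the comments and worth checking against your own run.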
Hello, I've just tried to run Peregrine on a set of HiFi reads for a large mammalian genome. The software finished successfully, but I can't find an explanation of the different outputs it generates. In particular, I have the following outputs:
I'm assuming the file asm_ctgs_m_p.fa is the primary assembly, whereas the asm_ctgs_m_a.fa is the assembly carrying the alternative alleles, but I'm unsure about the other files. Thank you in advance, best regards Andrea