Explore reference-free assembly evaluation approach

The current implementation of select_assembly.nf, which runs bin/select_assembly.py, does the following:

1. Identifies the dominant species: reads a Kraken2 report (kraken2.out.kraken2_screen) to determine the most abundant species in a sample.
2. Retrieves genome size: looks up the genome size and chromosome number for the identified species in an NCBI lookup file (get_ncbi.out.ncbi_lookup).
3. Selects chromosomal contigs: identifies chromosomal contigs in the Flye assembly (not the Unicycler assembly) based on the expected genome size.
4. Compares contigs: checks whether these contigs match those in a reconciled clusters directory and decides which set to use.
5. Organises output: sorts the contigs into final and discarded folders, separates chromosomal from non-chromosomal contigs, and filters the contig information.
6. Produces final output: creates a flag file indicating whether the consensus or the Flye-only contigs were used.
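To make the reference dependence of steps 1 and 3 concrete, here is a minimal sketch of how they could work. It is not the code in bin/select_assembly.py: the function names `dominant_species` and `select_chromosomal`, the `tolerance` parameter and the "take the N longest contigs and check their total against the expected genome size" heuristic are all hypothetical. It assumes the standard tab-separated Kraken2 report layout (clade percentage, clade reads, direct reads, rank code, NCBI taxid, indented name), and the inputs in the usage section are toy values.

```python
"""Hypothetical sketch of steps 1 and 3 above (not bin/select_assembly.py).

Names, thresholds and the selection heuristic are illustrative assumptions.
"""
from typing import Dict, List, Tuple


def dominant_species(report_lines: List[str]) -> Tuple[str, float]:
    """Return (name, percent) for the species-rank ("S") row with the
    highest clade percentage in a standard Kraken2 report.

    Assumed columns (tab-separated): percent of reads in clade, reads in
    clade, reads assigned directly, rank code, NCBI taxid, indented name.
    """
    best_name, best_pct = "", -1.0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and every rank other than exactly "S"
        # (this also skips sub-species ranks such as "S1").
        if len(fields) < 6 or fields[3].strip() != "S":
            continue
        pct = float(fields[0])
        if pct > best_pct:
            best_name, best_pct = fields[5].strip(), pct
    if best_pct < 0:
        raise ValueError("no species-rank rows found in report")
    return best_name, best_pct


def select_chromosomal(
    contig_lengths: Dict[str, int],
    genome_size: int,
    n_chromosomes: int,
    tolerance: float = 0.2,  # hypothetical: allowed fractional deviation
) -> Tuple[List[str], List[str]]:
    """Hypothetical heuristic: treat the n_chromosomes longest contigs as
    chromosomal if their combined length is within `tolerance` of the
    expected genome size; all other contigs are non-chromosomal.
    Returns (chromosomal, non_chromosomal); chromosomal is empty when the
    longest contigs do not add up to the expected size.
    """
    ranked = sorted(contig_lengths, key=lambda c: contig_lengths[c], reverse=True)
    candidates = ranked[:n_chromosomes]
    total = sum(contig_lengths[c] for c in candidates)
    if abs(total - genome_size) > tolerance * genome_size:
        return [], ranked  # no convincing chromosomal set
    return candidates, ranked[n_chromosomes:]


if __name__ == "__main__":
    # Toy Kraken2 report rows (made-up numbers, for illustration only).
    report = [
        "95.10\t9510\t12\tS\t1280\t      Staphylococcus aureus",
        "2.00\t200\t200\tS\t1282\t      Staphylococcus epidermidis",
        "90.00\t9000\t0\tG\t1279\t    Staphylococcus",
    ]
    print(dominant_species(report))  # ('Staphylococcus aureus', 95.1)

    # Toy contig lengths; in practice these would come from the Flye assembly.
    lengths = {"contig_1": 2_800_000, "contig_2": 27_000, "contig_3": 4_400}
    print(select_chromosomal(lengths, genome_size=2_800_000, n_chromosomes=1))
    # (['contig_1'], ['contig_2', 'contig_3'])
```

Both steps need an expected genome size and chromosome count for the species, so a sample whose species is missing from, or poorly represented in, the NCBI lookup has no sensible target to select against.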

This method is biased toward species with published reference assemblies on NCBI. Based on user feedback, we want to explore whether the assemblies really need to be cleaned this heavily, keeping only the contigs that match what the reference predicts.

Sydney-Informatics-Hub / ONT-bacpac-nf, issue #54