B-UMMI / INNUca

INNUENDO quality control of reads, de novo assembly and contigs quality assessment, and possible contamination search
GNU General Public License v3.0

field values combined_report.py script #35

Open bala-ruokavirasto opened 2 years ago

bala-ruokavirasto commented 2 years ago

Dear INNUCA team,

Is there any documentation about the field values listed below, which appear in the combined_report.py script?

fields = ['#samples', 'number_reads_sequenced', 'number_bp_sequenced', 'min_reads_length', 'max_reads_length', 'reads_kraken_number_taxon_found', 'reads_kraken_percentage_unknown_fragments', 'reads_kraken_most_abundant_taxon', 'reads_kraken_percentage_most_abundant_taxon', 'first_coverage', 'trueCoverage_absent_genes', 'trueCoverage_multiple_alleles', 'trueCoverage_sample_coverage', 'second_Coverage', 'pear_assembled_reads', 'pear_unassembled_reads', 'pear_dicarded_reads', 'SPAdes_number_contigs', 'SPAdes_number_bp', 'SPAdes_filtered_contigs', 'SPAdes_filtered_bp', 'assembly_coverage_initial', 'assembly_coverage_filtered', 'mapped_reads_percentage', 'mapping_filtered_contigs', 'mapping_filtered_bp', 'Pilon_changes', 'Pilon_contigs_changed', 'Pilon_contigs', 'Pilon_bp', 'MLST_scheme', 'MLST_ST', 'assembly_kraken_number_taxon_found', 'assembly_kraken_percentage_unknown_fragments', 'assembly_kraken_most_abundant_taxon', 'assembly_kraken_percentage_most_abundant_taxon', 'insert_size_mean', 'insert_size_sd', 'final_assembly']

I would like to have at least some minimal information about these field values. Most of them look straightforward, but I would like to confirm that they mean what I assume they mean, if you have any documentation for them.

Thanks in advance,

Best Regards, Bala

ramirma commented 2 years ago

Dear Bala,

I am afraid we never got around to creating proper documentation with a detailed description of each of those items. The ones that seem less straightforward to me are:

'first_coverage': the total number of bp in the output divided by the provided genome size.

'trueCoverage_absent_genes': if the sample belongs to one of the species for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of genes in that set that are missing.

'trueCoverage_multiple_alleles': if the sample belongs to one of the species for which chewBBACA has a set of reference genes (expected to be present in all isolates), this is the number of genes in that set for which more than one possible allele is present. This may suggest intra-species contamination, i.e. multiple strains of the same species in the sample.
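To make the 'first_coverage' definition above concrete, here is a minimal sketch of the arithmetic. The function name `estimated_coverage` and the example numbers are hypothetical and not taken from the INNUca code, and it assumes both values are expressed in bp; if the genome size is given in another unit (for example Mb), it would need to be converted first.

```python
def estimated_coverage(total_bp: int, genome_size_bp: int) -> float:
    """Return the estimated depth of coverage.

    Hypothetical helper illustrating the definition of 'first_coverage':
    total number of sequenced bp divided by the expected genome size.
    Both arguments are assumed to be in bp.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be a positive number of bp")
    return total_bp / genome_size_bp


# Made-up example: 150 Mbp of reads for an expected 5 Mbp genome
coverage = estimated_coverage(150_000_000, 5_000_000)
print(f"{coverage:.1f}x")  # prints "30.0x"
```

The same ratio applied to a later stage of the pipeline would give a different value, so it is worth checking which output the number in your report refers to.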

Do let us know if there is anything else we can help you with.

Best Regards,

Mario