Rust-Wellcome / FasMan

A re-write (+ extras) of Python scripts, used in Tree of Life, into a single Rust script.

UPDATE: yaml_validator #15

Open DLBPointon opened 4 months ago

DLBPointon commented 4 months ago

YAML_VALIDATOR needs updating for the current version of the TreeVal YAML file: V1.1.0 - Ancient Aurora - yaml file

This should just add to the current implementation: validating paths and values.

DLBPointon commented 1 month ago

This has ended up in an almost complete re-write.

May require another re-write depending on discussion with the team.

Do I keep the current implementation and add a couple more functions to bind the output together, scan through it, and then sys.exit(1) on a list of the fails?

Or...

Change it all and use a mut self style of struct; this would simplify the module significantly.
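To make the second option concrete, here is a minimal sketch of what a mut self style could look like. The names Validator, check_path and check_value are hypothetical and are not taken from the FasMan source, and the allowed-value list is left to the caller. The idea is only that every check writes its outcome onto the struct, so nothing has to be bound together afterwards.

```rust
use std::path::Path;

/// Hypothetical accumulator: each check records its outcome on the struct itself.
#[derive(Debug, Default)]
struct Validator {
    results: Vec<String>,
}

impl Validator {
    /// Record whether a path exists, using the "PASS : ..." / "FAIL : ..." wording
    /// of the current stdout output.
    fn check_path(&mut self, label: &str, path: &str) -> &mut Self {
        let status = if Path::new(path).exists() { "PASS" } else { "FAIL" };
        self.results.push(format!("{label}: {status} : {path}"));
        self
    }

    /// Record whether a value is one of the allowed values supplied by the caller.
    fn check_value(&mut self, label: &str, value: &str, allowed: &[&str]) -> &mut Self {
        let status = if allowed.contains(&value) { "PASS" } else { "FAIL" };
        self.results.push(format!("{label}: {status} : {value}"));
        self
    }
}
```

Because each method returns &mut Self, checks can be chained on one mutable Validator, and the collected results can be scanned once at the end.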

DLBPointon commented 1 month ago

Output is currently printed to stdout:

WELCOME TO Fasta Manipulator
This has been made to help prep data for use in the Treeval and curationpretext pipelines
ONLY THE yamlvalidator IS SPECIFIC TO TREEVAL, THE OTHER COMMANDS CAN BE USED FOR ANY OTHER PURPOSE YOU WANT
RUNNING SUBCOMMAND: |
-- validateyaml
RUNNING ON: |
-- macos
Validating Yaml: test_data/yaml/test.yaml
FASTA VALID: PASS : FASTA CONTAINS - 9 H/S PAIRS
CRAMtags:
    @SO "unsorted"
    @RG ["41460_1#7"]
    @?? [] <-- Other Tags
    @SQ 0 Counted
Confirm EOF (@??): ID WHETHER EOF EXISTS - NOODLES CRAM DOES NOT SUPPORT THE EOF CONTAINER
CRAM       : PASS : ["/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/hic-arima/SUBSET-1000.cram", "/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/hic-arima/SUBSET-2000.cram.crai", "/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/hic-arima/SUBSET-2000.cram", "/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/hic-arima/SUBSET-1000.cram.crai"] : cram/crai = 4/4
ALIGNER    : PASS : minimap2
LONGREAD   : PASS (/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/pacbio/) FASTA.GZ = 1
BUSCO PATH : PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/busco/subset/lineages/fungi_odb10
GENESET P. : -
    --PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/LaetiporusSulphureus
    --PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/csv_data/LaetiporusSulphureus.gfLaeSulp1-data.csv
    --PASS : "LaetiporusSulphureus.gfLaeSulp1-data.csv" : RECORD-COUNT: > : 1 : <
TELOMOT P. : PASS : TTCAGGG
SYNTENICS P: -
    --NO SYNTENICS PROVIDED
KMER PROF P: FAIL : "/Users/dp24/Documents/FastaManipulator/TreeValTinyData/empty//k31/pxPlaOval8.k31.ktab" <-- doesn't exist

DLBPointon commented 1 month ago

A 3rd option is, like CRAMtags, to create a struct to collect the pass values and put an impl Display on it for the pretty printing, plus a function to save to file and a function to scan through and collect fails: if fails.len() >= 1 { sys.exit(1, fails) }
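A rough sketch of this third option, under the assumption that each check is stored as a "PASS : ..." or "FAIL : ..." string. The struct name Results and its three fields are illustrative only; the real struct would carry every check shown in the stdout output above.

```rust
use std::fmt;

/// Hypothetical results holder; field names are illustrative.
struct Results {
    aligner: String,
    kmer_profile: String,
    geneset: Vec<String>,
}

impl Results {
    /// Collect every entry whose text starts with "FAIL".
    fn fails(&self) -> Vec<&str> {
        std::iter::once(self.aligner.as_str())
            .chain(std::iter::once(self.kmer_profile.as_str()))
            .chain(self.geneset.iter().map(String::as_str))
            .filter(|s| s.starts_with("FAIL"))
            .collect()
    }
}

impl fmt::Display for Results {
    /// Pretty printing in roughly the layout of the current stdout report.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "ALIGNER    : {}", self.aligner)?;
        writeln!(f, "KMER PROF P: {}", self.kmer_profile)?;
        writeln!(f, "GENESET P. : -")?;
        for line in &self.geneset {
            writeln!(f, "    --{line}")?;
        }
        Ok(())
    }
}
```

With Display implemented, println!("{results}") gives the report and results.fails() gives the list to act on, so the printing and the pass/fail decision no longer depend on each other.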

DLBPointon commented 1 month ago

Now going with a dedicated struct. This has meant I needed to remove ColoredString :(

This YamlResults struct will have 3 main functions: to_stdout(), to_file() and to_check(). The last one is for use in a pipeline; it will scan through and sys.exit(1, err) on FAIL values.

YamlResults {
    ReferenceResults: "PASS : FASTA CONTAINS - 9 H/S PAIRS",
    CramResults: CRAMtags {
        header_sort_order: [
            "unsorted",
            "unsorted",
        ],
        other_header_fields: [
            "@SO: unsorted",
            "@SO: unsorted",
        ],
        reference_sequence: [
            0,
            0,
        ],
        header_read_groups: [
            "41460_1#7",
            "41460_1#7",
        ],
    },
    AlignerResults: "PASS : minimap2",
    LongreadResults: "PASS (/Users/dp24/Documents/FastaManipulator/TreeValTinyData/genomic_data/pacbio/) FASTA.GZ = 1",
    BuscoResults: "PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/busco/subset/lineages/fungi_odb10",
    TelomereResults: "PASS : TTCAGGG",
    KmerProfileResults: "FAIL : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/empty//k31/pxPlaOval8.k31.ktab",
    GenesetResults: [
        "PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/LaetiporusSulphureus",
        "PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/csv_data/LaetiporusSulphureus.gfLaeSulp1-data.csv",
        "PASS : \"LaetiporusSulphureus.gfLaeSulp1-data.csv\"=RECORD-COUNT: >1<",
        "PASS : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/LaetiporusSulphureus",
        "FAIL : /Users/dp24/Documents/FastaManipulator/TreeValTinyData/gene_alignment_data//fungi/csv_data/Iam.Fail-data.csv",
        "FAIL: No such file or directory (os error 2)",
    ],
    SyntenicResults: [
        "NO SYNTENICS PROVIDED",
    ],
}
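
For the pipeline side, a minimal sketch of how to_file() and to_check() might behave. It assumes the results are flattened to (field name, value) pairs; FlatResults and exit_on_fail are hypothetical names, not the actual YamlResults implementation. Two details matter: the dump above contains both "FAIL : ..." and "FAIL: ..." entries, so the scan matches on the "FAIL" prefix only, and Rust's std::process::exit takes just an exit code, so the failures are printed to stderr before exiting.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, flattened stand-in for YamlResults: (field name, value) pairs.
struct FlatResults {
    entries: Vec<(String, String)>,
}

impl FlatResults {
    /// Write one "NAME: value" line per entry to `path`.
    fn to_file(&self, path: &Path) -> io::Result<()> {
        let body: String = self
            .entries
            .iter()
            .map(|(name, value)| format!("{name}: {value}\n"))
            .collect();
        fs::write(path, body)
    }

    /// Ok(()) when nothing failed, otherwise the failing lines.
    fn to_check(&self) -> Result<(), Vec<String>> {
        let fails: Vec<String> = self
            .entries
            .iter()
            .filter(|(_, value)| value.starts_with("FAIL"))
            .map(|(name, value)| format!("{name}: {value}"))
            .collect();
        if fails.is_empty() { Ok(()) } else { Err(fails) }
    }
}

/// Print the failures to stderr and stop with exit code 1.
fn exit_on_fail(results: &FlatResults) {
    if let Err(fails) = results.to_check() {
        for line in &fails {
            eprintln!("{line}");
        }
        std::process::exit(1);
    }
}
```

Keeping to_check() free of the exit call means it can be unit tested, while exit_on_fail() is the only place that ends the process.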