https://github.com/EI-CoreBioinformatics/minos/blob/0611d65fdd2c7e3a01b939282dd597dfb6a447f6/minos/zzz/minos_run.smk#L302
ORFs in the GFF that are shorter than 3 bp cause seqkit translate to error. I think we can avoid this by filtering the file to exclude very small CDS sequences, i.e. adding a seqkit seq step with a minimum CDS length of 48:
seqkit seq -m 48 CDS.fa | seqkit translate --threads 1 --line-width 70 -T 1
The seqkit seq parameters would be added to minos_run.run_config.yaml so that -m can be changed, but setting it to 48 seems reasonable.
I'm not aware of this causing an issue downstream. There will be models with a CDS but no protein sequence, but this is all pre-pick, and running diamond with a protein sequence of 1 amino acid won't be useful anyway.
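
For anyone wanting to check which models would lose their protein sequence, here is a minimal Python sketch of the length filter. It is only an illustration of the idea, not seqkit's implementation and not code from minos: the function names and the example records are made up, and it assumes the -m threshold is inclusive (sequences of exactly 48 bp are kept).

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts, so emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(
    records: Iterable[Tuple[str, str]], min_len: int = 48
) -> Iterator[Tuple[str, str]]:
    """Keep only records whose sequence is at least min_len bases long."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Hypothetical CDS records: one normal, one 2 bp ORF, one 9 bp ORF.
example = [
    ">model1", "ATG" * 20,
    ">model2", "AT",
    ">model3", "ATGAAATAG",
]

kept = [header for header, _ in filter_min_length(read_fasta(example))]
dropped = [h for h, _ in read_fasta(example) if h not in kept]
print("kept:", kept)
print("dropped (CDS present, no protein sequence):", dropped)
```

With the default of 48, model2 and model3 are dropped before translation, which matches the case described above: the models still have a CDS in the GFF but get no protein sequence.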