purge_dups can be tuned by setting various parameters, and I did find a big difference in purging. For example, with the default parameters I get 2.2G of purged sequences, while with some other parameters I got 2.6G.
There is a big reduction in BUSCO with the 2.2G version, so I am considering looking for a better parameter set.
But these parameters don't seem to be very well documented. Would it be possible to give a more detailed explanation of what each parameter means? In particular, what are the valid ranges of some parameters (minimum alignment score, for example)?
-f INT minimum fraction of haploid/diploid/bad/repetitive bases in a sequence [.8]
-a INT minimum alignment score [70]
-b INT minimum max match score [200]
-2 BOOL 2 rounds chaining [FALSE]
-m INT minimum matching bases for chaining [500]
-M INT maximum gap size for chaining [20K]
-G INT maximum gap size for 2nd round chaining [50K]
-l INT minimum chaining score for a match [10K]
-E INT maximum extension for contig ends [15K]
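
To compare parameter sets more systematically, I am thinking of something like the sketch below. It only builds one purge_dups command line per combination of option values and prints it; it does not run anything. The flags are taken from the help text above, but the grid values, the `build_sweep` helper, and the file names are my own placeholders, and I am assuming the result is written to stdout. Other inputs (coverage, cutoffs) would go in `extra_args`.

```python
import itertools
import shlex


def build_sweep(paf, grid, extra_args=()):
    """Return one (label, argv) pair per combination of option values in grid.

    grid maps a flag from the help text (e.g. "-a") to the values to try.
    extra_args is passed through unchanged for any other inputs the run needs.
    """
    flags = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[flag] for flag in flags)):
        opts = []
        label_parts = []
        for flag, value in zip(flags, values):
            opts += [flag, str(value)]
            label_parts.append(f"{flag.lstrip('-')}{value}")
        argv = ["purge_dups", *opts, *extra_args, paf]
        runs.append(("_".join(label_parts), argv))
    return runs


if __name__ == "__main__":
    # Placeholder values: the valid ranges are exactly what I am asking about.
    grid = {"-a": [60, 70, 80], "-b": [150, 200]}
    # Hypothetical input file name.
    for label, argv in build_sweep("asm.split.self.paf.gz", grid):
        print(f"{shlex.join(argv)} > dups.{label}.bed")
```

Each output file could then be scored with BUSCO and total purged size to see which options actually drive the difference.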