camilogarciabotero / GeneFinder.jl

A Gene Finder framework for Julia.
https://camilogarciabotero.github.io/GeneFinder.jl/dev
MIT License

Using score to filter what `getorfs` delivers #33

Closed: camilogarciabotero closed this issue 4 months ago

camilogarciabotero commented 5 months ago

After https://github.com/camilogarciabotero/GeneFinder.jl/pull/26 and https://github.com/camilogarciabotero/GeneFinder.jl/pull/32, `findorfs` can be used more flexibly, with multiple ORF finder methods and with or without a scoring scheme. We can now leverage that to make `getorfs` more capable by adding a score filter that returns only the sequences whose score is above a given threshold. For instance, applying `argmax` to the `orf.score` field picks the single best-scoring ORF:

```julia
orfs[argmax([orf.score for orf in orfs])]
```
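A minimal, self-contained sketch of the threshold filter described above. It does not load GeneFinder: the ORFs are plain NamedTuples standing in for the package's ORF type (only the `score` field matters), and `filter_by_score` is a hypothetical helper name, not an existing function.

```julia
# Hypothetical helper: keep only ORFs whose score reaches `min_score`.
function filter_by_score(orfs::AbstractVector, min_score::Real)
    return filter(orf -> orf.score >= min_score, orfs)
end

# Stand-in ORFs (made-up coordinates and scores).
orfs = [(first = 1, last = 90, score = 0.2),
        (first = 100, last = 399, score = 0.9),
        (first = 500, last = 650, score = 0.5)]

# Index-based form used in the issue text.
best = orfs[argmax([orf.score for orf in orfs])]

# Same result without building an intermediate vector (Julia >= 1.7).
best2 = argmax(orf -> orf.score, orfs)

# Threshold filter: keeps the ORFs scored 0.9 and 0.5.
kept = filter_by_score(orfs, 0.5)
```

`argmax(f, collection)` returns the element itself rather than its index, which avoids the extra allocation of the comprehension.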

We can also combine sorting and slicing to keep only the top-scoring ORFs (here, at most 10):

```julia
# Sort by descending score, then keep at most the first 10 ORFs
sortedorfs = sort(orfs, by = orf -> -orf.score)
sortedorfs[1:min(10, end)]
```
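If only the best `k` ORFs are needed, a full sort is more work than necessary. The sketch below uses Base's `partialsort`, which only orders the first `k` positions. The NamedTuple ORFs and the `topk` helper are hypothetical stand-ins, not GeneFinder API.

```julia
# Stand-in ORFs: `id` records input order, `score` is made up.
orfs = [(id = i, score = s) for (i, s) in enumerate([0.3, 0.8, 0.1, 0.95, 0.5])]

# Hypothetical helper: the k best-scoring ORFs, highest score first.
function topk(orfs::AbstractVector, k::Integer)
    k = min(k, length(orfs))
    k == 0 && return similar(orfs, 0)
    # partialsort copies `orfs` and fully orders only positions 1:k
    return collect(partialsort(orfs, 1:k; by = orf -> orf.score, rev = true))
end

best3 = topk(orfs, 3)   # ids 4, 2, 5 (scores 0.95, 0.8, 0.5)
```

Clamping `k` with `min` plays the same role as `1:min(10, end)` in the slicing version.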

`getorfs` will then gain a `min_score` keyword argument:

```julia
function getorfs(
    sequence::NucleicSeqOrView{DNAAlphabet{N}},
    ::DNAAlphabet{N},
    method::M;
    min_score = 0,  # intended threshold on `orf.score`
    kwargs...       # slurped keywords must come last
) where {N,M<:GeneFinderMethod}
    # ...
end
```
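The keyword layout above matters: Julia only accepts the slurping `kwargs...` as the last keyword parameter, so `min_score` has to be declared before it. This sketch shows the plumbing with hypothetical stand-ins (`findorfs_mock`, `getorfs_mock` and the made-up `scale` keyword are not GeneFinder functions); `min_score` is consumed by the wrapper and every other keyword is forwarded to the finder.

```julia
# Hypothetical stand-in for `findorfs`: ignores the sequence content and
# returns fixed ORFs whose scores are multiplied by a made-up `scale` keyword.
function findorfs_mock(seq::AbstractString; scale::Real = 1.0)
    base = [(first = 1, last = 9, score = 0.2),
            (first = 4, last = 12, score = 0.7),
            (first = 10, last = 18, score = 0.4)]
    return [(first = o.first, last = o.last, score = scale * o.score) for o in base]
end

# `min_score` is declared before `kwargs...`; the remaining keywords
# (here only `scale`) are passed through unchanged.
function getorfs_mock(seq::AbstractString; min_score::Real = 0, kwargs...)
    orfs = findorfs_mock(seq; kwargs...)
    return filter(orf -> orf.score >= min_score, orfs)
end

seq = "ATGAAATAGATGCCCTAA"
getorfs_mock(seq; min_score = 0.5)               # 1 ORF (score 0.7)
getorfs_mock(seq; min_score = 0.5, scale = 2.0)  # 2 ORFs (scores 1.4 and 0.8)
```

With the default `min_score = 0` nothing is dropped, so existing callers would keep their current behavior.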

Still to define...

camilogarciabotero commented 5 months ago

Other experiments:

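```julia
# Sort in place by descending score; QuickSort is not stable
sort!(orfs; by = orf -> getproperty(orf, :score), rev=true, alg=QuickSort, kwargs...)
# Only the first k positions are guaranteed to be in order; also not stable
sort!(orfs; by = orf -> getproperty(orf, :score), rev=true, alg=PartialQuickSort(k), kwargs...)
```
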
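If ORFs with equal scores should keep their original (positional) order, a stable algorithm such as `MergeSort` can be requested explicitly, while `partialsort!` is the Base shortcut for the top-`k` case. The tied NamedTuple ORFs below are made-up stand-ins for GeneFinder's ORF type.

```julia
# Stand-in ORFs with tied scores; `id` records input order.
orfs = [(id = 1, score = 0.5), (id = 2, score = 0.9),
        (id = 3, score = 0.5), (id = 4, score = 0.9)]

# MergeSort is stable: ORFs with equal scores keep their input order.
stable = sort(orfs; by = orf -> orf.score, rev = true, alg = MergeSort)
# ids → [2, 4, 1, 3]

# partialsort! only guarantees that positions 1:k hold the k best scores,
# in order; ties may come out in any order and the rest stays unordered.
k = 2
work = copy(orfs)
top = partialsort!(work, 1:k; by = orf -> orf.score, rev = true)
```

Which one fits depends on whether a deterministic order among tied ORFs matters downstream.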
camilogarciabotero commented 5 months ago

Maybe using the `iscoding` method to keep only the ORFs predicted as coding:

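```julia
function findorfs(..., encoding::Bool = false, ...)
    # ... find `orfs` in `sequence` ...
    # when `encoding` is true, keep only the ORFs that `iscoding` accepts
    return encoding ? [i for i in orfs if iscoding(sequence[i]; kwargs...)] : orfs
end
```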
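A runnable sketch of the `encoding` switch, assuming ORFs are index ranges into the sequence. The real `iscoding` has its own criteria; here `iscoding_mock` is a hypothetical length-based predicate and `findorfs_coding_mock` a hypothetical finder with fixed ranges, used only to show how the flag and the forwarded keywords fit together.

```julia
# Hypothetical predicate standing in for `iscoding`: "coding" if the
# subsequence is at least `min_len` characters long.
iscoding_mock(seq::AbstractString; min_len::Integer = 6) = length(seq) >= min_len

# Hypothetical finder: fixed ranges stand in for real ORF detection.
function findorfs_coding_mock(sequence::AbstractString; encoding::Bool = false, kwargs...)
    orfs = [1:3, 1:9, 4:12]
    # extra keywords (e.g. `min_len`) are forwarded to the predicate
    return encoding ? [i for i in orfs if iscoding_mock(sequence[i]; kwargs...)] : orfs
end

seq = "ATGAAATAGCCC"
findorfs_coding_mock(seq)                                   # [1:3, 1:9, 4:12]
findorfs_coding_mock(seq; encoding = true)                  # [1:9, 4:12]
findorfs_coding_mock(seq; encoding = true, min_len = 10)    # empty
```

Keeping `encoding = false` as the default leaves the unfiltered output unchanged.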