Open sprmnt21 opened 1 year ago
Consider the following problem. Given a list of strings, split it into groups of consecutive strings, where each group begins at a string starting with "AT".
julia> itr=[randstring("ACTG") for _ in 1:20]
20-element Vector{String}:
"ATTCCGAG"
"CCCGTGGT"
"TCAAGGGT"
"ATTAGATC"
"TCTTACAC"
"TTTCCGCC"
"TCCGACCG"
"GTCAGCTA"
"CATGTTGC"
"GAGGAACG"
"GTCAATGC"
"TACTCATT"
"ATACTCTA"
"AATTCACA"
"AATCATAT"
"GTATACCT"
"ATTTTACT"
"TTCAGAAG"
"GTTGATGA"
"GACGGCGG"
julia> steps=diff([findall(startswith("AT"), itr);length(itr)])
4-element Vector{Int64}:
3
9
4
3
julia> collect(partby(itr,steps))
4-element Vector{Tuple{Vararg{String}}}:
("ATTCCGAG", "CCCGTGGT", "TCAAGGGT")
("ATTAGATC", "TCTTACAC", "TTTCCGCC", "TCCGACCG", "GTCAGCTA", "CATGTTGC", "GAGGAACG", "GTCAATGC", "TACTCATT")
("ATACTCTA", "AATTCACA", "AATCATAT", "GTATACCT")
("ATTTTACT", "TTCAGAAG", "GTTGATGA")
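or better (the version above uses length(itr) rather than length(itr)+1, so the last string, "GACGGCGG", is dropped; the version below fixes that and also keeps a leading run when the first string does not match):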
st=findall(startswith(somesubstring), itr)
steps=st[1]!=1 ? diff([1;st;length(itr)+1]) : diff([st;length(itr)+1])
collect(partby(itr,steps))
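As a rough sketch, the step computation above could be wrapped in a small helper that also covers the case where nothing matches (where `st[1]` would throw). The name `group_steps` is hypothetical and only uses Base functions; it assumes `itr` is a plain `Vector` with 1-based indices.

```julia
# Hypothetical helper, not part of any package: lengths of the consecutive
# groups that begin at each element satisfying `pred`.
function group_steps(pred, itr::Vector)
    st = findall(pred, itr)
    if isempty(st)
        # No match: the whole vector is one group (or there are no groups at all).
        return isempty(itr) ? Int[] : [length(itr)]
    end
    # Prepend 1 when the first element does not match, so the leading run is kept.
    bounds = st[1] == 1 ? [st; length(itr) + 1] : [1; st; length(itr) + 1]
    return diff(bounds)
end

example = ["ATA", "CC", "GG", "ATT", "TT"]
example_steps = group_steps(startswith("AT"), example)   # group lengths 3 and 2
```

With `length(itr) + 1` as the final boundary, the steps always sum to `length(itr)`, so no trailing element is lost.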
Would it be useful to have an iterator that sits somewhere between groupby and partition? The use case is taking consecutive slices of varying lengths (steps) from an iterator.
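To make the request concrete, here is a minimal sketch of what such an iterator might look like. The name `partby` and its `(itr, steps)` signature are taken from the calls above; the implementation itself is only an assumption built on `Iterators.Stateful` and `Iterators.take` from Base, not an existing package function.

```julia
# Hypothetical sketch of `partby`: lazily yields consecutive slices of `itr`
# whose lengths are given by `steps`.
function partby(itr, steps)
    # A stateful wrapper remembers how far we have advanced between slices.
    src = Iterators.Stateful(itr)
    # Each slice takes the next `n` elements (fewer if `itr` runs out early).
    return (Tuple(collect(Iterators.take(src, n))) for n in steps)
end

parts = collect(partby(["ATA", "CC", "GG", "ATT", "TT"], [3, 2]))
```

Because the slices share one stateful source, the result of this sketch has to be consumed in order and only once. Elements left over after the last step are simply not visited.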