Open isaacovercast opened 4 years ago
Hi, I am using ipyrad ver 0.9.84 and trim_loci does not work. I tried to trim six bp from 3' ends of loci (format 0, 6, 0, 0 ) and cut nothing. I also tried (0, -6, 0, 0 ), but the result was the same. I only tried this at step 7, which is supposed to act. Any way to solve this?
Ooooh, I see what's happened. Finding the internal edges for paired data is still not implemented, so it's defaulting to not trimming those. The clue is the cryptic message and the ellipses and the fact that edges[1] and edges[2] are set to 0 here:
def trim_for_coverage(self, minsite_left=4, minsite_right=4):
"trim edges to where data is not N or -"
# what is the limit of site coverage for trimming?
minsamp_left = min(minsite_left, self.seqs.shape[0])
minsamp_right = min(minsite_right, self.seqs.shape[0])
# how much cov is there at each site?
mincovs = np.sum((self.seqs != 78) & (self.seqs != 45), axis=0)
# locus left trim
self.edges[0] = locus_left_trim(self.seqs, minsamp_left, mincovs)
self.edges[3] = locus_right_trim(self.seqs, minsamp_right, mincovs)
if self.edges[3] <= self.edges[0]:
self.bad = True
# find insert region for paired data to mask it...
self.edges[1] = 0
self.edges[2] = 0
This is going to be a moderate amount of work, so it might not happen quickly, sorry to say.
The trimming of the external edges happens on L750-753:
# trim edges, need to use uppered seqs for maxvar & maxshared
edg = self.edges[self.iloc]
ublock = self.useqs[:, edg[0]:edg[3]]
ablock = self.aseqs[:, edg[0]:edg[3]]
The trimming of internal edges isn't really trimming, but rather masking edges flanking the 'nnnn' insert between mate pairs (on L825):
# mask insert in denovo data
self.aseqs[:, edges.edges[1]:edges.edges[2]] = 110 # n
self.useqs[:, edges.edges[1]:edges.edges[2]] = 78 # N
This isn't doing anything right now because edges[1] and edges[2] are always 0. The interpretation here is that the values in edges
should be the location in bp to trim to, and that is indeed what trim_check()
would do.
Somebody in gitter reported trim_loci wasn't working for the distal end of R1 and i checked and it is true that it's not doing anything. In debugging this I see this code in write_outfiles:
self.trims is initialized as [0, 0, 0, 0], and I don't see anywhere in the code before the above function call where self.trims[1] or [2] would ever be set to anything besides 0.
On a side note, it's a little confusing that trim_reads and trim_loci are parameterized differently for the 3' ends of reads. trim_reads takes a length in bp to trim 3' to, and passing negative values trims that many bases from the end. trim_loci looks like postive values trim from the end and negative values don't appear to be accepted. It's a little confusing, they should probably both work the same way.