Closed joshuak94 closed 5 years ago
@dzerbino Before you accept this pull request, I think there's an issue with shifting the positions past the length of the chromosome... I'm not sure how I should check for this quite yet.
The trick is to only allow shifts towards smaller coordinates. Doing so you can easily suppress negative coords. Could you please update the code correspondingly?
I'm not sure what you mean by "towards smaller coordinates". The issue is not only negative coordinates, as theoretically if a read is at the very end of a chromosome, shifting it + 200 could give it a coordinate value which is higher than the chromosome length.
Right, but could you constrain the system such that you are only allowed to shift negatively?
On Wed, 8 May 2019, 04:50 Joshua Kim, notifications@github.com wrote:
I'm not sure what you mean by "towards smaller coordinates". The issue is not only negative coordinates, as theoretically if a read is at the very end of a chromosome, shifting it + 200 could give it a coordinate value which is higher than the chromosome length.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Ensembl/WiggleTools/pull/34#issuecomment-490404813, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAIDV7AKA7FCUSKW2SKHH3PUKH57ANCNFSM4HK66VCQ .
So only allow something like wiggletools shiftPos -30 x.wig
? Then there could be two potential issues:
For issue 1), would the solution be to change the values from (-30, 6) to (0, 6)? Or to just prevent the shift entirely, so the read stays at (0, 36).
For issue 2), any solution requires wiggletools to know the chromosome lengths already...
I'm in the process of parsing through the MaSC.pl code to understand how they deal with these issues.
Regarding 1) I would argue that the shifted block should be clipped at 0, hence should become (0, 6)
Regarding point 2) I do not see why blocks at the beginning of other chromosomes would be shifted onto another chromosome. The genome is not a continuous sequence. They would get dropped or clipped as if they were on chrom 1.
Yeah, point 2 was I think me misunderstanding how the bam files are written.
So, quick question about something:
If the start, end coordinates are (10, 30), and I want to shift by -15, expected behavior is now (0, 15).
If I want to instead shift by -50, this entire line should just be removed. I don't want a line that says (0, 0). Here's what I have so far:
void ShiftPosIteratorPop(WiggleIterator * wi) {
ScaleWiggleIteratorData * data = (ScaleWiggleIteratorData *) wi->data;
WiggleIterator * iter = data->iter;
if (!iter->done) {
if (data->scalar < iter->finish) {
wi->chrom = iter->chrom;
wi->start = (data->scalar >= iter->start ? 0 : iter->start - data->scalar);
wi->finish = iter->finish - data->scalar;
wi->value = iter->value;
}
pop(iter);
} else {
wi->done = true;
}
}
Basically, I'm expecting that if the end coordinate is larger than the shift value, then it assigns the start coordinate to either 0 or iter->start - data->scalar
and the end to just iter->finish - data->scalar
, and the rest is kept. If, however, the end coordinate is <= the shift value, I skip all of it and just pop, which I was hoping would just skip the line. However, it doesn't, and I get a result like:
fixedStep chrom= start=0 step=1
fixedStep chrom=chr1 start=0 step=1
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000
Any tips? I guess I'm not quite understanding pop(iter)
and what it does.
Pop(iter) goes to the next block in the upstream iterator, which you would normally do once after reading the current block.
What you want to do is pop it as many times as necessary to get a valid block, process that one, then pop one last time to update iter.
In short it should look like:
while(coords not good) pop(iter)
update wi coords and value pop iter one last time
Ah okay, thanks!
Also, you said we should only allow shifting toward smaller coordinates. Is there a reason we don't want to allow shifting in the other direction?
While it's OK to shift coordinates arbitrarily in Wiggletools, the Kent tools complain if you go beyond the chromosome length. Since WT does not know what the chrom lengths are it's safer to go downwards only.
Makes sense!
Also addresses some small misc. corrections.