Closed hsun3163 closed 2 years ago
It is unclear which step caused this issue, but to avoid it, a duplicate-removal step will be added at the end of the leafcutter annotation.
It looks like the same genomic region is annotated to two genes? We should figure out why it happened in the first place and then prevent it from happening (rather than removing it). This gene is a good example to check.
The problem with removing is which gene would you remove? You can imagine it could cause serious issues when you randomly remove the "wrong" gene.
They are exactly the same line, so I think it is safe to drop it. I will do the processing via remote control and investigate the issue further myself after the sQTL is produced.
Sorry, I'm confused ... these are not the same line. It's the same event annotated to different genes, and I wonder why that happened in the first place:
chr3:125766818:125848253:clu_64950_+:ENSG00000284624 ENSG00000284624
chr3:125766818:125848253:clu_64950_+:ENSG00000284660 ENSG00000284660
I just noticed this, because they were produced by
grep "chr3:125766818:125848253:clu_64950_+:ENSG00000284624"
so I didn't notice. I guess the:
led to some confusion for the grep
To clarify, what you will remove are exact duplicates (as implemented in #419) and not cases such as the above? And we still don't know why duplicates occur in the first place?
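To make the distinction concrete, here is a minimal Python sketch (not the implementation in #419) that separates the two cases discussed above: rows that are exact duplicates, and rows where the same event is annotated to different genes. It assumes each row is an (ID, gene) pair like the two lines quoted earlier, and that the event is the ID with the trailing gene field stripped; the function name `classify_rows` is hypothetical.

```python
from collections import Counter, defaultdict


def classify_rows(rows):
    """Split annotation rows into exact duplicates and multi-gene events.

    rows: iterable of (row_id, gene_id) tuples, where row_id is assumed to
    look like "chr:start:end:cluster:gene" (an assumption based on the
    example lines in this thread).
    """
    row_counts = Counter(rows)

    # Case 1: the whole row appears more than once (safe to drop repeats).
    exact_duplicates = {row: n for row, n in row_counts.items() if n > 1}

    # Case 2: the same event (ID minus the trailing gene) maps to >1 gene.
    genes_by_event = defaultdict(set)
    for row_id, gene_id in row_counts:
        event = row_id.rsplit(":", 1)[0]
        genes_by_event[event].add(gene_id)
    multi_gene_events = {
        event: sorted(genes)
        for event, genes in genes_by_event.items()
        if len(genes) > 1
    }
    return exact_duplicates, multi_gene_events


rows = [
    ("chr3:125766818:125848253:clu_64950_+:ENSG00000284624", "ENSG00000284624"),
    ("chr3:125766818:125848253:clu_64950_+:ENSG00000284660", "ENSG00000284660"),
]
exact, multi = classify_rows(rows)
print(exact)
print(multi)
```

On the two example rows, nothing is reported as an exact duplicate, while the event `chr3:125766818:125848253:clu_64950_+` is flagged as annotated to two genes, which is the case that should be investigated rather than dropped.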
As it turns out, this ticket is a misunderstanding regarding the nature of the issue at hand. Will open a new one when more light is shed on it.
In the leafcutter output for ROSMAP, ~600 of ~200,000 rows are duplicated, exemplified by
However, this behavior is not observed in our MWE and protocol data.