cathoderaymission closed this issue 11 months ago
Hi @cathoderaymission, yes, k-mers containing 'N' are discarded. And you are right: there is no check for 'NNNNN' k-mers in transcriptome mode. We will add the 'N' check to preprocess_tx soon. Thank you very much.
However, this does not affect xpore diffmod itself, except that the result table may contain those 'N' k-mers, which can be filtered out afterwards.
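As a rough illustration of the filtering step mentioned above, the sketch below drops rows whose k-mer contains an 'N'. It is not taken from the xpore code base: the helper name drop_n_kmers, the column name "kmer", and the small in-memory table are assumptions for illustration, so adjust them to the columns in your own result table.

```python
import csv
import io


def drop_n_kmers(rows, kmer_column="kmer"):
    """Keep only rows whose k-mer contains no 'N' base.

    `kmer_column` is an assumed column name; change it to match your table.
    """
    return [row for row in rows if "N" not in row[kmer_column].upper()]


# Tiny made-up table standing in for a diffmod result file (hypothetical data).
example_table = io.StringIO(
    "id,position,kmer\n"
    "tx1,10,GGACT\n"
    "tx1,11,NNNNN\n"
    "tx1,12,GACNT\n"
    "tx1,13,ACTGA\n"
)

rows = list(csv.DictReader(example_table))
filtered = drop_n_kmers(rows)
print([row["kmer"] for row in filtered])  # ['GGACT', 'ACTGA']
```

To use this on a real file, replace the io.StringIO object with an opened file handle.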
Dear Developer, Could you please tell me the kit used for the cell line? Thanks very much!
Looking at the code for xpore 2.0, I can see the following:
assert list(set(g_kmer_array))[0].count('N') == 0 ##to weed out the mapped kmers from tx_seq that contain 'N', which is not in diffmod's model_kmer
Does this mean that any nanopore read which includes an unmapped (not in model_kmer) event for that chromosome/position is discarded? Or do we just discard the mean from the NNNNN event and use the rest?
I ask because in your demo data one of these events sometimes occurs in the middle of a mapping, and according to the paper multiple event means are averaged with weights given by their event lengths.
I also noticed that when not using genome mapping, there doesn't seem to be any check for these 'NNNNN' events in the preprocess_tx function.
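To make the second interpretation in the question concrete, here is a minimal sketch of a length-weighted mean that skips only the 'NNNNN' events and keeps the rest of the read. It is an illustration under stated assumptions, not xpore's implementation: the function name weighted_event_mean, the (kmer, mean, length) tuple layout, and the numbers are all made up.

```python
def weighted_event_mean(events, skip_n=True):
    """Length-weighted mean of event means for one read at one position.

    `events` is a list of (kmer, mean, length) tuples; this layout is
    hypothetical and chosen only for the example.
    """
    # Optionally drop events whose k-mer contains 'N'; keep everything else.
    kept = [(mean, length) for kmer, mean, length in events
            if not (skip_n and "N" in kmer)]
    total_length = sum(length for _, length in kept)
    if total_length == 0:
        return None  # no usable events remain for this read
    return sum(mean * length for mean, length in kept) / total_length


# One 'NNNNN' event sitting between two mapped events (made-up values).
events = [("GGACT", 100.0, 10), ("NNNNN", 80.0, 5), ("GGACT", 110.0, 30)]
print(weighted_event_mean(events))  # 107.5
```

With skip_n=False the 'NNNNN' event is included in the average, which shows how much a single unmapped event can shift the value.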