Psy-Fer / SquiggleKit

SquiggleKit: A toolkit for manipulating nanopore signal data
MIT License

Using RNA for MotifSeq #34

Open epi-gene opened 4 years ago

epi-gene commented 4 years ago

Would it be possible to simulate RNA squiggles and perform a motif search?

Psy-Fer commented 4 years ago

Hello,

Yes, just use the following flag: --scrappie_model squiggle_r94_rna

epi-gene commented 4 years ago

Hi. I tried the above, but MotifSeq is yielding no results. The hit probabilities are 0 for all the reads.

epi-gene commented 4 years ago

Is there a way to calculate them manually?

Psy-Fer commented 4 years ago

Could you please give me an example of the output? I'll look into it. I have some RNA data on hand to double check.

epi-gene commented 4 years ago

fast5 | readID | model | start | end | length | distance_score | model_mean | model_stdev | Z-score | p-value | hit_Probability
8e91fa7d-fd33-422b-8dfb-69915b17aa8d.fast5 | 8e91fa7d-fd33-422b-8dfb-69915b17aa8d | last50 | 11839 | 12071 | 232 | 399.438241083274 | 135.4 | 11.465672 | 23.0285883883015 | 1 | 0

epi-gene commented 4 years ago

You did mention in the README:

"The p-values and hit probabilities provided are based on loose modelling of negative background scores for a number of k-mers. It is currently only modelled on R9.4 model, not R10 or RNA."
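For anyone checking these columns by hand, the Z-score in the output row posted earlier in this thread can be reproduced from the distance_score, model_mean and model_stdev columns. The sketch below does that arithmetic. The p-value and hit-probability steps are assumptions for illustration only (a lower-tail normal probability, with the hit probability as its complement); they happen to agree with the 1 and 0 in the row, but the thread does not confirm that this is how MotifSeq derives them. The function names are hypothetical.

```python
from math import erf, sqrt


def z_score(distance, mean, stdev):
    """Number of background standard deviations between a score and the background mean."""
    return (distance - mean) / stdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Values copied from the output row in this thread.
distance_score = 399.438241083274
model_mean = 135.4
model_stdev = 11.465672

z = z_score(distance_score, model_mean, model_stdev)

# ASSUMPTION: p-value taken as the lower-tail probability of the background
# model, and hit probability as its complement. Not confirmed by the thread.
p_value = normal_cdf(z)
hit_probability = 1.0 - p_value

print(f"Z-score: {z:.6f}")
print(f"p-value (assumed lower tail): {p_value}")
print(f"hit probability (assumed 1 - p): {hit_probability}")
```

A distance score roughly 23 standard deviations above the background mean suggests the RNA scores sit on a different scale from the R9.4 background, which would fit the README caveat quoted above.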

epi-gene commented 4 years ago

@Psy-Fer Would MotifSeq be able to detect multiple motif hits within a single read? Or would it display only the highest-scoring hit?

Psy-Fer commented 4 years ago

Currently, only the highest scoring hit.

I plan on allowing it to record more than the top hit.

A roundabout way of "hacking" a solution is to mask out the region of the best hit and try again. So, take the values between the start and end positions and change them to the mean current, or set them to 99999 so they get filtered out altogether. Obviously that isn't ideal, and I'll try fixing this.
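The masking workaround could look roughly like the sketch below. It is not part of SquiggleKit: mask_hit and its parameters are hypothetical names, the signal is assumed to be already loaded as a plain list of current values, and start/end are treated as a half-open range (which matches the example row above, where end - start equals length).

```python
from statistics import fmean


def mask_hit(signal, start, end, mode="mean", sentinel=99999.0):
    """Return a copy of `signal` with the samples in [start, end) masked.

    mode="mean"     replaces the region with the mean current of the read.
    mode="sentinel" replaces it with a large value (99999 by default) that
                    is expected to be filtered out downstream.
    """
    signal = list(signal)
    if not 0 <= start < end <= len(signal):
        raise ValueError("start/end must satisfy 0 <= start < end <= len(signal)")
    if mode == "mean":
        fill = fmean(signal)
    elif mode == "sentinel":
        fill = sentinel
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Build a new list so the original signal is left untouched.
    return signal[:start] + [fill] * (end - start) + signal[end:]


# Toy signal standing in for a read's raw current values.
raw = [float(v) for v in range(20)]
masked = mask_hit(raw, 5, 10)                    # best hit replaced by the mean
dropped = mask_hit(raw, 5, 10, mode="sentinel")  # best hit set to 99999
print(masked[5:10])
print(dropped[5:10])
```

The masked signal would then be written back out in whatever format the search is run on and searched again, so the next-best hit becomes the top hit. Repeating this gives one hit per pass.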

epi-gene commented 4 years ago

Ok. Will try that.